May « 2025 « Betwise Blog

Archive for May, 2025

Calculating price movements

Thursday, May 15th, 2025

In my previous blog post, I explained how meaningful odds can be extracted from the betting_text column in the historic_runners tables. We will now discuss how these odds can be converted into a decimal price and compared again with the starting price to produce a price movement for each runner in a race.

The SQL script below builds a table called all_prev_race_price_movements. It holds information about each horse’s latest race for a specific type of race (such as turf, all-weather, etc.) and, where possible, details from the horse’s previous race of the same type.

The script then works out how the betting odds changed between the two races — basically, tracking if a horse was backed or drifted in the market.

The script returns not just the price movement but other pieces of useful data for the current and previous runs. Eg. finish_position, position_in_betting, unfinished, distance_won, distance_beaten, etc. (This data will be useful in the next blog post, where we will show how the table created can be use to analyse the data further).

Takeaways from the full SQL script below:

This setup allows you to:

See how a horse’s market position changed from one race to the next.
Spot if a horse was heavily backed (shortened in price) or was not fancied (ie. the price drifted).
Compare races of the same type only, so the analysis is consistent.

Step-by-Step Breakdown

1. RankedRaces

This section links together races and runners.
It gives each race a ranking within its race type, with the most recent races ranked highest.

2. LastRaceTimes

This picks out the most recent race for each race type.
It helps make sure we only look at races up to the latest one — anything in the future gets ignored.

3. FilteredRaces

This is the main bit of the logic.

It filters out races that happened after the most recent one.
It ranks each runner’s races within that race type — so we can tell which was most recent run, which was next most recent run, and so on.
It fetches details like the starting price, where the horse finished, etc.
It also attempts to extract the opening odds from the betting summary text ( as covered in my previous blog post).
It calculates a percentage chance (ie. probability) implied by the starting price.

4. ProcessedRaces

Adds a calculation to convert the opening odds into a percentage.
This allows us to compare the market expectation before the race with the actual SP (starting price).

Final Output (SELECT Statement)

The final section:

Focuses on the most recent race for each horse and race type.
Tries to link it to the previous race of the same type for the same horse.
If no such race exists, the previous race columns are just left empty (NULL).
Works out the percentage change from the opening price to the starting price — this shows whether the horse was backed in (price dropped) or drifted (price lengthened).
The whole code below has the following criteria to return results for horses in races in the last 3 months, this can be changed as desired.

AND curr.scheduled_time >= DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)

The results of this SQL can been seen by running a simple select:

SELECT * FROM all_prev_race_price_movements;

Full SQL Script:


DROP TABLE IF EXISTS all_prev_race_price_movements;
CREATE TABLE all_prev_race_price_movements AS

WITH RankedRaces AS (
    SELECT 
        hr.scheduled_time, 
        hru.runner_id, 
        hr.race_id, 
        hr.race_type_id,
        hr.handicap,
        hru.starting_price,
        hru.starting_price_decimal,
        hru.position_in_betting,
        hru.finish_position,
        hru.unfinished,
        hru.distance_won,
        hru.distance_beaten,
        hru.official_rating,
        hru.betting_text,
        RANK() OVER (PARTITION BY hr.race_type_id ORDER BY hr.scheduled_time DESC) AS race_type_rank
    FROM historic_races hr
    JOIN historic_runners hru USING (race_id)
), LastRaceTimes AS (
    SELECT 
        race_type_id, 
        MAX(scheduled_time) AS last_race_time
    FROM RankedRaces
    WHERE race_type_rank = 1
    GROUP BY race_type_id
), FilteredRaces AS (
    SELECT 
        rr.scheduled_time, 
        rr.runner_id, 
        rr.race_id, 
        rr.race_type_id,
        rr.handicap,  
        rr.official_rating,
        rr.starting_price AS previous_starting_price,
        rr.starting_price_decimal AS previous_starting_price_decimal,
        rr.position_in_betting AS previous_position_in_betting,
        rr.finish_position AS previous_finish_position,
        rr.unfinished AS previous_unfinished,
        rr.distance_beaten AS previous_distance_won,
        rr.distance_beaten AS previous_distance_beaten,
        rr.official_rating AS previous_official_rating,
        rr.handicap AS previous_handicap,
        rr.betting_text AS previous_betting_text,

        -- Parse odds string
        IFNULL(
            REPLACE(
                REGEXP_SUBSTR(
                    rr.betting_text,
                    '(?<=op )\\d{1,3}/\\d{1,3}|(?<=tchd )\\d{1,3}/\\d{1,3}|(?<=op )Evens|(?<=tchd )Evens'
                ), 
                'Evens', 
                '1/1'
            ), 
            '0/0'
        ) AS derived_previous_odds,

        -- Cast to prevent divide by zero, using DECIMAL to avoid FLOAT deprecation
        CASE 
            WHEN rr.starting_price_decimal > 0 THEN 
                CAST(CAST(100 AS DECIMAL(20,10)) / CAST(rr.starting_price_decimal AS DECIMAL(20,10)) AS DECIMAL(20,10))
            ELSE NULL 
        END AS previous_SP_pct,

        ROW_NUMBER() OVER (PARTITION BY rr.runner_id, rr.race_type_id ORDER BY rr.scheduled_time DESC) AS race_rank
    FROM RankedRaces rr
    JOIN LastRaceTimes lrt ON rr.race_type_id = lrt.race_type_id
    WHERE rr.scheduled_time <= lrt.last_race_time
), ProcessedRaces AS (
    SELECT *,
        -- Convert odds string to decimal price percentage using DECIMAL
        CASE 
            WHEN REGEXP_LIKE(derived_previous_odds, '\\d+/\\d+') AND 
                 CAST(SUBSTRING_INDEX(derived_previous_odds, '/', -1) AS DECIMAL(20,10)) > 0
            THEN CAST(
                CAST(100 AS DECIMAL(20,10)) / (
                    CAST(SUBSTRING_INDEX(derived_previous_odds, '/', 1) AS DECIMAL(20,10)) /
                    CAST(SUBSTRING_INDEX(derived_previous_odds, '/', -1) AS DECIMAL(20,10)) + 
                    CAST(1 AS DECIMAL(20,10))
                )
                AS DECIMAL(20,10)
            )
            ELSE NULL
        END AS previous_opening_price_pct
    FROM FilteredRaces
)
SELECT 
    curr.runner_id, 
    curr.race_type_id, 
    curr.scheduled_time AS current_race_time, 
    curr.race_id AS current_race_id,
    curr.handicap AS current_handicap, 
    curr.official_rating AS current_official_rating,
    prev.scheduled_time AS previous_race_time, 
    prev.race_id AS previous_race_id,
    prev.previous_starting_price,
    prev.previous_starting_price_decimal,
    prev.previous_position_in_betting,
    prev.previous_finish_position,
    prev.previous_unfinished,
    prev.previous_distance_won,
    prev.previous_distance_beaten,
    prev.previous_official_rating,
    prev.previous_handicap,
    prev.previous_betting_text,
    prev.derived_previous_odds,  
    prev.previous_SP_pct,
    prev.previous_opening_price_pct,


-- Safe percentage movement calc using DECIMAL only
CASE 
    WHEN prev.previous_opening_price_pct > 0 THEN 
        CAST((
            (CAST(prev.previous_SP_pct AS DECIMAL(20,10)) - CAST(prev.previous_opening_price_pct AS DECIMAL(20,10))) 
            / CAST(prev.previous_opening_price_pct AS DECIMAL(20,10))
        ) * CAST(100 AS DECIMAL(20,10)) AS DECIMAL(20,10))
    ELSE NULL 
END AS previous_price_movement
FROM ProcessedRaces curr
LEFT JOIN ProcessedRaces prev 
    ON curr.runner_id = prev.runner_id 
    AND curr.race_type_id = prev.race_type_id 
    AND prev.race_rank = curr.race_rank + 1
WHERE curr.race_rank = 1
AND curr.scheduled_time >= DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)
ORDER BY curr.race_type_id, curr.runner_id;

| No Comments »

Extracting useful data from the betting string

Sunday, May 11th, 2025

This post comes as a result of thinking about ways to calculate the price movements of horses, from the opening show to the SP. The opening show and other price movements aren’t currently available in Smartform as separate fields, however the opening show and other price movements are contained within the betting_text column in the historic_runners and historic_runners_beta tables, stored as a text string.

The question is how to split out the data. After a number of experiments, I came up with the following SQL to derive a usable price to go to work with. In my next post I will describe how to convert this to a decimal price and calculate the price movement from the opening show to starting price.

IFNULL(
     REPLACE(
         REGEXP_SUBSTR(
             betting_text,
             '(?<=op )\d{1,3}/\d{1,3}|(?<=tchd )\d{1,3}/\d{1,3}|(?<=op )Evens|(?<=tchd )Evens'
         ), 
         'Evens', 
         '1/1'
     ), 
     '0/0'
 ) AS derived_previous_odds

The above code looks into the string field called betting_text and tries to find the previous odds of a horse from the point at which the betting shows open on course before any given race

Here’s what each part does:

REGEXP_SUBSTR(...):
This searches the text for the first match of one of the following patterns:
- A fraction like 5/2, 11/8, etc. that appears after the word “op “ (short for “opened at”)
- A fraction that appears after the word “tchd “ (short for “touched”)
- The word "Evens" (which means 1/1 odds) after either “op ” or “tchd “
REPLACE(..., 'Evens', '1/1'):
If the match found was "Evens", this replaces it with "1/1" to standardize it into fractional format.
IFNULL(..., '0/0'):
If nothing matched (i.e., there were no odds found in the text), it returns '0/0' instead of null.

Examples

betting_text = "op 5/1 tchd 6/1"

Match found: "5/1" (after "op ")
Replacement: None
Final result: "5/1"

betting_text: "op Evens tchd 11/10"

Match found: "Evens" (after "op ")
Replacement: "Evens" → "1/1"
Final result: "1/1"

betting_text = "op 85/40 tchd 2/1"

Match found: "85/40" (after "op ")
Replacement: None
Final result: "85/40"

betting_text = "tchd 30/100 op 4/5"

Match found: "30/100" (after "tchd ")
Replacement: None
Final result: "30/100"

betting_text = "non-runner, withdrawn at start"

Match found: None (no “op” or “tchd” with odds or “Evens”)
Replacement: Not applicable
Final result: "0/0" (default value when nothing matches)