Calculating price movements
By Nick Franks on Thursday, May 15th, 2025In my previous blog post, I explained how meaningful odds can be extracted from the betting_text column in the historic_runners tables. We will now discuss how these odds can be converted into a decimal price and compared again with the starting price to produce a price movement for each runner in a race.
The SQL script below builds a table called all_prev_race_price_movements. It holds information about each horse’s latest race for a specific type of race (such as turf, all-weather, etc.) and, where possible, details from the horse’s previous race of the same type.
The script then works out how the betting odds changed between the two races — basically, tracking if a horse was backed or drifted in the market.
The script returns not just the price movement but other pieces of useful data for the current and previous runs. Eg. finish_position, position_in_betting, unfinished, distance_won, distance_beaten, etc. (This data will be useful in the next blog post, where we will show how the table created can be use to analyse the data further).
Takeaways from the full SQL script below:
This setup allows you to:
- See how a horse’s market position changed from one race to the next.
- Spot if a horse was heavily backed (shortened in price) or was not fancied (ie. the price drifted).
- Compare races of the same type only, so the analysis is consistent.
Step-by-Step Breakdown
1. RankedRaces
- This section links together races and runners.
- It gives each race a ranking within its race type, with the most recent races ranked highest.
2. LastRaceTimes
- This picks out the most recent race for each race type.
- It helps make sure we only look at races up to the latest one — anything in the future gets ignored.
3. FilteredRaces
This is the main bit of the logic.
- It filters out races that happened after the most recent one.
- It ranks each runner’s races within that race type — so we can tell which was most recent run, which was next most recent run, and so on.
- It fetches details like the starting price, where the horse finished, etc.
- It also attempts to extract the opening odds from the betting summary text ( as covered in my previous blog post).
- It calculates a percentage chance (ie. probability) implied by the starting price.
4. ProcessedRaces
- Adds a calculation to convert the opening odds into a percentage.
- This allows us to compare the market expectation before the race with the actual SP (starting price).
Final Output (SELECT Statement)
The final section:
- Focuses on the most recent race for each horse and race type.
- Tries to link it to the previous race of the same type for the same horse.
- If no such race exists, the previous race columns are just left empty (NULL).
- Works out the percentage change from the opening price to the starting price — this shows whether the horse was backed in (price dropped) or drifted (price lengthened).
- The whole code below has the following criteria to return results for horses in races in the last 3 months, this can be changed as desired.
AND curr.scheduled_time >= DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)
The results of this SQL can been seen by running a simple select:
SELECT * FROM all_prev_race_price_movements;
Full SQL Script:
DROP TABLE IF EXISTS all_prev_race_price_movements;
CREATE TABLE all_prev_race_price_movements AS
WITH RankedRaces AS (
SELECT
hr.scheduled_time,
hru.runner_id,
hr.race_id,
hr.race_type_id,
hr.handicap,
hru.starting_price,
hru.starting_price_decimal,
hru.position_in_betting,
hru.finish_position,
hru.unfinished,
hru.distance_won,
hru.distance_beaten,
hru.official_rating,
hru.betting_text,
RANK() OVER (PARTITION BY hr.race_type_id ORDER BY hr.scheduled_time DESC) AS race_type_rank
FROM historic_races hr
JOIN historic_runners hru USING (race_id)
), LastRaceTimes AS (
SELECT
race_type_id,
MAX(scheduled_time) AS last_race_time
FROM RankedRaces
WHERE race_type_rank = 1
GROUP BY race_type_id
), FilteredRaces AS (
SELECT
rr.scheduled_time,
rr.runner_id,
rr.race_id,
rr.race_type_id,
rr.handicap,
rr.official_rating,
rr.starting_price AS previous_starting_price,
rr.starting_price_decimal AS previous_starting_price_decimal,
rr.position_in_betting AS previous_position_in_betting,
rr.finish_position AS previous_finish_position,
rr.unfinished AS previous_unfinished,
rr.distance_beaten AS previous_distance_won,
rr.distance_beaten AS previous_distance_beaten,
rr.official_rating AS previous_official_rating,
rr.handicap AS previous_handicap,
rr.betting_text AS previous_betting_text,
-- Parse odds string
IFNULL(
REPLACE(
REGEXP_SUBSTR(
rr.betting_text,
'(?<=op )\\d{1,3}/\\d{1,3}|(?<=tchd )\\d{1,3}/\\d{1,3}|(?<=op )Evens|(?<=tchd )Evens'
),
'Evens',
'1/1'
),
'0/0'
) AS derived_previous_odds,
-- Cast to prevent divide by zero, using DECIMAL to avoid FLOAT deprecation
CASE
WHEN rr.starting_price_decimal > 0 THEN
CAST(CAST(100 AS DECIMAL(20,10)) / CAST(rr.starting_price_decimal AS DECIMAL(20,10)) AS DECIMAL(20,10))
ELSE NULL
END AS previous_SP_pct,
ROW_NUMBER() OVER (PARTITION BY rr.runner_id, rr.race_type_id ORDER BY rr.scheduled_time DESC) AS race_rank
FROM RankedRaces rr
JOIN LastRaceTimes lrt ON rr.race_type_id = lrt.race_type_id
WHERE rr.scheduled_time <= lrt.last_race_time
), ProcessedRaces AS (
SELECT *,
-- Convert odds string to decimal price percentage using DECIMAL
CASE
WHEN REGEXP_LIKE(derived_previous_odds, '\\d+/\\d+') AND
CAST(SUBSTRING_INDEX(derived_previous_odds, '/', -1) AS DECIMAL(20,10)) > 0
THEN CAST(
CAST(100 AS DECIMAL(20,10)) / (
CAST(SUBSTRING_INDEX(derived_previous_odds, '/', 1) AS DECIMAL(20,10)) /
CAST(SUBSTRING_INDEX(derived_previous_odds, '/', -1) AS DECIMAL(20,10)) +
CAST(1 AS DECIMAL(20,10))
)
AS DECIMAL(20,10)
)
ELSE NULL
END AS previous_opening_price_pct
FROM FilteredRaces
)
SELECT
curr.runner_id,
curr.race_type_id,
curr.scheduled_time AS current_race_time,
curr.race_id AS current_race_id,
curr.handicap AS current_handicap,
curr.official_rating AS current_official_rating,
prev.scheduled_time AS previous_race_time,
prev.race_id AS previous_race_id,
prev.previous_starting_price,
prev.previous_starting_price_decimal,
prev.previous_position_in_betting,
prev.previous_finish_position,
prev.previous_unfinished,
prev.previous_distance_won,
prev.previous_distance_beaten,
prev.previous_official_rating,
prev.previous_handicap,
prev.previous_betting_text,
prev.derived_previous_odds,
prev.previous_SP_pct,
prev.previous_opening_price_pct,
-- Safe percentage movement calc using DECIMAL only
CASE
WHEN prev.previous_opening_price_pct > 0 THEN
CAST((
(CAST(prev.previous_SP_pct AS DECIMAL(20,10)) - CAST(prev.previous_opening_price_pct AS DECIMAL(20,10)))
/ CAST(prev.previous_opening_price_pct AS DECIMAL(20,10))
) * CAST(100 AS DECIMAL(20,10)) AS DECIMAL(20,10))
ELSE NULL
END AS previous_price_movement
FROM ProcessedRaces curr
LEFT JOIN ProcessedRaces prev
ON curr.runner_id = prev.runner_id
AND curr.race_type_id = prev.race_type_id
AND prev.race_rank = curr.race_rank + 1
WHERE curr.race_rank = 1
AND curr.scheduled_time >= DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)
ORDER BY curr.race_type_id, curr.runner_id;