Archive for May, 2025
Calculating price movements
Thursday, May 15th, 2025In my previous blog post, I explained how meaningful odds can be extracted from the betting_text column in the historic_runners tables. We will now discuss how these odds can be converted into a decimal price and compared again with the starting price to produce a price movement for each runner in a race.
The SQL script below builds a table called all_prev_race_price_movements. It holds information about each horse’s latest race for a specific type of race (such as turf, all-weather, etc.) and, where possible, details from the horse’s previous race of the same type.
The script then works out how the betting odds changed between the two races — basically, tracking if a horse was backed or drifted in the market.
The script returns not just the price movement but other pieces of useful data for the current and previous runs. Eg. finish_position, position_in_betting, unfinished, distance_won, distance_beaten, etc. (This data will be useful in the next blog post, where we will show how the table created can be use to analyse the data further).
Takeaways from the full SQL script below:
This setup allows you to:
- See how a horse’s market position changed from one race to the next.
- Spot if a horse was heavily backed (shortened in price) or was not fancied (ie. the price drifted).
- Compare races of the same type only, so the analysis is consistent.
Step-by-Step Breakdown
1. RankedRaces
- This section links together races and runners.
- It gives each race a ranking within its race type, with the most recent races ranked highest.
2. LastRaceTimes
- This picks out the most recent race for each race type.
- It helps make sure we only look at races up to the latest one — anything in the future gets ignored.
3. FilteredRaces
This is the main bit of the logic.
- It filters out races that happened after the most recent one.
- It ranks each runner’s races within that race type — so we can tell which was most recent run, which was next most recent run, and so on.
- It fetches details like the starting price, where the horse finished, etc.
- It also attempts to extract the opening odds from the betting summary text ( as covered in my previous blog post).
- It calculates a percentage chance (ie. probability) implied by the starting price.
4. ProcessedRaces
- Adds a calculation to convert the opening odds into a percentage.
- This allows us to compare the market expectation before the race with the actual SP (starting price).
Final Output (SELECT Statement)
The final section:
- Focuses on the most recent race for each horse and race type.
- Tries to link it to the previous race of the same type for the same horse.
- If no such race exists, the previous race columns are just left empty (NULL).
- Works out the percentage change from the opening price to the starting price — this shows whether the horse was backed in (price dropped) or drifted (price lengthened).
- The whole code below has the following criteria to return results for horses in races in the last 3 months, this can be changed as desired.
AND curr.scheduled_time >= DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)
The results of this SQL can been seen by running a simple select:
SELECT * FROM all_prev_race_price_movements;
Full SQL Script:
DROP TABLE IF EXISTS all_prev_race_price_movements;
CREATE TABLE all_prev_race_price_movements AS
WITH RankedRaces AS (
SELECT
hr.scheduled_time,
hru.runner_id,
hr.race_id,
hr.race_type_id,
hr.handicap,
hru.starting_price,
hru.starting_price_decimal,
hru.position_in_betting,
hru.finish_position,
hru.unfinished,
hru.distance_won,
hru.distance_beaten,
hru.official_rating,
hru.betting_text,
RANK() OVER (PARTITION BY hr.race_type_id ORDER BY hr.scheduled_time DESC) AS race_type_rank
FROM historic_races hr
JOIN historic_runners hru USING (race_id)
), LastRaceTimes AS (
SELECT
race_type_id,
MAX(scheduled_time) AS last_race_time
FROM RankedRaces
WHERE race_type_rank = 1
GROUP BY race_type_id
), FilteredRaces AS (
SELECT
rr.scheduled_time,
rr.runner_id,
rr.race_id,
rr.race_type_id,
rr.handicap,
rr.official_rating,
rr.starting_price AS previous_starting_price,
rr.starting_price_decimal AS previous_starting_price_decimal,
rr.position_in_betting AS previous_position_in_betting,
rr.finish_position AS previous_finish_position,
rr.unfinished AS previous_unfinished,
rr.distance_beaten AS previous_distance_won,
rr.distance_beaten AS previous_distance_beaten,
rr.official_rating AS previous_official_rating,
rr.handicap AS previous_handicap,
rr.betting_text AS previous_betting_text,
-- Parse odds string
IFNULL(
REPLACE(
REGEXP_SUBSTR(
rr.betting_text,
'(?<=op )\\d{1,3}/\\d{1,3}|(?<=tchd )\\d{1,3}/\\d{1,3}|(?<=op )Evens|(?<=tchd )Evens'
),
'Evens',
'1/1'
),
'0/0'
) AS derived_previous_odds,
-- Cast to prevent divide by zero, using DECIMAL to avoid FLOAT deprecation
CASE
WHEN rr.starting_price_decimal > 0 THEN
CAST(CAST(100 AS DECIMAL(20,10)) / CAST(rr.starting_price_decimal AS DECIMAL(20,10)) AS DECIMAL(20,10))
ELSE NULL
END AS previous_SP_pct,
ROW_NUMBER() OVER (PARTITION BY rr.runner_id, rr.race_type_id ORDER BY rr.scheduled_time DESC) AS race_rank
FROM RankedRaces rr
JOIN LastRaceTimes lrt ON rr.race_type_id = lrt.race_type_id
WHERE rr.scheduled_time <= lrt.last_race_time
), ProcessedRaces AS (
SELECT *,
-- Convert odds string to decimal price percentage using DECIMAL
CASE
WHEN REGEXP_LIKE(derived_previous_odds, '\\d+/\\d+') AND
CAST(SUBSTRING_INDEX(derived_previous_odds, '/', -1) AS DECIMAL(20,10)) > 0
THEN CAST(
CAST(100 AS DECIMAL(20,10)) / (
CAST(SUBSTRING_INDEX(derived_previous_odds, '/', 1) AS DECIMAL(20,10)) /
CAST(SUBSTRING_INDEX(derived_previous_odds, '/', -1) AS DECIMAL(20,10)) +
CAST(1 AS DECIMAL(20,10))
)
AS DECIMAL(20,10)
)
ELSE NULL
END AS previous_opening_price_pct
FROM FilteredRaces
)
SELECT
curr.runner_id,
curr.race_type_id,
curr.scheduled_time AS current_race_time,
curr.race_id AS current_race_id,
curr.handicap AS current_handicap,
curr.official_rating AS current_official_rating,
prev.scheduled_time AS previous_race_time,
prev.race_id AS previous_race_id,
prev.previous_starting_price,
prev.previous_starting_price_decimal,
prev.previous_position_in_betting,
prev.previous_finish_position,
prev.previous_unfinished,
prev.previous_distance_won,
prev.previous_distance_beaten,
prev.previous_official_rating,
prev.previous_handicap,
prev.previous_betting_text,
prev.derived_previous_odds,
prev.previous_SP_pct,
prev.previous_opening_price_pct,
-- Safe percentage movement calc using DECIMAL only
CASE
WHEN prev.previous_opening_price_pct > 0 THEN
CAST((
(CAST(prev.previous_SP_pct AS DECIMAL(20,10)) - CAST(prev.previous_opening_price_pct AS DECIMAL(20,10)))
/ CAST(prev.previous_opening_price_pct AS DECIMAL(20,10))
) * CAST(100 AS DECIMAL(20,10)) AS DECIMAL(20,10))
ELSE NULL
END AS previous_price_movement
FROM ProcessedRaces curr
LEFT JOIN ProcessedRaces prev
ON curr.runner_id = prev.runner_id
AND curr.race_type_id = prev.race_type_id
AND prev.race_rank = curr.race_rank + 1
WHERE curr.race_rank = 1
AND curr.scheduled_time >= DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)
ORDER BY curr.race_type_id, curr.runner_id;
Extracting useful data from the betting string
Sunday, May 11th, 2025This post comes as a result of thinking about ways to calculate the price movements of horses, from the opening show to the SP. The opening show and other price movements aren’t currently available in Smartform as separate fields, however the opening show and other price movements are contained within the betting_text column in the historic_runners and historic_runners_beta tables, stored as a text string.
The question is how to split out the data. After a number of experiments, I came up with the following SQL to derive a usable price to go to work with. In my next post I will describe how to convert this to a decimal price and calculate the price movement from the opening show to starting price.
IFNULL( REPLACE( REGEXP_SUBSTR( betting_text, '(?<=op )\d{1,3}/\d{1,3}|(?<=tchd )\d{1,3}/\d{1,3}|(?<=op )Evens|(?<=tchd )Evens' ), 'Evens', '1/1' ), '0/0' ) AS derived_previous_odds
The above code looks into the string field called betting_text
and tries to find the previous odds of a horse from the point at which the betting shows open on course before any given race
Here’s what each part does:
REGEXP_SUBSTR(...)
:
This searches the text for the first match of one of the following patterns:- A fraction like
5/2
,11/8
, etc. that appears after the word “op “ (short for “opened at”) - A fraction that appears after the word “tchd “ (short for “touched”)
- The word
"Evens"
(which means 1/1 odds) after either “op ” or “tchd “
- A fraction like
REPLACE(..., 'Evens', '1/1')
:
If the match found was"Evens"
, this replaces it with"1/1"
to standardize it into fractional format.IFNULL(..., '0/0')
:
If nothing matched (i.e., there were no odds found in the text), it returns'0/0'
instead of null.
Examples
betting_text
= "op 5/1 tchd 6/1"
- Match found:
"5/1"
(after"op "
) - Replacement: None
- Final result:
"5/1"
betting_text
: "op Evens tchd 11/10"
- Match found:
"Evens"
(after"op "
) - Replacement:
"Evens"
→"1/1"
- Final result:
"1/1"
betting_text
= "op 85/40 tchd 2/1"
- Match found:
"85/40"
(after"op "
) - Replacement: None
- Final result:
"85/40"
betting_text
= "tchd 30/100 op 4/5"
- Match found:
"30/100"
(after"tchd "
) - Replacement: None
- Final result:
"30/100"
betting_text
= "non-runner, withdrawn at start"
- Match found: None (no “op” or “tchd” with odds or “Evens”)
- Replacement: Not applicable
- Final result:
"0/0"
(default value when nothing matches)