Welcome to the Flat 2012!
By colin on Saturday, March 31st, 2012
There’s been a break from the blog this winter, but it’s now time to get back into it with the start of the Flat.
Given a race as rich as the Dubai World Cup today it seems a bit traditionalist to herald the start of the Flat with an analysis of the Lincoln Handicap, but that’s just what we’re going to do.
We wrote about Lincoln trends this time last year and not much has changed in that there is only one more year to add.
However, we can do a lot more to validate trends in terms of Smartform research. So, here’s the full query in Smartform this time around for the benefit of all subscribers to the database:
mysql> select race_name, course, meeting_date, name, age, weight_pounds AS 'weight', historic_races.official_rating AS 'OR', trainer_name AS 'trainer', jockey_name AS 'jockey' from historic_races join historic_runners using (race_id) where race_name LIKE "%Lincoln%" and added_money > 75000 and finish_position = 1;
Notice that specifying race_name LIKE “%Lincoln%” alone (in other words retrieving all races that meet the condition of containing the word “Lincoln”) throws up many false positives, including lots of Lincoln trial races. A little lateral thinking goes a long way when working with horseracing data – here we’ve simply added the qualifying criteria that prize money must be over 75k and hey presto we get the Lincoln winners’ summary details for every year in the database, including 2006 and 2007, the years when the Lincoln was run at Redcar and Newcastle respectively.
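One way to settle on a threshold is to list every race whose name matches, along with its added money, and see where the Lincoln itself separates from the trials. This is a minimal sketch, assuming race_name, course, meeting_date and added_money all live on historic_races (the queries in this post do not say which table holds them):

```sql
-- List every race with "Lincoln" in its name, richest first.
-- Assumption: these four columns are on historic_races, so no join is needed.
select meeting_date, course, race_name, added_money
from historic_races
where race_name LIKE '%Lincoln%'
order by added_money desc, meeting_date;
```

If the Lincoln proper sits clear of the trials at the top of the list, any value between it and the richest trial should serve as the cut-off.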
A little trial and error is required with the prize money element in the query, but knowing the approximate value of the Lincoln helps (and if you don’t, you can find out easily enough by adding that field name to the query and seeing exactly how much added_money has gone in to Lincoln prizes in the past). Basically it’s always useful when working with racing data to get to know the ins and outs of racing as well as how to run queries on the data. Anyway, here are the results of the above query (excluding race_name):
What trends can we see?
Well, many similar to the ones we found last year, of course. Except that one of the stronger trends we identified just by looking at the winners was that 4 year olds had dominated the winners’ podium – and last year a six year old won it.
Let’s examine this trend a little more closely, since it’s rightly pointed out that winner trends can be misleading without considering the whole data. In other words, the statistic that 4 year olds have won 6 of the last 9 runnings (i.e. two thirds of recent renewals) would be meaningless if 4 year olds had also made up approximately two thirds of all runners over the past 9 years.
Let’s turn to Smartform again to analyse this. What we’re looking for is the distribution of runner ages for all runners in the Lincoln over the past 9 years – how many were 4 year olds, 5 year olds and so on, and what proportion did these represent of the total.
First off, let’s see how many runners were entered over the time we are looking at:
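A count along these lines should give the total field size across all runnings. It is a sketch that assumes the same race filters as the age breakdown that follows:

```sql
-- Total runners in all Lincoln renewals held in the database.
-- Assumption: same name and added_money filters as the age query.
select count(*) AS runners
from historic_races
join historic_runners using (race_id)
where race_name LIKE '%Lincoln%'
  and added_money > 75000;
```

The total should equal the sum of the per-age counts, which comes to 208.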
Now, how were those runners distributed by age?
mysql> select age, count(name) AS 'runners' from historic_races join historic_runners using (race_id) where race_name LIKE "%Lincoln%" and added_money > 75000 group by age;
+------+---------+
| age  | runners |
+------+---------+
|    4 |      65 |
|    5 |      61 |
|    6 |      49 |
|    7 |      22 |
|    8 |       8 |
|    9 |       1 |
|   10 |       2 |
+------+---------+
(N.B. The ‘group by’ clause is a useful trick for interrogating data distribution by any given category.)
What conclusions can we draw from this? Does the data support the theory that 4 year olds are showing a high win ratio over the past 9 years?
Very much so – despite representing only 65 of the 208 Lincoln runners (so just under one third), they have produced two thirds of the winners.
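Spelling out the arithmetic: 65/208 is about 31% of runners, against 6/9 or about 67% of winners, so 4 year olds have won roughly 2.1 times as often as their numbers alone would suggest. Put another way, their strike rate is 6/65 (about 9%) against 3/143 (about 2%) for all other ages combined. The same comparison can be produced in one query. This is a sketch, assuming finish_position = 1 marks the winner, as in the first query:

```sql
-- Runners, winners and strike rate for each age group.
-- The case expression counts 1 for a winner and 0 otherwise,
-- so runners with no recorded finish position do not break the sum.
select age,
       count(*) AS runners,
       sum(case when finish_position = 1 then 1 else 0 end) AS winners,
       round(100 * sum(case when finish_position = 1 then 1 else 0 end) / count(*), 1) AS strike_rate_pct
from historic_races
join historic_runners using (race_id)
where race_name LIKE '%Lincoln%'
  and added_money > 75000
group by age;
```

Keeping runners and winners side by side makes it harder to mistake a large field share for a genuine edge.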
What other trends look worth investigating further from our original query? Plenty, but let’s focus on one other – trainer. In the last 9 runnings, two trainers have won the race more than once – Mark Tompkins and William Haggas.
So let’s interrogate this “trend” to see how significant it is. (We’ll avoid the purely academic exercise in the case of Mark Tompkins, since he has no runner today.) For William Haggas, let’s see how many runners he has sent to compete in the Lincoln over the past 9 years, when they ran and where they have finished, so that we can understand his runners to winners ratio:
mysql> select meeting_date, name, finish_position from historic_races join historic_runners using (race_id) where race_name LIKE "%Lincoln%" and added_money > 75000 and trainer_name LIKE "%Haggas%";
+--------------+-----------+-----------------+
| meeting_date | name      | finish_position |
+--------------+-----------+-----------------+
| 2007-03-31   | Very Wise |               1 |
| 2008-03-22   | Very Wise |              14 |
| 2010-03-27   | Penitent  |               1 |
+--------------+-----------+-----------------+
3 rows in set (0.17 sec)
2 winners out of 3 runners. Not bad… William Haggas also trained High Low to win the Lincoln in 1992, before Smartform records begin. So he certainly knows how to train the winner of this race.
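To put that ratio in context, the same grouping idea can be applied by trainer. This is a sketch under the same assumptions as the queries above; it lists every trainer with at least one Lincoln winner in the database, alongside how many runners they have sent:

```sql
-- Runners and winners per trainer, restricted to trainers with a winner.
select trainer_name,
       count(*) AS runners,
       sum(case when finish_position = 1 then 1 else 0 end) AS winners
from historic_races
join historic_runners using (race_id)
where race_name LIKE '%Lincoln%'
  and added_money > 75000
group by trainer_name
having sum(case when finish_position = 1 then 1 else 0 end) > 0
order by winners desc, runners;
```

With samples this small, a ratio like 2 from 3 is suggestive rather than conclusive, so it is worth reading alongside the other trends.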
Which 4 year olds are running in this year’s renewal, and who are their trainers? For this we turn to the automatically updated daily_races and daily_runners tables.
mysql> select forecast_price AS 'betting', course, meeting_date, name, weight_pounds AS 'weight', daily_runners.official_rating AS 'OR', trainer_name, jockey_name from daily_races join daily_runners using (race_id) where meeting_date = CURDATE() and race_title LIKE "%Lincoln%" and age = 4;
+---------+-----------+--------------+---------+--------+----+--------------+----------------+
| betting | course    | meeting_date | name    | weight | OR | trainer_name | jockey_name    |
+---------+-----------+--------------+---------+--------+----+--------------+----------------+
| 13/2    | Doncaster | 2012-03-31   | Fury    |    130 | 98 | W J Haggas   | Adam Beschizza |
| 20/1    | Doncaster | 2012-03-31   | Askaud  |    129 | 97 | S Dixon      | I Mongan       |
| 8/1     | Doncaster | 2012-03-31   | Cocozza |    131 | 99 | M Botti      | J Fanning      |
+---------+-----------+--------------+---------+--------+----+--------------+----------------+
We get a Haggas runner thrown in for free… Fury, Haggas’ runner this year, is at the time of writing vying for favouritism in this year’s renewal.
The trends show us why Fury is near favourite. By the way, another trend worthy of note is that horses at the top of the market have started performing very well in recent years (as you can see in the decimal SP column of the original query table). Personally, I find it hard to back 6/1 shots in competitive cavalry charge handicaps, and would sooner be backing something available in double figures, but historic trends suggest this one has every chance.
Postscript: Fury ran well – not the greatest passage in the race, but still came a good third.
Great stuff Colin, am going to save those queries for future big handicaps this season and some other projects
The Spring Mile winner (although a massive price) also fitted a lot of key trends for its respective race (although sadly went unbacked for me) but I feel this trends-based approach lends itself very well to the big races, especially when combined with other forms of analysis for confirmation. Thanks for sharing.