In the May edition of Racing Ahead, Betwise take an in-depth look at analysing in-race comments in order to spot profitable betting angles – using the Smartform Racing Database.
Lots of handicappers will look up previous in-race comments for horses that they are interested in betting on. However, using these comments is not a recognized starting point in form analysis or standardized as a way of comparing form between one horse and another.
Each race is a unique event, after all, so the story of one race is different from the story of another, and the abilities of the horses will vary. Any number of race by race factors will also affect the way a race may be run – such as the race conditions, the going, the draw, pace in the race, how the jockeys decided to ride their mounts, how the trainers and owners instructed each jockey, to name a few. Therefore an argument could be made that comments can’t be compared meaningfully across different events, still less as a means of measuring horses of different abilities.
Leaving aside these concerns, the sheer magnitude of the task should be enough to deter any further manual investigation. A modest sprint handicap of 12 runners where each runner has had an average of 20 previous runs would be 240 comments to examine for one race alone, with no standard model to work towards.
So, in the Racing Ahead article we discuss the results of analysis achieved using the flexibility and power of a programmable computer database which includes full in-race comments for each runner. In total, we examined over 7 years of in-running comments from Smartform for different race types in UK and Irish Flat racing – over 492,000 comments, representing over 45,000 individual races, for over 48,000 different runners.
The article is in two parts, with this month’s piece focusing first on the question of whether different race types are more likely to be won by horses who run these races in a particular way – for example leading early or being held up for a late run. Next month’s article looks at predicting how individual races may unfold depending on the known running characteristics of the runners who take part – so that the bettor can take advantage of predictive running trends in different race types.
The first thing we bump up against in trying to analyse comments is the number and variety of them. Fortunately, experienced racereaders already bring regularity to the descriptions of horses’ runs, with a common shorthand that makes it possible for us to look for key expressions rather than grouping comments in their entirety.
Here’s a random sample of a few in-race comments (familiar to anyone who ever wondered what on earth happened to their bet after the event) to illustrate the kind of data we are looking at:
mid-division, headway over 4f out, ridden over 2f out, one-paced final furlong
prominent, pushed along over 3f out, ridden 2f out, weakened final furlong
led, pushed along and headed over 4f out, soon weakened, tailed off
tracked leader, headway to lead over 4f out, pushed along and went clear 3f out, easily
Since we can see the comments have a regular structure, we can use simple SQL statements in the Smartform database to look for regular expressions or patterns of words that could be significant, based on defining comment types that seem indicative of success or failure, and/or represent a known running style.
For example, in the small sample above, it might seem interesting to match weakened, ridden and pushed along as negative indicators, and tracked leader, led and prominent as positive indicators. The downside is that choosing many categories and single words may produce poor indicators and/or contradictory indicators within the same comment.
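To make that downside concrete, here is a rough Python sketch (not the Smartform SQL itself) that counts the positive and negative phrases named above in each of the four sample comments. The function name and the two phrase lists are assumptions made purely for this illustration.

```python
import re

# Indicator phrases taken from the paragraph above; the grouping is illustrative only.
POSITIVE = ["tracked leader", "led", "prominent"]
NEGATIVE = ["weakened", "ridden", "pushed along"]

SAMPLE_COMMENTS = [
    "mid-division, headway over 4f out, ridden over 2f out, one-paced final furlong",
    "prominent, pushed along over 3f out, ridden 2f out, weakened final furlong",
    "led, pushed along and headed over 4f out, soon weakened, tailed off",
    "tracked leader, headway to lead over 4f out, pushed along and went clear 3f out, easily",
]

def find_indicators(comment, phrases):
    """Return the phrases that occur in the comment as whole words."""
    return [p for p in phrases
            if re.search(r"\b" + re.escape(p) + r"\b", comment)]

for comment in SAMPLE_COMMENTS:
    positives = find_indicators(comment, POSITIVE)
    negatives = find_indicators(comment, NEGATIVE)
    print(positives, negatives)
```

Three of the four sample comments match both a positive and a negative phrase, including the last one, where the horse went clear and finished "easily" yet was also "pushed along". The whole-word matching also keeps "led" from matching inside "tailed".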
There is some art required, then, to choosing categories that are unambiguous in terms of running style, and also independent of each other. For example, a horse that led at some point in the race does not necessarily exhibit the characteristics of a front runner, despite the fact that its comment contains the word led. So to find front runners, we would want to see the comment start with led, which is an easy condition to test for using regular expressions in SQL.
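The difference between "starts with led" and "contains led" can be sketched with two anchored patterns. This is a Python illustration rather than the SQL used for the article; the function names are our own for this example, and the second comment is invented to show a horse that led only late on.

```python
import re

STARTS_WITH_LED = re.compile(r"^led\b")  # comment opens with the word "led"
CONTAINS_LED = re.compile(r"\bled\b")    # "led" appears anywhere as a whole word

def led_from_start(comment):
    return STARTS_WITH_LED.search(comment) is not None

def led_at_some_point(comment):
    return CONTAINS_LED.search(comment) is not None

# First comment is from the sample above; the second is a made-up example.
early_leader = "led, pushed along and headed over 4f out, soon weakened, tailed off"
late_leader = "held up, headway 2f out, led inside final furlong, ran on"

print(led_from_start(early_leader), led_at_some_point(early_leader))
print(led_from_start(late_leader), led_at_some_point(late_leader))
```

Only the first comment passes the "starts with" test, although both contain the word led. The `^` anchor does the same job in SQL dialects that support regular expressions.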
After reviewing and testing a number of regular expressions found within comments, here are the final categories we examine in the article, together with the regular expressions that they look for:
Front runner: Starts with any of led, prominent, with leader, pressed leader, disputed lead, soon led OR contains any of made all, made virtually all, always prominent.
Held up: Contains the expression held up
Lagger: Starts with any of in rear, dwelt, slowly, very slowly OR contains started slowly.
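The three definitions above could be expressed along the following lines. This is a minimal Python sketch rather than the actual Smartform queries: the helper names are hypothetical, matching on whole words and lower-cased text is our assumption, and the last test comment is invented.

```python
import re

def build_pattern(starts=(), contains=()):
    """Compile one regex: opening phrases anchored at the start, others anywhere."""
    parts = []
    if starts:
        parts.append(r"^(?:" + "|".join(re.escape(s) for s in starts) + r")\b")
    parts.extend(r"\b" + re.escape(c) + r"\b" for c in contains)
    return re.compile("|".join(parts))

CATEGORIES = {
    "Front runner": build_pattern(
        starts=("led", "prominent", "with leader", "pressed leader",
                "disputed lead", "soon led"),
        contains=("made all", "made virtually all", "always prominent"),
    ),
    "Held up": build_pattern(contains=("held up",)),
    "Lagger": build_pattern(
        starts=("in rear", "dwelt", "slowly", "very slowly"),
        contains=("started slowly",),
    ),
}

def classify(comment):
    """Return the set of categories whose pattern matches the comment."""
    text = comment.strip().lower()
    return {name for name, pattern in CATEGORIES.items() if pattern.search(text)}

print(classify("prominent, pushed along over 3f out, ridden 2f out, weakened final furlong"))
print(classify("dwelt, held up in rear, never dangerous"))  # invented comment
```

Returning a set rather than a single label allows a comment to fall into more than one category, as with the invented second comment, which is both a Lagger and Held up.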
Front runner is independent of the Held up and Lagger categories. The Lagger category, however, contains a fraction of comments which contain held up. There are many other potentially useful categories that we left out of the preliminary search because they overlap with one another, so it is not straightforward to talk about win percentages without considering dependencies. However, for readers of the Racing Ahead article who are curious about the other categories, here are the others we considered:
Tracked leader from start: Starts with tracked leader but not tracked leaders.
Behind leaders from start: Starts with tracked leaders but not tracked leader.
Chased leader from start: Starts with chased leader but not chased leaders.
Chased leaders from start: Starts with chased leaders but not chased leader.
In touch from start: Starts with in touch.
In touch at some point: Contains in touch.
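Separating "tracked leader" from "tracked leaders" only works if the phrase is matched as complete words, since the plural begins with the singular. A short Python sketch of one way to do that follows, using a word boundary after each phrase. The function name is hypothetical and all comments except the first are invented for illustration.

```python
import re

# Category name -> phrase the comment must open with (as whole words).
OPENINGS = {
    "Tracked leader from start": "tracked leader",
    "Behind leaders from start": "tracked leaders",
    "Chased leader from start": "chased leader",
    "Chased leaders from start": "chased leaders",
    "In touch from start": "in touch",
}

def opening_categories(comment):
    """List the categories above that a single comment falls into."""
    text = comment.strip().lower()
    found = [name for name, phrase in OPENINGS.items()
             if re.match(re.escape(phrase) + r"\b", text)]
    if re.search(r"\bin touch\b", text):
        found.append("In touch at some point")
    return found

print(opening_categories(
    "tracked leader, headway to lead over 4f out, pushed along and went clear 3f out, easily"))
print(opening_categories("tracked leaders, ridden 2f out, kept on"))
print(opening_categories("chased leaders, in touch until weakened 2f out"))
```

The trailing `\b` is what stops "tracked leader" from matching a comment that opens with "tracked leaders". The third comment shows the kind of crossover between chasing leaders and being in touch that makes these categories less distinctive.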
And many more… even before looking at all explicitly negative comments such as ridden along, pushed along, weakened etc…
Suffice to say that the categories we didn’t include accounted for a far smaller percentage of total wins per category than those we did (in total, the 3 categories we chose accounted for 55% of all winning races) and the categories we didn’t choose were also less distinctive (eg. there is a significant amount of crossover between tracking and chasing leaders whilst also being in touch).
In the article in Racing Ahead we summarize the data with comparative counts showing the number of times each comment type (consider it a running style) occurs overall, compared to the number of times horses win with that particular comment.
The winning comment types are also analysed according to different race types (eg. by distance), and by the average starting price of all winning contenders.
In short, we are interested in the highest percentage strike rates for a given running style in a given type of race, and the highest returns for backing all runners who qualify from the comment type alone. We pick on one particularly promising candidate: front runners in sprint races.
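For readers who want to see how the two measures relate, here is a small Python sketch of strike rate and level-stakes profit for one running style. The `Run` record, the `summarise` function and all of the numbers are invented for illustration; they are not Smartform fields or results from the article, and starting prices are assumed to be decimal odds including the stake.

```python
from dataclasses import dataclass

@dataclass
class Run:
    style: str         # running style assigned from the in-race comment
    won: bool
    decimal_sp: float  # starting price as decimal odds, stake included (assumption)

def summarise(runs, style, stake=1.0):
    """Return (runners, strike rate, level-stakes profit) for one running style."""
    selected = [r for r in runs if r.style == style]
    if not selected:
        return 0, 0.0, 0.0
    wins = sum(1 for r in selected if r.won)
    returned = sum(stake * r.decimal_sp for r in selected if r.won)
    profit = returned - stake * len(selected)
    return len(selected), wins / len(selected), profit

# Made-up runs, purely to exercise the function.
runs = [
    Run("Front runner", True, 5.0),
    Run("Front runner", False, 3.0),
    Run("Front runner", False, 9.0),
    Run("Front runner", True, 4.0),
    Run("Held up", False, 6.0),
    Run("Held up", True, 2.5),
]

print(summarise(runs, "Front runner"))
print(summarise(runs, "Held up"))
```

In this toy data both styles have the same strike rate, but the profit from backing every qualifier differs because the winners' prices differ, which is why the two measures are looked at side by side.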
You can find analysis of all the categories we analysed in this month’s Racing Ahead, and see just how promising spotting front runners in advance could be… The next issue of Racing Ahead is due out at the end of May, where we will look at how well we can predict which horses may front run in upcoming sprint races, and see how to use that information to betting advantage.
Tags: front runners, in race comments, in-running, Racing Ahead, Smartform, sprint races, SQL
This entry was posted on Saturday, May 1st, 2010 at 8:45 pm.
Analysing in-running comments
By colin on Saturday, May 1st, 2010