Betwise Blog
Betwise news, analysis and automatic betting info

Calculating Strike Rate using Smartform and R

By Phill Clarke on Saturday, April 28th, 2018

The simple goal of this article is to show how to calculate jockey Frankie Dettori’s win strike rate at Ascot since 2009, using the Smartform database and the R statistical programming language.

The article begins with the assumption that the Smartform MySQL database has been successfully installed following the instructions provided in the user manual. Some familiarity with basic programming concepts is assumed and the reader will most likely be required to conduct some of their own research to understand some aspects of the R language.

The R language may be installed directly from the project’s website as a binary for Linux, Mac and Windows platforms. Many Linux distributions also include the R language in their repositories for installation via a package manager.

Once R is successfully installed, it is then recommended to install the RStudio integrated development environment (IDE). This will assist to make writing R code easier and is more user friendly than working with just the R command prompt. RStudio provides many useful tools and frameworks for R. The IDE is available as a free download from their website.

Once R and RStudio IDE is installed, it’s time to start writing code.

The first step is to retrieve data from the installed Smartform MySQL database. An additional R library is required for this. The easiest SQL connection library for our purposes is RMySQL. The following code demonstrates the steps required to install this library. Simply cut and paste this code snippet into the Console window in RStudio.

install.packages("RMySQL")

This step may also be achieved by selecting Tools / Install Packages from the RStudio menu options. If all went well RMySQL and its dependencies should have been successfully installed.

Next step is to connect to the Smartform database. This may have been installed locally or on a remote server.

# Load the RMySQL library package
library("RMySQL")

# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                        host='127.0.0.1', 
                        user='yourusername', 
                        password='yourpassword', 
                        dbname='smartform')

There are a number of different ways to use the RMySQL package, in terms of the dbConnect arguement. The manual for this package, which can be found at Github or as a PDF from the CRAN package homepage, describes creating configuration files and additional options.

Once connection to the Smartform database is established, the next step is to retrieve some data. The SQL query language has many options and is very powerful. However, this article will endeavour to keep things simple.

# This SQL query selects the required columns from the historic_races and historic_runners tables, joining by unique race_id and filtering for results only since January 1st, 2009. The SQL query is saved in a variable called sql1.
sql1 <- paste("SELECT historic_races.course, 
              historic_runners.jockey_name, 
              historic_runners.finish_position 
              FROM smartform.historic_runners
              JOIN smartform.historic_races USING (race_id)
              WHERE historic_races.meeting_date >= '2009-01-01'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_results.
smartform_results <- dbGetQuery(con, sql1)

# Close the database connection, which is good practice
dbDisconnect(con)

If all went well, there will now be a new R dataframe containing in excess of 118,000 results, which is every ride for every jockey at every course, since January 1st 2009. This data can now be filtered to retain just Dettori’s rides at Ascot. There are often multiple ways to achieve the same result when using R. We’ll load another R library called dplyr (which will need to be installed first) in order to again keep things simple where possible.

# Load the dplyr library
library("dplyr")

# Filter the long list of all courses for Ascot only
ascot_only <- dplyr::filter(smartform_results,
                               course %in% c("Ascot"))

# Filter all Ascot runs for Dettori's rides only
dettori_rides <- dplyr::filter(ascot_only,
                            jockey_name %in% c("L Dettori"))

A few assumptions have been made with the above code. Firstly, we know exactly how Dettori’s name appears in the database, so we can match on the exact string. We also haven’t differentiated between Flat and National Hunt racing at Ascot, because Frankie Dettori only rides on the Flat.

The data contained in the dataframe dettori_rides now contains every Dettori ride at Ascot since 2009. Next step is to calculate strike rate. This is a very simple calculation where total wins are divided by total rides and multiplied by one hundred, to obtain a percentage.

# Count total rides for Dettori at Ascot
total_rides <- nrow(dettori_rides)

# Filter for only winning rides. i.e. finish position is 1
dettori_wins <- dplyr::filter(dettori_rides,
                              finish_position == 1)

# Count total winning rides
total_wins <- nrow(dettori_wins)

# Calculate strike rate
dettori_strike_rate <- (total_wins / total_rides) * 100

# Display strike rate
dettori_strike_rate
# 14.90683

At the time of writing this article, Frankie Dettori’s total win strike rate at Ascot, since 2009, is 14.91%

This concludes the current article. In future posts, additional statistics will be calculated and a deeper understanding of the R language obtained.

Questions and queries about this article should be posted as a comment below or on the Betwise Q&A board.

The full R code used in this article is found below.

# Load the required library packages
library("RMySQL")
library("dplyr")


# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                 host='127.0.0.1', 
                 user='yourusername', 
                 password='yourpassword', 
                 dbname='smartform')

# This SQL query selects the required columns from the historic_races and historic_runners tables, joining by unique race_id and filtering for results only since January 1st, 2009. The SQL query is saved in a variable called sql1.
sql1 <- paste("SELECT historic_races.course, 
              historic_runners.jockey_name, 
              historic_runners.finish_position 
              FROM smartform.historic_runners
              JOIN smartform.historic_races USING (race_id)
              WHERE historic_races.meeting_date >= '2009-01-01'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_results.
smartform_results <- dbGetQuery(con, sql1)

# Close the database connection, which is good practice
dbDisconnect(con)

# Filter the long list of all courses for Ascot only
ascot_only <- dplyr::filter(smartform_results,
                            course %in% c("Ascot"))

# Filter all Ascot runs for Dettori's rides only
dettori_rides <- dplyr::filter(ascot_only,
                               jockey_name %in% c("L Dettori"))

# Count total rides for Dettori at Ascot
total_rides <- nrow(dettori_rides)

# Filter for only winning rides. i.e. finish position is 1
dettori_wins <- dplyr::filter(dettori_rides,
                              finish_position == 1)

# Count total winning rides
total_wins <- nrow(dettori_wins)

# Calculate strike rate
dettori_strike_rate <- (total_wins / total_rides) * 100

# Display strike rate
dettori_strike_rate
# 14.90683

No Comments (add your own) »

Cheltenham – Final Day

By colin on Thursday, March 13th, 2014

Thank goodness it’s the last day.  I am a Cheltenham traditionalist, so technically this last day should not even exist, the Gold Cup should already have been run, and I should not have to blog having arrived home late on a bumpy taxi ride.  Never again – roll on the month long breaks between posts of the past.  But once you’ve started a Cheltenham selections blog, it’s just not on to stop – even if nobody is reading!

Anyway, I digress.  Cheltenham selections for the final day, as powered by the wonder of Smartform, are:

13:30: Guitar Pete
14:05: Diakali
14:40: Briar Hill
15:20: Bobs Worth
16:00: Harbour Court
16:40: Une Artiste
17:15: Oiseau de Nuit

All we need at this stage is just one decent priced winner to put the system in LSP, and not a bad strike rate to boot, for the whole of the Festival.

Good luck!
 

 

No Comments (add your own) »

Cheltenham Day 3

By colin on Wednesday, March 12th, 2014

For our 1 reader, you must be getting fed up with this by now 🙂

Though the overall results are still not too shabby, despite only 1 winner today.  Tomorrow, the automated Cheltenham system has thrown up a fair few at decent prices.  Here goes, and fingers crossed:

13:30:  Double Ross
14:05:  Fingal Bay
14:40:  Boston Bob
15:20:  Quevega
16:00:  Sraid Padraig
16:40:  Same Difference

And that’s your lot…

 

No Comments (add your own) »

Cheltenham Day 2

By colin on Tuesday, March 11th, 2014

So today we picked 2 winners, 3 places (inc. 20-1 second) and 2 unplaced.

And coming up the automated picks for tomorrow’s racing (but first a word from our sponsor):

I wrote this program for Cheltenham horses about 4 years ago.  It takes about 2 minutes to run over all the form in Smartform and most years returns a level stakes profit. Over all years, its profitable to LSP. Oh, the wonders of Smartform. Or more accurately knowing a bit about horseracing and having at your disposal a programmable database.

So here goes for tomorrow’s picks (actually takes longer to write the blog post than pick the horses, can you believe it?):

13:30: Royal Boy
14:05: Carlingford Lough
14:40: Dell’ Arca
15:20: Sire De Grugy
16:00: Quantitativeeasing
16:40: Orgilgo Bay
17:15: Modus

Best of luck to you all, and, if you want me to run the program again tomorrow, post a comment or retweet to let us know you’re listening 🙂

And best of luck to you all…

No Comments (add your own) »

Happy Cheltenham!

By colin on Monday, March 10th, 2014

Well, what better way to awake to Spring than the Cheltenham festival?

I’ve followed a Cheltenham system based on data only programmatically accessible through Smartform for the past few years that has always yielded an LSP. It tends to spot a couple of longshots through the meeting, so it’s not to be relied on. But for fun, here goes selections for Tuesday:

13:30 Vautour
14:05 Trifolium
14:40 Ma Filleule
15:20 Hurricane Fly
16:00 Quevega
16:40 Shotgun Paddy
17:15 Art of Logistics

Have fun!

No Comments (add your own) »

Twitter updates for 2014

By colin on Sunday, January 5th, 2014

Hi there,

And Happy New Year to all of you!

There’ll be a few improvements to Betwise in the year ahead, and we’ll be starting with Twitter service updates for Smartform subscribers.

We’ve been trialling these for a while, now we’ll be adding in the date.  Basically for all the daily subscribers the tweets will show you when the historic results for the previous day’s racing have been updated and with what meetings, and likewise for advance daily cards, when these have been updated and for what meetings.

Hope you like this, and please don’t forget to follow us at www.twitter.com/betwiser for more in the year ahead.

No Comments (add your own) »

Pace in the Coral Sprint Trophy, Newmarket

By colin on Saturday, May 18th, 2013

Following on from last week’s post about visualising pace in the race, here’s another pace chart painstakingly crafted for today’s big sprint at Newmarket.

As with Ascot last week, you can see that at Newmarket the horses run from right to left if we are looking from the stands side (the usual television angle) of the course.  Thus Hasopop is in stall 16 and BlueGrass Blues, running towards the far (or “inside”) rail, is in stall 1.

Non-runners Lewisham and Heaven’s Guest (at the time of posting) have been removed.

So the pace in the race is definitely with Bapak Sayang towards the stands rail.  This colt has a great draw for a front runner in stall 15 in that he may be able to bag the rail early to run against and dictate the pace of the race.  The bad news is that Newmarket’s wide, straight galloping course has not got the best strike rate for front runners with ambitions to win.  There also has to be a caveat that this horse has only had 5 runs so it might be a little early in his career to label him a persistent front runner.

If you like the graph, or anything needs clarifying, please let us know in the comments section.

No Comments (add your own) »

Pace in the Victoria Cup – because a picture is worth a thousand words…

By colin on Saturday, May 11th, 2013

Check out this pace graphic (just one of the things it’s possible to do with Smartform and a little programming).

Every day we produce leader and lagger statistics in the Betwise Members’ area – this graphic shows the difference between the two as “net lead” – if a horse is in the negative part of the graph, we expect them to start slowly, a horse in the positive part of the graph shows a probable early leader, and the extent to which they will lead.  Everything else is as you will expect to see in the race – horses are ordered according to the draw, from Solar Deity on the stands rail in stall 1 to Born to Surprise racing in stall 29 on the near side rail.   Ascot’s a right handed course, so the horses are racing from right to left if you look at the course from above.  Therefore this is like an aerial view of the racecourse and shows us which horse(s) we might expect to be leading after the first furlong of the race, where the horse’s position is represented by the blue dot.  Of course, the dynamics of the race mean that the horses will start to bunch and will not stay in “lanes” once they have started, also it’s usually the horse that actually gets the lead that matters most (our prediction is Dream Tune in this case).

If you like this graphic or would like to see anything else explained or added to it, please let us know in the comments section.

You can draw your own conclusions from the graphic but here are my thoughts on inplay bets:

The early pace looks to be towards the stands side, with Dream Tune looking highly likely to secure an early lead.  A good back to lay candidate?

No Comments (add your own) »

New automated betting strategy for sprinters added to site

By colin on Saturday, September 1st, 2012

A new automated betting strategy has been added to the members’ area on the site.

It runs in real-time every day considering all sprint races in the UK and Ireland, using the same techniques described in Automatic Exchange Betting to select and schedule races automatically.

This robot considers each Flat sprint race (ie. 5 and 6 furlong races) 1 minute before the offtime and assesses all contenders against variables derived entirely from the Smartform database. Eight variables are used, each converted to a numerical value, then a simple average is taken of all. The robot simply backs the top rated from these averages. It could be a lot more sophisticated (the variables could be weighted or could form inputs to a neural network). Further a value price could be derived from backtested results (or any other method of determining a price) and bets made only on the contenders that exceed the value price or exceed it by some margin.

However, the top rated average on its own produces a recommendation per race and is at least interesting enough in recent backtesting to show it live on the site to £2 stakes. If you catch it just before the race from the link at this page you will see the recommendations as they are bet, together with results and recommendations from any previous sprint races on the day.

No Comments (add your own) »

Analysing previous performances on heavy going

By colin on Saturday, June 16th, 2012

In the wake of the recent, somewhat unseasonal, downpours in the runup to Midsummer’s Day, the going around the country has been changed to include the terms “Soft” and “Heavy”.  In the case of Leicester, we can add “Abandonned:  Waterlogged”.

Such unseasonal changes are always liable to mess with the form book.  One of the tasks of the form student is to establish what previous form, if any, has been shown on such going.

In Smartform, we can translate this into a query on any particular runner, in order to find all its previous runs on similar going.   By way of example, we’ll pick a race at Sandown today, since it is currently supposed to be riding “Soft, Heavy in places”.  The race we’ll look at is the 2.50 at Sandown, a handicap over one mile.

First, let’s look at how to pull a runner’s entire history (in this case, Leviathan) from Smartform, in order to look at its career runs, as follows:

mysql> select scheduled_time, course, weight_pounds as ‘weight’, going, trainer_name, jockey_name, distance_yards as ‘distance’, num_runners as ‘ran’, finish_position as ‘finished’, unfinished from historic_races join historic_runners using (race_id) where name=”Leviathan”;

Smartform subscribers can try this one at home, since the table of results is rather wide to reprint in the blog (of course we can include any of the variables available in Smartform within the returned results, this just shows us the most pertinent basic information in order to see if Leviathan has shown any promise on softer going).

We can see from the results, however, that Leviathan has never raced on going which has included the word “Heavy” in the description.    However, he has raced once on proper Soft going (proper in the sense that it wasn’t “Good to Soft”, for example), where he finished first of 13 runners.  That was almost exactly one year ago,

If we had wanted just to cut to the chase and see runs only containing the terms “Heavy” we could have altered the query with a condition for the going description.

Let’s try that for another runner in the race, Sam Sharp, as follows:

mysql> select scheduled_time, course, weight_pounds as ‘weight’, going, trainer_name, jockey_name, distance_yards as ‘distance’, num_runners as ‘ran’, finish_position as ‘finished’, unfinished from historic_races join historic_runners using (race_id) where name=”Sam Sharp” and going LIKE “%Heavy%”;

This time, only one result is returned.  A second out of 4 runners in a recent race over course and distance.  The horse appeared to handle the going, being beaten only narrowly. However, we’ll find that it’s quite rare that horses have had many runs on heavy going, so that most queries which restrict results to runs only including the term “Heavy” on these runners will produce the results “Empty Set” (as with Leviathan).  Where a run is returned, as in the case of Sam Sharp above, the sample size is usually so small that even if the run was not good, it is hard to say that the horse did not act on the going.  We can say the reverse, however, where the performance in a run has indicated that a horse acted on the going.

Producing an individual query such as the above for each runner in a race is somewhat tedious of course, so this is exactly the type of query that can be automated for a whole race, dispalying all runners’ previous performances on heavy, for example, in one move, using a simple programming language, such as R or Perl.  For some examples of using R with Smartform in this way, see the last Chapter of our free ebook, “BetfaiR Trading with R”.

No Comments (add your own) »