Archive for April, 2018

Calculating Strike Rate using Smartform and R

Saturday, April 28th, 2018

The simple goal of this article is to show how to calculate jockey Frankie Dettori’s win strike rate at Ascot since 2009, using the Smartform database and the R statistical programming language.

The article begins with the assumption that the Smartform MySQL database has been successfully installed following the instructions provided in the user manual. Some familiarity with basic programming concepts is assumed and the reader will most likely be required to conduct some of their own research to understand some aspects of the R language.

The R language may be installed directly from the project’s website as a binary for Linux, Mac and Windows platforms. Many Linux distributions also include the R language in their repositories for installation via a package manager.

Once R is successfully installed, it is then recommended to install the RStudio integrated development environment (IDE). This will assist to make writing R code easier and is more user friendly than working with just the R command prompt. RStudio provides many useful tools and frameworks for R. The IDE is available as a free download from their website.

Once R and RStudio IDE is installed, it’s time to start writing code.

The first step is to retrieve data from the installed Smartform MySQL database. An additional R library is required for this. The easiest SQL connection library for our purposes is RMySQL. The following code demonstrates the steps required to install this library. Simply cut and paste this code snippet into the Console window in RStudio.

install.packages("RMySQL")

This step may also be achieved by selecting Tools / Install Packages from the RStudio menu options. If all went well RMySQL and its dependencies should have been successfully installed.

Next step is to connect to the Smartform database. This may have been installed locally or on a remote server.

# Load the RMySQL library package
library("RMySQL")

# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                        host='127.0.0.1', 
                        user='yourusername', 
                        password='yourpassword', 
                        dbname='smartform')

There are a number of different ways to use the RMySQL package, in terms of the dbConnect arguement. The manual for this package, which can be found at Github or as a PDF from the CRAN package homepage, describes creating configuration files and additional options.

Once connection to the Smartform database is established, the next step is to retrieve some data. The SQL query language has many options and is very powerful. However, this article will endeavour to keep things simple.

# This SQL query selects the required columns from the historic_races and historic_runners tables, joining by unique race_id and filtering for results only since January 1st, 2009. The SQL query is saved in a variable called sql1.
sql1 <- paste("SELECT historic_races.course, 
              historic_runners.jockey_name, 
              historic_runners.finish_position 
              FROM smartform.historic_runners
              JOIN smartform.historic_races USING (race_id)
              WHERE historic_races.meeting_date >= '2009-01-01'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_results.
smartform_results <- dbGetQuery(con, sql1)

# Close the database connection, which is good practice
dbDisconnect(con)

If all went well, there will now be a new R dataframe containing in excess of 118,000 results, which is every ride for every jockey at every course, since January 1st 2009. This data can now be filtered to retain just Dettori’s rides at Ascot. There are often multiple ways to achieve the same result when using R. We’ll load another R library called dplyr (which will need to be installed first) in order to again keep things simple where possible.

# Load the dplyr library
library("dplyr")

# Filter the long list of all courses for Ascot only
ascot_only <- dplyr::filter(smartform_results,
                               course %in% c("Ascot"))

# Filter all Ascot runs for Dettori's rides only
dettori_rides <- dplyr::filter(ascot_only,
                            jockey_name %in% c("L Dettori"))

A few assumptions have been made with the above code. Firstly, we know exactly how Dettori’s name appears in the database, so we can match on the exact string. We also haven’t differentiated between Flat and National Hunt racing at Ascot, because Frankie Dettori only rides on the Flat.

The data contained in the dataframe dettori_rides now contains every Dettori ride at Ascot since 2009. Next step is to calculate strike rate. This is a very simple calculation where total wins are divided by total rides and multiplied by one hundred, to obtain a percentage.

# Count total rides for Dettori at Ascot
total_rides <- nrow(dettori_rides)

# Filter for only winning rides. i.e. finish position is 1
dettori_wins <- dplyr::filter(dettori_rides,
                              finish_position == 1)

# Count total winning rides
total_wins <- nrow(dettori_wins)

# Calculate strike rate
dettori_strike_rate <- (total_wins / total_rides) * 100

# Display strike rate
dettori_strike_rate
# 14.90683

At the time of writing this article, Frankie Dettori’s total win strike rate at Ascot, since 2009, is 14.91%

This concludes the current article. In future posts, additional statistics will be calculated and a deeper understanding of the R language obtained.

Questions and queries about this article should be posted as a comment below or on the Betwise Q&A board.

The full R code used in this article is found below.

# Load the required library packages
library("RMySQL")
library("dplyr")


# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                 host='127.0.0.1', 
                 user='yourusername', 
                 password='yourpassword', 
                 dbname='smartform')

# This SQL query selects the required columns from the historic_races and historic_runners tables, joining by unique race_id and filtering for results only since January 1st, 2009. The SQL query is saved in a variable called sql1.
sql1 <- paste("SELECT historic_races.course, 
              historic_runners.jockey_name, 
              historic_runners.finish_position 
              FROM smartform.historic_runners
              JOIN smartform.historic_races USING (race_id)
              WHERE historic_races.meeting_date >= '2009-01-01'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_results.
smartform_results <- dbGetQuery(con, sql1)

# Close the database connection, which is good practice
dbDisconnect(con)

# Filter the long list of all courses for Ascot only
ascot_only <- dplyr::filter(smartform_results,
                            course %in% c("Ascot"))

# Filter all Ascot runs for Dettori's rides only
dettori_rides <- dplyr::filter(ascot_only,
                               jockey_name %in% c("L Dettori"))

# Count total rides for Dettori at Ascot
total_rides <- nrow(dettori_rides)

# Filter for only winning rides. i.e. finish position is 1
dettori_wins <- dplyr::filter(dettori_rides,
                              finish_position == 1)

# Count total winning rides
total_wins <- nrow(dettori_wins)

# Calculate strike rate
dettori_strike_rate <- (total_wins / total_rides) * 100

# Display strike rate
dettori_strike_rate
# 14.90683