Betwise Blog
Betwise news, analysis and automatic betting info

Daily trainer strike rates – how to match for all today’s runners

By Nick Franks on Saturday, May 12th, 2018

Today we’re going to build on the last SQL post where we created a query for 14 day strike rates for all trainers who had runners in the past 14 days. It’s all very well knowing how trainers have performed in the last 14 days, but in practice the way we’ll typically want to use that information is to match the strike rates with today’s runners, so we can compare trainer strike rates within a race.

That is what today’s query is going to show. We’ll be converting the query from the last blog post to a new database table, then querying that table with a left outer join combined with data on today’s runners. First, here’s the query that you can run in Smartform yourself, then we’ll explain what is going on.

-- If the temp table exists delete it

DROP TABLE IF EXISTS temp_trainer_stats_table;

-- create the temporary table by using the first query
CREATE TABLE temp_trainer_stats_table AS (
Select a.*,
ROUND(Winners/Runners * 100,0) AS WinPct,
ROUND(Placers/Runners * 100,0) AS PlacePct
from(
SELECT hru.trainer_name as Trainer, hru.trainer_id,
COUNT(*) AS Runners,
SUM(CASE WHEN hru.finish_position = 1 THEN 1 ELSE 0 END) AS Winners,
sum(case when hra.num_runners < 8 then case when hru.finish_position in ( 2) then 1 else 0 end
else
case when hra.num_runners < 16 then case when hru.finish_position in ( 2,3) then 1 else 0 end
else
case when hra.handicap = 1 then case when hru.finish_position in (2,3,4) then 1 else 0 end
else
case when hru.finish_position in ( 1,2,3) then 1 else 0 end
end
end
end )as Placers,
ROUND(((SUM(CASE WHEN hru.finish_position = 1 THEN (hru.starting_price_decimal -1) ELSE -1 END))),2) AS WinProfit
FROM historic_runners hru
JOIN historic_races hra USING (race_id)
WHERE hra.meeting_date >= ADDDATE(CURDATE(), INTERVAL -14 DAY)
and hra.race_type_id in( 15, 12)
and hru.in_race_comment <> 'Withdrawn'
and hru.starting_price_decimal IS NOT NULL
GROUP BY trainer_name, trainer_id) a);

-- Join the daily runners and daily races to get details of todays card, then do a LEFT OUTER JOIN to get the trainer stats,
-- Where no trainer stats exist NULLS are shown

SELECT substr(dra.scheduled_time, 12,5) as Time, dra.course as Course,
dru.cloth_number as 'No.', dru.name as Horse,
case when dru.stall_number is NULL then "" ELSE dru.stall_number end as Draw, dru.jockey_name as Jockey,
dru.forecast_price as FSP, tmp.*
from daily_races dra
join daily_runners dru using (race_id)
LEFT OUTER JOIN temp_trainer_stats_table tmp using (trainer_id )
where dra.race_type = 'Flat'
and dra.meeting_date = adddate(curdate(), INTERVAL 0 DAY)
-- add a condition here if you want to restrict stats for trainers with less than a certain number of runners
-- eg. [ and runners >=5 ]
order by Time, dru.cloth_number;

Having created a temporary table with the trainer statistics, we now want to join this data to the race card.
The race card data is held in two tables – daily_races and daily runners – which we will join together using the race_id

from daily_races dra
join daily_runners dru using (race_id)

To this we also want to the join the temp_trainer_stats_table we have just created – for this we use a LEFT OUTER JOIN.

This type of join ensures we have the data for every runner in every race on the card, and the trainer data for trainers for which it exists.

For each table I use an identifier eg. Dra for daily_races, tmp for the temp_trainer_stats_table, this enables correct identification of data fields which exist in more than one table.

There is far more data available in the race card data that I am using here, but the basics are:

Time – for this I am using a substring function of the scheduled_time data element, the whole value is for example 2018-05-06 13:30:00 using substr(dra.scheduled_time, 12,5). I am using 5 characters from position 12 which gives me 13:30.
We could also use this to extract the date. There is also a meeting_data data element. Using either of them you can create a date field using a substring and concatenation function like this:

concat(substr(dra.scheduled_time, 9,2),'-',substr(dra.scheduled_time, 6,2),'-',substr(dra.scheduled_time, 1,4)) as RaceDate

Added to the join of the race card data and the trainer data are two conditions to get only flat races for today

where dra.race_type  = 'Flat' 
and dra.meeting_date = adddate(curdate(), INTERVAL 0 DAY)

Further criteria could be added to limit the selection to trainers with more than a certain number of runners (eg greater than 4) in the 14 day analysed period

and  tmp.runners > 4

and with a strike race greater or equal to 20%

and WinPct >=20

combinations of win percentage, win profit and number of runners would look like this

where dra.race_type  = 'Flat'
and dra.meeting_date = adddate(curdate(), INTERVAL 0 DAY)
and tmp.runners > 4
and tmp.WinPct >= 20
and tmp.WinProfit > 0

Running the first query gives us a result set of 488 rows, neatly ordered by rate time for each of today’s races. We’ve attached a CSV of the output for all today’s races, so you can analyse each race according to trainer strike rate and profitability – something you can of course do for yourself every day in Smartform.

CSV Download: 14 day trainer strike rates for every race on Saturday 12th May

In the next post, we’ll look at different options for automating this query on a daily basis.

No Comments (add your own) »

Jockey SR% in Group Races and Today’s 1000 Guineas

By Phill Clarke on Sunday, May 6th, 2018

Last week we looked at a very simple use of Smartform and R to calculate a single jockey’s strike rate. While a decent introduction, the outcome was not particularly useful. Today we’ll extend this example, in a topical manner, to calculate the Group race strike rate of every jockey riding in the 1000 Guineas at Newmarket.

To begin, load the relevant R libraries and execute an SQL query to return some historic data:

# Load the RMySQL and dplyr library packages
library("RMySQL")
library("dplyr")

# Execute an SQL command to return some historic data
# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                        host='127.0.0.1', 
                        user='yourusername', 
                        password='yourpassword', 
                        dbname='smartform')

# This SQL query selects the required columns from the historic_races and historic_runners tables, joining by unique race_id and filtering for results only since January 1st, 2006. The SQL query is saved in a variable called sql1.
sql1 <- paste("SELECT historic_races.course,
              historic_races.conditions,
              historic_races.group_race,
              historic_races.race_type_id,
              historic_races.race_type,
              historic_runners.name,
              historic_runners.jockey_name,
              historic_runners.finish_position 
              FROM smartform.historic_runners
              JOIN smartform.historic_races USING (race_id)
              WHERE historic_races.meeting_date >= '2006-01-01'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_results.
smartform_results <- dbGetQuery(con, sql1)

# Close the database connection, which is good practice
dbDisconnect(con)

The variable smartform_results now contains a large selection of historic data, which can be used to filter and calculate jockey strike rates. In R there are always multiple methods to achieve the same result. The example code used in this article tries to keep things simple, laying out the steps in a logical manner. There are cleaner, faster and perhaps better ways to achieve the goal, some of which will be explored in future posts.

Next, filter for just flat Group races. The vertical bar, |, in the code below corresponds to an OR statement in SQL:

# Race type IDs 12 and 15 correspond to Flat and All Weather races
flat_races_only <- dplyr::filter(smartform_results,
                                  race_type_id == 12 |
                                   race_type_id == 15)

# Filter using the group_race field for Group 1, 2 and 3 races only
group_races_only <- dplyr::filter(flat_races_only,
                                group_race == 1 |
                                  group_race == 2 |
                                  group_race == 3 )

The variable group_races_only now contains details of every Flat and All Weather Group race in the Smartform database since January 1st, 2006.

Using this data, the next step is to calculate strike rates of every jockey who has ever ridden in a Group race. The strange looking symbol linking variables in the code below is a pipe command from the dplyr library:

# For each jockey name, count the number of rides
jockey_group_rides <- group_races_only %>% count(jockey_name)

# Rename the second column in jockey_group_rides to something more logical
names(jockey_group_rides)[2]<-"group_rides"

# Now filter for only winning rides
group_winners_only <- dplyr::filter(group_races_only,
                              finish_position == 1)

# For each jockey, count the number of winning rides
jockey_group_wins <- group_winners_only %>% count(jockey_name)

# Rename the second column in jockey_group_wins to something more logical
names(jockey_group_wins)[2]<-"group_wins"

# Join the two dataframes, jockey_group_rides and jockey_group_wins together, using jockey_name as a key
jockey_group_data <- dplyr::full_join(jockey_group_rides, jockey_group_wins, by = "jockey_name")

# Rename all the NA fields in the new dataframe to zero
# If a jockey has not ridden a group winner, the group_wins field will be NA
# If this is not changed to zero, later calculations will fail
jockey_group_data[is.na(jockey_group_data)] <- 0

# Now calculate the Group race strike rate for all jockeys
jockey_group_data$strike_rate <- (jockey_group_data$group_wins / jockey_group_data$group_rides) * 100

The variable jockey_group_data now contains the strike rate for every jockey who has ridden in a flat or all weather Group race since January 1st, 2006.

While this may be interesting data, it is perhaps not particularly useful on its own. There are some jockeys with a 100% strike rate, from just one ride.

Next step is to apply this data to today’s 1000 Guineas at Newmarket. To begin, another SQL query is required to return daily racing data.

# Execute an SQL command to return some historic data
# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                        host='127.0.0.1', 
                        user='yourusername', 
                        password='yourpassword', 
                        dbname='smartform')

# This SQL query selects the required columns from the daily_races and daily_runners tables, joining by unique race_id and filtering for results only with today's date. The SQL query is saved in a variable called sql2.                        
sql2 <- paste("SELECT daily_races.course,
              daily_races.race_title,
              daily_races.meeting_date,
              daily_runners.name,
              daily_runners.jockey_name
              FROM smartform.daily_races
              JOIN smartform.daily_runners USING (race_id)
              WHERE daily_races.meeting_date >='2018-05-06'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_daily_results.
smartform_daily_results <- dbGetQuery(con, sql2)

# Close the database connection, which is good practice
dbDisconnect(con)

The variable smartform_daily_results contains information about all races being run today.

The next step is to filter just for the 1000 Guineas. To do this, a grep command is used, which conducts a string match on the race_title field.

# Filter for the 1000 Guineas only
guineas_only <- dplyr::filter(smartform_daily_results,
                              grepl("1000 Guineas", race_title))

The variable guineas_only now contains just the basic details for today’s 1000 Guineas.

Lastly, dplyr is again used to perform a different type of join, which combines the guineas_only and jockey_group_data dataframes, using jockey_name as the key.

# Using dplyr, join the guineas_only and jockey_group_data dataframes
guineas_only_with_sr <- dplyr::inner_join(guineas_only, jockey_group_data, by = "jockey_name")

This results in a new dataframe, guineas_only_with_sr, which contains three new columns, group_rides, group_wins and strike_rate.

It should be no surprise that Ryan Moore has the highest strike rate, from a very large sample size of rides, followed by Frankie Dettori, also with a solid number of rides. Jockeys such as Jim Crowley and Silvestre De Sousa should be carefully considered, as they both have a significant sample size of rides, but quite a low winning strike rate.

> guineas_only_with_sr

      course                                   race_title meeting_date            name   jockey_name group_rides group_wins strike_rate
1  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06      Altyn Orda     L Dettori         980        160   16.326531
2  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06     Anna Nerium  Tom Marquand          18          1    5.555556
3  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06 Billesdon Brook     S M Levey         130          5    3.846154
4  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06     Dan's Dream    S De Sousa         316         23    7.278481
5  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06         Happily     R L Moore        1281        250   19.516003
6  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06       I Can Fly J A Heffernan         571         62   10.858144
7  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06         Laurens    P McDonald          37          5   13.513514
8  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06    Liquid Amber       W J Lee         164         13    7.926829
9  Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06        Madeline Andrea Atzeni         357         52   14.565826
10 Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06       Sarrocchi    W M Lordan         357         42   11.764706
11 Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06        Sizzling     D O'Brien          98         10   10.204082
12 Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06       Soliloquy       W Buick         338         44   13.017751
13 Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06         Vitamin   Jim Crowley         471         39    8.280255
14 Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06   Wild Illusion   James Doyle         372         46   12.365591
15 Newmarket Qipco 1000 Guineas Stakes (Fillies' Group 1)   2018-05-06         Worship  Oisin Murphy         195         16    8.205128

Looking at such a table is interesting, but not very visually appealing. Some people also understand data better when it is presented in chart form, rather than simple numbers in a table. Therefore, we’ll now briefly look at plotting jockey’s Group strike rates as a bar chart.

To do this, another popular R library called ggplot will be used. The code below creates a simple bar chart:

# Load the ggplot library
library(ggplot2)

# Create a new variable with the jockey name and strike rate data as x and y axis
sr_bar <- ggplot(guineas_only_with_sr, aes(jockey_name, strike_rate))

# Using the sr_bar variable, create a column chart, with blue bars and the x axis labels at a 90% angle from horizontal
sr_bar + geom_col(colour = "blue", fill = "blue") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

This chart makes it obvious that Moore and Dettori have the two superior strike rates when riding in Group races.

Ggplot is a very powerful charting library, with huge number of options. An Rstudio cheatsheet helps to explain some of the most common options.

Further information regarding using the dplyr library, which has a wide range of very useful data manipulation functions, can be found in this RStudio cheatsheet.

This tutorial could be easily extended to produce trainer or even sire strike rates in Group races, which could provide further useful insights into today’s 1000 Guineas.

Questions and queries about this article should be posted as a comment below or to the Betwise Q&A board.

The full R code used in this article is found below.

# Load the RMySQL and dplyr library packages
library("RMySQL")
library("dplyr")

# Execute an SQL command to return some historic data
# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                        host='127.0.0.1', 
                        user='yourusername', 
                        password='yourpassword', 
                        dbname='smartform')

# This SQL query selects the required columns from the historic_races and historic_runners tables, joining by unique race_id and filtering for results only since January 1st, 2006. The SQL query is saved in a variable called sql1.
sql1 <- paste("SELECT historic_races.course,
              historic_races.conditions,
              historic_races.group_race,
              historic_races.race_type_id,
              historic_races.race_type,
              historic_runners.name,
              historic_runners.jockey_name,
              historic_runners.finish_position 
              FROM smartform.historic_runners
              JOIN smartform.historic_races USING (race_id)
              WHERE historic_races.meeting_date >= '2006-01-01'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_results.
smartform_results <- dbGetQuery(con, sql1)

# Close the database connection, which is good practice
dbDisconnect(con)

# Race type IDs 12 and 15 correspond to Flat and All Weather races
flat_races_only <- dplyr::filter(smartform_results,
                                  race_type_id == 12 |
                                   race_type_id == 15)

# Filter using the group_race field for Group 1, 2 and 3 races only
group_races_only <- dplyr::filter(flat_races_only,
                                group_race == 1 |
                                  group_race == 2 |
                                  group_race == 3 )

# For each jockey name, count the number of rides
jockey_group_rides <- group_races_only %>% count(jockey_name)

# Rename the second column in jockey_group_rides to something more logical
names(jockey_group_rides)[2]<-"group_rides"

# Now filter for only winning rides
group_winners_only <- dplyr::filter(group_races_only,
                              finish_position == 1)

# For each jockey, count the number of winning rides
jockey_group_wins <- group_winners_only %>% count(jockey_name)

# Rename the second column in jockey_group_wins to something more logical
names(jockey_group_wins)[2]<-"group_wins"

# Join the two dataframes, jockey_group_rides and jockey_group_wins together, using jockey_name as a key
jockey_group_data <- dplyr::full_join(jockey_group_rides, jockey_group_wins, by = "jockey_name")

# Rename all the NA fields in the new dataframe to zero
# If a jockey has not ridden a group winner, the group_wins field will be NA
# If this is not changed to zero, later calculations will fail
jockey_group_data[is.na(jockey_group_data)] <- 0

# Now calculate the Group race strike rate for all jockeys
jockey_group_data$strike_rate <- (jockey_group_data$group_wins / jockey_group_data$group_rides) * 100

# Execute an SQL command to return some historic data
# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                        host='127.0.0.1', 
                        user='yourusername', 
                        password='yourpassword', 
                        dbname='smartform')

# This SQL query selects the required columns from the daily_races and daily_runners tables, joining by unique race_id and filtering for results only with today's date. The SQL query is saved in a variable called sql2.                        
sql2 <- paste("SELECT daily_races.course,
              daily_races.race_title,
              daily_races.meeting_date,
              daily_runners.name,
              daily_runners.jockey_name
              FROM smartform.daily_races
              JOIN smartform.daily_runners USING (race_id)
              WHERE daily_races.meeting_date >='2018-05-06'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_daily_results.
smartform_daily_results <- dbGetQuery(con, sql2)

# Close the database connection, which is good practice
dbDisconnect(con)

# Filter for the 1000 Guineas only
guineas_only <- dplyr::filter(smartform_daily_results,
                              grepl("1000 Guineas", race_title))

# Using dplyr, join the guineas_only and jockey_group_data dataframes
guineas_only_with_sr <- dplyr::inner_join(guineas_only, jockey_group_data, by = "jockey_name")

# Load the ggplot library
library(ggplot2)

# Create a new variable with the jockey name and strike rate data as x and y axis
sr_bar <- ggplot(guineas_only_with_sr, aes(jockey_name, strike_rate))

# Using the sr_bar variable, create a column chart, with blue bars and the x axis labels at a 90% angle from horizontal
sr_bar + geom_col(colour = "blue", fill = "blue") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

No Comments (add your own) »

Daily trainer strike rates in Smartform

By Nick Franks on Saturday, May 5th, 2018

There are many ways to skin a cat in Smartform. One of the simplest is to stick with SQL and stay in the database, without resorting to programming or exporting the data.

So here is a breakdown of how to generate one of the most commonly used “derived” statistics in horseracing – trainer strike rates. We hear that trainers whose horses are currently in good form are worth backing – but how can we tell?

The following single SQL produces Trainer strike rates for all daily Flat races in the database, calculated over a period of 14 days. Copy and paste the code at your MySQL command prompt:

Select a.*,

ROUND(Winners/Runners * 100,0) AS WinPct,
ROUND(Placers/Runners * 100,0) AS PlacePct

from(

SELECT hru.trainer_name as Trainer, hru.trainer_id,

COUNT(*) AS Runners,

SUM(CASE WHEN hru.finish_position = 1 THEN 1 ELSE 0 END) AS Winners,

sum(case when hra.num_runners < 8 then case when hru.finish_position in ( 2)  then 1 else 0 end

else

case when hra.num_runners < 16 then case when hru.finish_position in ( 2,3)  then 1 else 0 end

else

case when hra.handicap = 1 then case when hru.finish_position in (2,3,4) then 1 else 0 end

else

case when hru.finish_position in ( 1,2,3)  then 1 else 0 end

end

end

end )as Placers,

ROUND(((SUM(CASE WHEN hru.finish_position = 1 THEN (hru.starting_price_decimal -1) ELSE -1 END))),2) AS WinProfit

FROM historic_runners hru

JOIN historic_races hra USING (race_id)

WHERE hra.meeting_date >= ADDDATE(CURDATE(), INTERVAL -14 DAY)

and hra.race_type_id  in( 15, 12)
and hru.in_race_comment <> 'Withdrawn'
and hru.starting_price_decimal IS NOT NULL

GROUP BY trainer_name, trainer_id

) a

Where runners >= 5 or winners >= 3

ORDER BY  WinProfit desc
LIMIT 20;

The approach is to generate a query within a query – to allow the use of the counts done in the main query to be used to create the strike rate percentages.

The main part joins the historic_runners and historic_races tables by race_id.

The WHERE clause defines the criteria

Firstly, define the period over which you want the analysis, here we set it to analyze races in the last 14 days

hra.meeting_date >= ADDDATE(CURDATE(), INTERVAL -14 DAY)

Secondly we define the race types for all Flat races (12 for Turf and 15 for All Weather)

All of these can be found running the SQL statement:

select distinct race_type, race_type_id from historic_races; )

The final criteria helps to eliminate horses who have not actually participated that are in the database, non-runners, withdrawn etc. There are a number of ways to do this and one can find their own criteria, but I use this:

and hru.in_race_comment <> 'Withdrawn'
and hru.starting_price_decimal IS NOT NULL

The outer section of the SQL selects all the data from the inner part and displays it and calculates the strike rates. The )a afterwards is required by SQL.

Finally in Green are the sorting instructions and some other criteria.

Also note that here I am choosing to only display trainers who have had at least 5 runners or at least 3 winners.

Where runners >= 5

or winners >= 3

I am ordering them in order of greatest win profit

ORDER BY  WinProfit desc

And limiting the number of trainers displayed to 20

LIMIT 20

So on the day of the 2000 Guineas, Brian Meehan is currently the most profitable trainer to follow, with £47 retuns, but P W Hiatt has the highest strike rate, with 60% – albeit with only 5 runners in the last few days.

If you’re looking for a trainer with a high strike rate from at leasy double digit runners in the last 14 days, then Mark Usher, with a 33% win strike and 25% place strike, is impressive.

Adapt the SQL above to generate daily trainer strike rates so you can judge for yourself.

No Comments (add your own) »

Calculating Strike Rate using Smartform and R

By Phill Clarke on Saturday, April 28th, 2018

The simple goal of this article is to show how to calculate jockey Frankie Dettori’s win strike rate at Ascot since 2009, using the Smartform database and the R statistical programming language.

The article begins with the assumption that the Smartform MySQL database has been successfully installed following the instructions provided in the user manual. Some familiarity with basic programming concepts is assumed and the reader will most likely be required to conduct some of their own research to understand some aspects of the R language.

The R language may be installed directly from the project’s website as a binary for Linux, Mac and Windows platforms. Many Linux distributions also include the R language in their repositories for installation via a package manager.

Once R is successfully installed, it is then recommended to install the RStudio integrated development environment (IDE). This will assist to make writing R code easier and is more user friendly than working with just the R command prompt. RStudio provides many useful tools and frameworks for R. The IDE is available as a free download from their website.

Once R and RStudio IDE is installed, it’s time to start writing code.

The first step is to retrieve data from the installed Smartform MySQL database. An additional R library is required for this. The easiest SQL connection library for our purposes is RMySQL. The following code demonstrates the steps required to install this library. Simply cut and paste this code snippet into the Console window in RStudio.

install.packages("RMySQL")

This step may also be achieved by selecting Tools / Install Packages from the RStudio menu options. If all went well RMySQL and its dependencies should have been successfully installed.

Next step is to connect to the Smartform database. This may have been installed locally or on a remote server.

# Load the RMySQL library package
library("RMySQL")

# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                        host='127.0.0.1', 
                        user='yourusername', 
                        password='yourpassword', 
                        dbname='smartform')

There are a number of different ways to use the RMySQL package, in terms of the dbConnect arguement. The manual for this package, which can be found at Github or as a PDF from the CRAN package homepage, describes creating configuration files and additional options.

Once connection to the Smartform database is established, the next step is to retrieve some data. The SQL query language has many options and is very powerful. However, this article will endeavour to keep things simple.

# This SQL query selects the required columns from the historic_races and historic_runners tables, joining by unique race_id and filtering for results only since January 1st, 2009. The SQL query is saved in a variable called sql1.
sql1 <- paste("SELECT historic_races.course, 
              historic_runners.jockey_name, 
              historic_runners.finish_position 
              FROM smartform.historic_runners
              JOIN smartform.historic_races USING (race_id)
              WHERE historic_races.meeting_date >= '2009-01-01'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_results.
smartform_results <- dbGetQuery(con, sql1)

# Close the database connection, which is good practice
dbDisconnect(con)

If all went well, there will now be a new R dataframe containing in excess of 118,000 results, which is every ride for every jockey at every course, since January 1st 2009. This data can now be filtered to retain just Dettori’s rides at Ascot. There are often multiple ways to achieve the same result when using R. We’ll load another R library called dplyr (which will need to be installed first) in order to again keep things simple where possible.

# Load the dplyr library
library("dplyr")

# Filter the long list of all courses for Ascot only
ascot_only <- dplyr::filter(smartform_results,
                               course %in% c("Ascot"))

# Filter all Ascot runs for Dettori's rides only
dettori_rides <- dplyr::filter(ascot_only,
                            jockey_name %in% c("L Dettori"))

A few assumptions have been made with the above code. Firstly, we know exactly how Dettori’s name appears in the database, so we can match on the exact string. We also haven’t differentiated between Flat and National Hunt racing at Ascot, because Frankie Dettori only rides on the Flat.

The data contained in the dataframe dettori_rides now contains every Dettori ride at Ascot since 2009. Next step is to calculate strike rate. This is a very simple calculation where total wins are divided by total rides and multiplied by one hundred, to obtain a percentage.

# Count total rides for Dettori at Ascot
total_rides <- nrow(dettori_rides)

# Filter for only winning rides. i.e. finish position is 1
dettori_wins <- dplyr::filter(dettori_rides,
                              finish_position == 1)

# Count total winning rides
total_wins <- nrow(dettori_wins)

# Calculate strike rate
dettori_strike_rate <- (total_wins / total_rides) * 100

# Display strike rate
dettori_strike_rate
# 14.90683

At the time of writing this article, Frankie Dettori’s total win strike rate at Ascot, since 2009, is 14.91%

This concludes the current article. In future posts, additional statistics will be calculated and a deeper understanding of the R language obtained.

Questions and queries about this article should be posted as a comment below or on the Betwise Q&A board.

The full R code used in this article is found below.

# Load the required library packages
library("RMySQL")
library("dplyr")


# Connect to the Smartform database. Substitute the placeholder credentials for your own. 
# The IP address can be substituted for a remote location if appropriate.
con <- dbConnect(MySQL(), 
                 host='127.0.0.1', 
                 user='yourusername', 
                 password='yourpassword', 
                 dbname='smartform')

# This SQL query selects the required columns from the historic_races and historic_runners tables, joining by unique race_id and filtering for results only since January 1st, 2009. The SQL query is saved in a variable called sql1.
sql1 <- paste("SELECT historic_races.course, 
              historic_runners.jockey_name, 
              historic_runners.finish_position 
              FROM smartform.historic_runners
              JOIN smartform.historic_races USING (race_id)
              WHERE historic_races.meeting_date >= '2009-01-01'", sep="")

# Now execute the SQL query, using the previously established connection. Results are stored in a variable called smartform_results.
smartform_results <- dbGetQuery(con, sql1)

# Close the database connection, which is good practice
dbDisconnect(con)

# Filter the long list of all courses for Ascot only
ascot_only <- dplyr::filter(smartform_results,
                            course %in% c("Ascot"))

# Filter all Ascot runs for Dettori's rides only
dettori_rides <- dplyr::filter(ascot_only,
                               jockey_name %in% c("L Dettori"))

# Count total rides for Dettori at Ascot
total_rides <- nrow(dettori_rides)

# Filter for only winning rides. i.e. finish position is 1
dettori_wins <- dplyr::filter(dettori_rides,
                              finish_position == 1)

# Count total winning rides
total_wins <- nrow(dettori_wins)

# Calculate strike rate
dettori_strike_rate <- (total_wins / total_rides) * 100

# Display strike rate
dettori_strike_rate
# 14.90683

No Comments (add your own) »

Cheltenham – Final Day

By colin on Thursday, March 13th, 2014

Thank goodness it’s the last day.  I am a Cheltenham traditionalist, so technically this last day should not even exist, the Gold Cup should already have been run, and I should not have to blog having arrived home late on a bumpy taxi ride.  Never again – roll on the month long breaks between posts of the past.  But once you’ve started a Cheltenham selections blog, it’s just not on to stop – even if nobody is reading!

Anyway, I digress.  Cheltenham selections for the final day, as powered by the wonder of Smartform, are:

13:30: Guitar Pete
14:05: Diakali
14:40: Briar Hill
15:20: Bobs Worth
16:00: Harbour Court
16:40: Une Artiste
17:15: Oiseau de Nuit

All we need at this stage is just one decent priced winner to put the system in LSP, and not a bad strike rate to boot, for the whole of the Festival.

Good luck!
 

 

No Comments (add your own) »

Cheltenham Day 3

By colin on Wednesday, March 12th, 2014

For our 1 reader, you must be getting fed up with this by now 🙂

Though the overall results are still not too shabby, despite only 1 winner today.  Tomorrow, the automated Cheltenham system has thrown up a fair few at decent prices.  Here goes, and fingers crossed:

13:30:  Double Ross
14:05:  Fingal Bay
14:40:  Boston Bob
15:20:  Quevega
16:00:  Sraid Padraig
16:40:  Same Difference

And that’s your lot…

 

No Comments (add your own) »

Cheltenham Day 2

By colin on Tuesday, March 11th, 2014

So today we picked 2 winners, 3 places (inc. 20-1 second) and 2 unplaced.

And coming up the automated picks for tomorrow’s racing (but first a word from our sponsor):

I wrote this program for Cheltenham horses about 4 years ago.  It takes about 2 minutes to run over all the form in Smartform and most years returns a level stakes profit. Over all years, its profitable to LSP. Oh, the wonders of Smartform. Or more accurately knowing a bit about horseracing and having at your disposal a programmable database.

So here goes for tomorrow’s picks (actually takes longer to write the blog post than pick the horses, can you believe it?):

13:30: Royal Boy
14:05: Carlingford Lough
14:40: Dell’ Arca
15:20: Sire De Grugy
16:00: Quantitativeeasing
16:40: Orgilgo Bay
17:15: Modus

Best of luck to you all, and, if you want me to run the program again tomorrow, post a comment or retweet to let us know you’re listening 🙂

And best of luck to you all…

No Comments (add your own) »

Happy Cheltenham!

By colin on Monday, March 10th, 2014

Well, what better way to awake to Spring than the Cheltenham festival?

I’ve followed a Cheltenham system based on data only programmatically accessible through Smartform for the past few years that has always yielded an LSP. It tends to spot a couple of longshots through the meeting, so it’s not to be relied on. But for fun, here goes selections for Tuesday:

13:30 Vautour
14:05 Trifolium
14:40 Ma Filleule
15:20 Hurricane Fly
16:00 Quevega
16:40 Shotgun Paddy
17:15 Art of Logistics

Have fun!

No Comments (add your own) »

Twitter updates for 2014

By colin on Sunday, January 5th, 2014

Hi there,

And Happy New Year to all of you!

There’ll be a few improvements to Betwise in the year ahead, and we’ll be starting with Twitter service updates for Smartform subscribers.

We’ve been trialling these for a while, now we’ll be adding in the date.  Basically for all the daily subscribers the tweets will show you when the historic results for the previous day’s racing have been updated and with what meetings, and likewise for advance daily cards, when these have been updated and for what meetings.

Hope you like this, and please don’t forget to follow us at www.twitter.com/betwiser for more in the year ahead.

No Comments (add your own) »

Pace in the Coral Sprint Trophy, Newmarket

By colin on Saturday, May 18th, 2013

Following on from last week’s post about visualising pace in the race, here’s another pace chart painstakingly crafted for today’s big sprint at Newmarket.

As with Ascot last week, you can see that at Newmarket the horses run from right to left if we are looking from the stands side (the usual television angle) of the course.  Thus Hasopop is in stall 16 and BlueGrass Blues, running towards the far (or “inside”) rail, is in stall 1.

Non-runners Lewisham and Heaven’s Guest (at the time of posting) have been removed.

So the pace in the race is definitely with Bapak Sayang towards the stands rail.  This colt has a great draw for a front runner in stall 15 in that he may be able to bag the rail early to run against and dictate the pace of the race.  The bad news is that Newmarket’s wide, straight galloping course has not got the best strike rate for front runners with ambitions to win.  There also has to be a caveat that this horse has only had 5 runs so it might be a little early in his career to label him a persistent front runner.

If you like the graph, or anything needs clarifying, please let us know in the comments section.

No Comments (add your own) »