Monday 20 January 2014

Betting using my model of Premier League football

I've been getting some questions over the past few weeks about the betting calls I make using my EPL model, so this post will explain how the betting choices work. If you'd just like to see who the model thinks will win each week then maybe skip this one, but if you're one of the people who's been looking at the calls and thinking, "What? He can't do that!" then this should help to explain my methodology. This is also going to be geekier than usual, so Wallpapering Fog felt like a better home for it than the EPL Index site.

If you're thinking "what EPL model?", have a look on eplindex.com.

First, a bit of history on where the model came from, because that journey explains how we got to where we are now...

I've said before that this model wasn't originally built as a tool for betting, and it's true. I first found eplindex.com last season (back when you could access all of their Opta stats for a few pounds a month), subscribed to the Stats Centre and built the model mainly to see if it would work. I had a vague thought that if it did work, it could be interesting for a football club to use to forecast match results based on picking different players, but I also assumed that the bigger clubs would already have sophisticated models of their own to do this type of work.

The model churned out a set of results for the first half of the 2012/13 season and I needed something to compare them with. Was my model any good? Bookmakers' odds are an obvious place to look for alternative results predictions, with easily accessed historical data available (football-data.co.uk, if you're looking).

That first version of the model didn't quite match the bookmakers, in terms of how often the results it said were most likely actually happened. The bookies' favourites won games slightly more often than the model's predicted most likely outcomes did.

Despite this, that analysis projected the model to make a small return if you used it to bet. The model didn't say the bookies' favourites would win all of the time, so it picked up some wins at decent odds. Bookmakers also almost never say that a draw is the most likely outcome of a game, and if you backed a draw whenever the model said its likelihood was over 25%, you made a healthy return.

I started to predict results on Wallpapering Fog ahead of the games being played.

For betting, the rules were simple. Back a draw if the draw likelihood was over 25%; otherwise back whoever the model said was most likely to win. That's backing winners with no regard whatsoever to the market odds on that game. You could be backing a long shot that the model likes a lot, or backing a very short-odds favourite that the model gives only a 40% chance of winning. For draws, the odds are usually around 3.5, but again, I was paying them no attention when picking the bets.
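In case it helps to see that rule written down, here's a minimal sketch in Python. This isn't the model itself; the probabilities would come from the model's match simulation, and the function and parameter names are just mine for illustration.

```python
def pick_result(p_home, p_draw, p_away, draw_line=0.25):
    """Back the draw if it clears the threshold, otherwise back
    whichever side the model rates as more likely to win."""
    if p_draw > draw_line:
        return 'D'
    return 'H' if p_home >= p_away else 'A'

# Example: a 48% home win, 24% draw, 28% away win -> back the home side.
print(pick_result(0.48, 0.24, 0.28))  # 'H'
```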

This method has periodically upset more seasoned gamblers, who point out that you shouldn't make picks like that. I do understand why, and I'll come back to it. Please bear with me.

The method arises from the model's primary objective, which is to call as many results correctly as possible rather than to maximise betting profits. That objective is also why I've never looked at the potential returns from using my model to call correct scores, or accumulators, or both teams to score.

It works like this:

1. Get as many results right as possible.

2. See if the strategy that achieves point 1 also makes money.

It did make a profit last season and is winning this season too, so that 'most likely outcome' method isn't as naive as it might look.

For any readers who aren't seasoned gamblers, the issue with backing the most likely outcome regardless of the odds the bookies are offering is that you could be backing a result you think is a very close call, when the bookies are offering only a poor return if you're right.

If I flip a coin then you know the chance of it coming up heads is 50%. If I offer you odds of 1.5 on a bet on heads (£5 profit if you bet £10), you'd be mad to take it. You might win once, but in the long term you're guaranteed to lose: half the time you win £5 and half the time you lose £10, so on average you're down £2.50 for every £10 you stake.
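The same sum works for any bet if you know the true probability: expected profit is the chance of winning times the winnings, minus the chance of losing times the stake. A tiny illustrative sketch, with made-up numbers for the draw example:

```python
def expected_profit(p_win, decimal_odds, stake=10.0):
    """Long-run average profit per bet: win (odds - 1) * stake with
    probability p_win, lose the stake the rest of the time."""
    return p_win * (decimal_odds - 1) * stake - (1 - p_win) * stake

print(expected_profit(0.5, 1.5))  # the coin at 1.5: -2.50, a guaranteed long-term loss
print(expected_profit(0.3, 3.5))  # an illustrative 30% draw at odds of 3.5: +0.50 per £10 staked
```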

It's time to share some data... If you run the latest version of my model over the first 200 fixtures of the 2013/14 season, betting £10 on the predicted most likely result of each game, or on a draw if its predicted chance is over 27% (the draw line has gone up a little from 25% since that first version), then here's what happens.

Important note: The data I'm using here to populate the simulation is the data that we had after week 20 had been played. I also know the exact starting line-ups for each of these games, which I won't know when I post on a Friday ahead of a weekend's fixtures.

This is very much a best case performance. The model's good. But it's not quite this good.

[Chart: simulated betting returns from the 'most likely result' strategy over the first 200 games of 2013/14]

So betting on the most likely result, regardless of market odds, seems to work. Part of the reason for this is that we're imposing quite a harsh line before an upset is picked as a bet. In its raw results, the model predicts too many upsets, so rather than just requiring it to like the underdog more than the bookies do, we have a rule that it must like the underdog enough to actually return a prediction that they will win the game.

Very probably a better gambling strategy would be to avoid betting on certain fixtures at all, but we come back to my bullet points above; I'm forcing myself to give a prediction for every game. There is also very likely a better gambling strategy to be found in this model, but I like the simplicity of betting on the predicted winner. It works.

If you'd like to come up with your own strategy, I've put a link to all of the data behind the first 200 games of this season at the end of this post.
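If you'd like to reproduce that sort of profit-and-loss run from the linked data, the mechanics are just a flat £10 on the picked result for each game. Here's a sketch under my own assumptions about the data layout; the field names are placeholders for whatever's in the spreadsheet, and pick_result is the little helper sketched earlier in this post.

```python
def backtest(games, stake=10.0, draw_line=0.27):
    """Cumulative profit from flat stakes on the model's pick for each game.

    Each game is assumed to be a dict holding the model's probabilities
    ('p_home', 'p_draw', 'p_away'), the bookies' decimal odds
    ('odds_H', 'odds_D', 'odds_A') and the actual result ('H', 'D' or 'A').
    """
    profit = 0.0
    for g in games:
        pick = pick_result(g['p_home'], g['p_draw'], g['p_away'], draw_line)
        if pick == g['result']:
            profit += (g['odds_' + pick] - 1) * stake  # a winner returns odds minus the stake
        else:
            profit -= stake                            # a loser costs the stake
    return profit
```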

Let's have a look at what happens with an alternative strategy of backing 'value'. What happens when we bet on whichever of the three results (home win, away win, or draw) shows the biggest gap between the model's simulated likelihood and the probability implied by the bookies' odds? If the model's got an 'edge', then this should work.
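Written down in the same style, one way to express that 'value' pick is to turn each decimal price into an implied probability (1 divided by the odds, ignoring the bookmaker's overround) and back whichever outcome the model rates furthest above the market. Again, this is an illustrative sketch rather than the exact calculation in my spreadsheet.

```python
def pick_value(model_probs, odds):
    """Back the outcome where the model's probability most exceeds the
    market's implied probability (1 / decimal odds)."""
    edges = {k: model_probs[k] - 1.0 / odds[k] for k in ('H', 'D', 'A')}
    return max(edges, key=edges.get)

# Illustrative numbers: the model likes the away side more than the market does.
print(pick_value({'H': 0.40, 'D': 0.28, 'A': 0.32},
                 {'H': 2.10, 'D': 3.40, 'A': 3.90}))  # 'A'
```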

The 'value' strategy's cumulative profit is in red below, with my usual method remaining in blue.

[Chart: cumulative profit, with the 'value' strategy in red and the 'most likely result' strategy in blue]
So the value strategy is also predicted to work, but returns are more volatile, as you'd expect, since you're backing more long-odds results. Using the value strategy you also win only 38% of bets, rather than the 56% you're predicted to win by backing the most likely result. Both strategies should work (provided you don't mix and match between them), but the 'most likely result' approach is less risky in terms of long, bad runs.

To recap, the strategy I'm currently following arises from:

1. A self-imposed rule that I must bet on every game and stake the same amount on every bet.

2. A benefit from moderating the model, so that an upset must be predicted as being very likely before we back it.

3. Evidence (the above, plus last season and this season so far) that backing the most likely predicted result is effective.

If you'd like to dive into the data, see where these numbers come from and pick your own strategy based on the EPL Model's calls, it's all here.