I've been making some changes to my football model over the international break, and its prediction performance has improved a little again, although with at least one side effect that we'll come to later.
Apart from some robustness testing, the biggest change has been adjusting shot success depending on the opponent. The model should already handle the number of shots, and which players take them, reasonably well. It simulates the passing of individual players, and if you haven't got the ball you can't shoot, so in the simulation shot rates naturally increase against worse teams and drop against better ones without me having to impose that. Looking at simulated shot stats for a whole season, it does this job fairly well.
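To illustrate why shot counts can move with opponent quality without being imposed, here's a toy sketch. It is not the model itself: a single pass-success probability per team, a fixed chance that a completed pass leads to a shot (`shot_rate`), and the function name `simulate_match_shots` are all hypothetical simplifications.

```python
import random


def simulate_match_shots(pass_a: float, pass_b: float, shot_rate: float,
                         events: int, seed: int = 0) -> dict[str, int]:
    """Toy possession model: count shots for teams "A" and "B".

    Each event, the team on the ball attempts a pass. A failed pass turns the
    ball over; a completed pass leads to a shot with probability `shot_rate`,
    after which the opponent gets the ball back. A team can only shoot while
    it has the ball, so possession share drives shot counts.
    """
    rng = random.Random(seed)
    pass_success = {"A": pass_a, "B": pass_b}
    shots = {"A": 0, "B": 0}
    on_ball = "A"
    for _ in range(events):
        other = "B" if on_ball == "A" else "A"
        if rng.random() >= pass_success[on_ball]:
            on_ball = other  # pass failed: ball turned over, no shot
        elif rng.random() < shot_rate:
            shots[on_ball] += 1  # completed pass leads to a shot
            on_ball = other  # opponent restarts after the shot
    return shots


# Same team A (80% passing) against a stronger and a weaker passing side.
# `events` is set high so the counts are stable, more like a season than a game.
vs_strong = simulate_match_shots(0.80, 0.90, 0.02, 100_000, seed=2)["A"]
vs_weak = simulate_match_shots(0.80, 0.60, 0.02, 100_000, seed=2)["A"]
print(vs_strong, vs_weak)  # A shoots noticeably more against the weaker side
```

Nothing in the code says "shoot more against weak teams"; the difference comes entirely from how long each side keeps the ball.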
Where the previous version of the model fell down is that not all shots are equal. Just for today, let's call this 'The Stoke Effect'.
I've had issues with Stoke in the model since the start. Their pass success rate isn't great and neither is their shot conversion rate, so in the simulation they were predicted to lose a lot more often than they actually did. There is something Stoke do well, though: they force the opposition into taking unsuccessful shots.
I can't impose shot rates on the model, because shot rates already come largely out of the possession simulation. What I can impose is an adjustment factor on the chance of a shot being a goal.
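Here's a minimal sketch of how such a factor could work, assuming it's defined as the conversion rate a defence allows relative to the league average and applied multiplicatively to the shooter's own conversion rate. Both function names and all the numbers are hypothetical; the post doesn't spell out exactly how its factors are calculated.

```python
def opponent_factor(goals_conceded: int, shots_faced: int,
                    league_goals: int, league_shots: int) -> float:
    """Hypothetical definition: conversion rate a defence allows, relative to
    the league average. Below 1.0 means opponents score from fewer of their
    shots than usual (the 'Stoke Effect'); above 1.0 means more."""
    allowed = goals_conceded / shots_faced
    league_average = league_goals / league_shots
    return allowed / league_average


def adjusted_goal_probability(base_conversion: float, factor: float) -> float:
    """Scale a shooter's own conversion rate by the opponent's factor,
    clamped so the result stays a valid probability."""
    return min(max(base_conversion * factor, 0.0), 1.0)


# Hypothetical numbers: a defence conceding 10 goals from 200 shots faced,
# in a league where 100 goals come from 1,000 shots.
factor = opponent_factor(10, 200, 100, 1000)     # 0.05 / 0.10 = 0.5
print(adjusted_goal_probability(0.12, factor))  # a 12% shooter drops to about 6%
```

Because the factor only touches conversion, the number of shots still comes from the possession simulation; the adjustment just changes how many of them go in.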
Here are the adjustment factors:
At the moment I'm not analysing tactics to find out why those numbers look the way they do; I'm just playing with outcomes. But if you're not fussy about pretty football, Stoke are definitely doing something right. I'm also starting to think that "Why don't Stoke get relegated?" might make a fun article for EPL Index, who provide all the stats for this model. Partly as a result of building this model, I'm becoming much more interested in how middling teams get results against the top four (six? Seven so that Liverpool are included? I think those are the rules) than in Man City's most effective attacking combination.
In terms of prediction, Man City should win most of the time, and for the purposes of this model I'm not all that fussed about how many goals they win by, as long as the result is correct. What's really interesting is whether we can forecast when they won't win. Adjusting shot success helps a lot here, because it points towards those days when a team has plenty of the ball but just can't score.
Still here? You could definitely be forgiven for skipping ahead to this bit. Here are this weekend's predictions:
You can see the side effect I mentioned at the start of this post in the Newcastle win chance. Newcastle would have been predicted to lose anyway, but their opponents also convert an above-average share of their shots, so Newcastle now get penalised very heavily.
I have no doubt this percentage is a bit low, but it happens because I use average performance to predict the result of single games. What the model is saying, I think, is that if Newcastle play Man City the way they normally do, rather than changing tactics (maybe to 'park the bus'), they're going to get hammered.
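To make the "average performance" point concrete, here's a deliberately crude sketch: each side gets a fixed number of shots and a fixed, already opponent-adjusted conversion probability, and the win chance is just the share of simulated matches won. This is not the model's passing simulation; the function name `win_probabilities`, the independent shot-by-shot scoring and all the numbers are hypothetical.

```python
import random


def win_probabilities(shots_home: int, p_home: float,
                      shots_away: int, p_away: float,
                      sims: int = 20_000, seed: int = 0) -> tuple[float, float, float]:
    """Estimate (home win, draw, away win) chances by replaying the same
    average inputs many times. Each shot is scored independently with the
    side's (already opponent-adjusted) conversion probability."""
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(sims):
        home_goals = sum(rng.random() < p_home for _ in range(shots_home))
        away_goals = sum(rng.random() < p_away for _ in range(shots_away))
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / sims, draws / sims, away_wins / sims


# Hypothetical favourite vs underdog, before and after boosting the
# favourite's conversion because the underdog's defence allows easy shots.
print(win_probabilities(18, 0.10, 8, 0.10, seed=1))
print(win_probabilities(18, 0.14, 8, 0.10, seed=1))
```

The underdog's win share falls in the second call because the boost applies in every simulated run. A real match varies far more than fixed average inputs allow (tactics change, form fluctuates), which is one way a single-game percentage can end up looking too extreme.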
The Swansea prediction also sticks out this week. Bit peculiar, but it would be no fun if we agreed with the bookies every time. Let's see what happens.
There's lots more work still to do, but it's good to be making progress. Two big jobs are next on the list:
1. As this blog points out, I really need to run the model over more seasons and see how it gets on.
It's a bit tricky because I use a '% of passes in opponent's half' stat to model attacking pressure, and that stat has only been available on the EPL Index site since this season. Overall it doesn't add a huge amount to the predictions, though, so I'll probably just turn that feature off and then run the model over the past 3-4 seasons.
2. The model knows nothing about form at the moment - players perform better or worse depending on their opponents, but their base stats are the same in every game. I'm eyeing this as potentially the next big improvement.
Small bets placed and I'm off to enjoy the British countryside. With live score updates on my phone, obviously!