Showing posts with label agent-based modelling.

Monday, 8 April 2013

Football model: Under the hood

I was writing a proposal for a client last week and remembering how important it can be to show some of the mechanics underneath your answers. Not to explain everything that's going on, but to share some screenshots and explanations as evidence that you aren't just making all this stuff up as you go along. After all, you could drive your car quite happily without ever seeing what makes it go, but it's definitely reassuring to see a big shiny engine when you pop the bonnet open.

All most people have seen of the football model so far is the percentages that get spat out at the far end. My first post about football simulation explained the basics of the model, but what am I actually doing to make these numbers happen?

Who else gets excited by screenshots of spreadsheets? Just me then? Ah well, here they are anyway.

Step one is a list of the weekend's games and predicted starting line-ups. I get these from Fantasy Football Scout and, week to week, input team changes by hand. I'd really like to automate this bit because it's a pain - it doesn't take ages to set up, but being manual means that if I'm not at a computer, the model can't run itself.


This list of fixtures and players gets read in by the simulator (Visual Basic for the moment, if you're wondering) so that it can simulate virtual games.

Next, we need stats for each player, so that the simulator knows how each of them performs in real life. These stats come from EPL Index and give us a database describing each player's decision making, successes and failures in real games so far this season.

For each team, that looks something like this one for Southampton.



Yeah, I've missed out the column headers. Sorry about that, but this is turning into something I've invested a lot of time in! You can probably work some of them out if you're determined to...

These stats get pulled into the simulator and then it's ready to run a virtual game.

Or actually, to run 1000 virtual games and tell us what the result was in each of them, so we can find the percentage chance of either team winning, or of a draw.
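My simulator is written in Visual Basic and drives everything off real player stats, but the outer Monte Carlo loop can be sketched in a few lines of Python. Everything in this toy version is invented for illustration - the "match engine" here is just a coin-flip per event, nothing like the real model - but the shape is the same: play the game lots of times, count the outcomes, turn the counts into percentages.

```python
import random

def simulate_match(home_attack=0.0015, away_attack=0.0012, n_events=900):
    # Toy match engine: each of ~900 events gives either side a tiny
    # chance of scoring. (The real model simulates passes, tackles,
    # shots etc. per player - this is purely illustrative.)
    home_goals = sum(random.random() < home_attack for _ in range(n_events))
    away_goals = sum(random.random() < away_attack for _ in range(n_events))
    if home_goals > away_goals:
        return "H"
    if away_goals > home_goals:
        return "A"
    return "D"

def match_odds(n_sims=1000, **kwargs):
    # Play the fixture n_sims times and report the percentage of
    # home wins, draws and away wins.
    results = [simulate_match(**kwargs) for _ in range(n_sims)]
    return {r: 100 * results.count(r) / n_sims for r in ("H", "D", "A")}

print(match_odds())
```

The output is a dictionary of percentages that sums to 100 - the same three numbers (home/draw/away) that the spreadsheet spits out.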

Who wants to see what a footballer looks like inside The Matrix?


Probably prettier in real life. He's got a good engine though.

And now we're ready to run the weekend games. I press go and this happens.


If it's just running one weekend's games then I'll read Twitter for a bit. If it's simulating a whole season then the laptop gets some alone time and we come back later. It's playing through each fixture 1000 times, with around 800-1000 events per game.

At the top end, that's a million events to get one simulated result. There are 380 games in a season, so when I do a large run to assess whether changes to the model have improved its performance, we're simulating 380m individual events. Definitely gives me time to fit in a cup of tea.
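The arithmetic above, as a quick sanity check (taking the top of the 800-1000 events-per-game range):

```python
sims_per_fixture = 1000
events_per_game = 1000          # top end of the 800-1000 range
fixtures_per_season = 380

events_per_fixture = sims_per_fixture * events_per_game
events_per_season_run = events_per_fixture * fixtures_per_season

print(f"{events_per_fixture:,}")     # 1,000,000 events per fixture
print(f"{events_per_season_run:,}")  # 380,000,000 events per full-season run
```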

And finally, out come the percentages that I've been posting for the past couple of months.


So now you know, it really is a proper model. One that I've spent far too long building.

And it works...

Monday, 25 February 2013

Curse Blogger post scheduler. Predictions for erm... last Saturday

This set of predictions was meant to go live on Saturday morning, but a glitch in Blogger's post scheduler meant it didn't happen, sorry about that. I had a very kind trail from @OptaPro on Twitter this week too, so it was an even bigger disappointment. On the plus side, I couldn't post manually because I'd gone paragliding and I love paragliding, it's even better than statistics and football matches.

Back to football: the model's new and improved, so I thought it would be worth reposting these predictions and explaining what I've been up to.

If you'd like a bit of history, try my past posts. I'm using an agent-based model of football matches to try to predict results and as usual, predicted starting line-ups for the teams are from Fantasy Football Scout. At some point, I'll build an engine to scrape the actual announced line-ups half an hour before kick off and re-run the model automatically, but one step at a time...

The big improvement I've been working on, which has turned out to make a small overall improvement in prediction accuracy, is allowing players to have a good or bad game. Previously, each player always performed at their average level - for example, if their passing accuracy averages 80%, they'd always pass at 80% accuracy. Now I use the standard deviation of each player's passing accuracy and sample from a normal distribution to decide how a player will perform. What this means is that (to pick a couple of random examples) a very consistent player like Paul Scholes will always pass well in the model, while a player like Darren Bent will have passing accuracy that's all over the place, with some very good games and some very bad ones.
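The sampling step is simple enough to sketch in a few lines of Python. The means and standard deviations here are invented for illustration (the real numbers come from the EPL Index data), but the mechanism is the one described: draw this game's accuracy from a normal distribution and clamp it to a valid range.

```python
import random

def game_passing_accuracy(mean_acc, sd_acc):
    # Sample this game's passing accuracy from a normal distribution
    # around the player's season average, clamped to the valid 0-1 range.
    return min(1.0, max(0.0, random.gauss(mean_acc, sd_acc)))

# Illustrative numbers only, not the real stats: a consistent passer
# (small standard deviation) vs an erratic one (large standard deviation).
scholes_games = [game_passing_accuracy(0.90, 0.02) for _ in range(10)]
bent_games = [game_passing_accuracy(0.70, 0.12) for _ in range(10)]

print(scholes_games)  # always close to 0.90
print(bent_games)     # all over the place
```

Run it a few times and the consistent player's numbers barely move, while the erratic player swings between very good and very bad games - exactly the "form" effect the model now captures.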

This "form" feature is random for the minute, although I have spotted some interesting relationships in the data and I think it's predictable, to an extent, when players will have a bad game. Check out this tweet for an example. I've promised EPL Index (where all the data comes from) an article on this though, so it will have to wait for now.

Onto the predictions! They were predictions, honest.


And how did we do this week?

I actually had a small bet on these and am up for the weekend already, with the Spurs game still to play, so it didn't go too badly. From here, I picked:

Fulham to win (won)
Newcastle to win (won)
Wigan to win (won)
Sunderland to win (lost)
Norwich and Everton to draw (lost)
Spurs to win (playing tonight)

Of the remaining games, I don't trust Arsenal in the model at the moment. They pass well and it's largely a passing-based sim, so it seems to overestimate their chances, although it called this result correctly (just about). Who trusts Arsenal to reliably get a result anyway? The model called Man City and Man United's results correctly, but the odds were rubbish so I left those two.

If Norwich hadn't been allowed to take that last corner, I'd have had an even better weekend! This model's not doing so badly, if I do say so myself. Definitely worth persevering with.

I promise faithfully, on my honour, to have predictions up before the next set of matches this weekend.

Saturday, 2 February 2013

Football Sim: Predictions for 2/3 Feb 13

This is probably going to be the last set of predictions before I put some proper time into improving the model. We know that on current performance, it's going to slowly lose money if you bet on it and that's not tremendously exciting. Improvements from here are much harder than building the simulator in the first place, but I've got a few promising ideas to follow up.

Populating the fixtures with expected starting line-ups is also a complete pain in the neck and takes far too long. I'm going to have to sort that out, because sometimes my Friday evenings are based around beer rather than football match modelling.

Having said that, putting this set of forecasts together has thrown up a few interesting effects and led to me tweaking the algorithm a little already.

Here's what we've got. Starting line-ups from Fantasy Football Scout.



A few of those percentages stick out as disagreeing with the bookies' odds this morning. Taking those ones in order...

Everton vs. Aston Villa

Everton are predicted to win, sure, but the bookies give Villa almost no chance and my model thinks they could win it. Why does it think that?

The big reason (that we'll see again for the Man City game) is that the model doesn't really understand defending yet. It will penalise teams that have only average ball retention but which are above average at defending. Conversely for Villa, it doesn't know that their back line has shipped 46 goals so far this season. The model also currently sees a player like Fellaini as a striker with decent shooting accuracy and below average passing - it doesn't understand the physicality of his game.

It's far from perfect! I did say I was doing my development work in public. Anyway, on to...

Manchester City vs. Liverpool

I'm sure this is the defending factor again. Could happen though and maybe this prediction will make some Liverpool fans happy.

Reading vs. Sunderland

I like this one, it's interesting! I've got Reading at 10% (decimal odds 10.0). The bookies' odds say they're going to win the game. What's that all about then?
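For anyone not used to decimal odds, the conversion in that bracket is just the reciprocal: fair decimal odds are one over the probability, ignoring any bookmaker margin. A couple of throwaway helpers show the idea:

```python
def prob_to_decimal_odds(p):
    # Fair decimal odds implied by a win probability (no bookmaker margin).
    return 1.0 / p

def decimal_odds_to_prob(odds):
    # The reverse: a decimal price back to an implied probability.
    return 1.0 / odds

print(prob_to_decimal_odds(0.10))  # -> 10.0
print(decimal_odds_to_prob(4.0))   # -> 0.25
```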

Well first of all, the model's using player stats over the season so far, not just the past few games. Up until Christmas, Reading really weren't good, which drags their performance down.

The big question in this game though is what's going to happen with Adam Le Fondre? The sim doesn't do substitutes yet and he's not in Fantasy Football Scout's predicted starting line-up. We can't do super subs.

Without Le Fondre starting in the sim, Reading will struggle badly to score.

We've played the game 1000 times without him. Let's stick Le Fondre in for Guthrie, play it another 1000 times and see what happens. We'll be giving Le Fondre his super-sub stats over the whole game.


That's quite a difference! Sunderland still win it, mind.

Now let's hope the favourites don't let us down this time and we can do a little better than last Tuesday evening.

Tuesday, 15 February 2011

Probably the best strategy in the world

Neil Perkin over at Only Dead Fish has written a nice piece on new measurement techniques and predictive markets. It's an area of marketing measurement that I find fascinating, even if so far I've seen very few real-world marketing applications.

Prediction markets are games where you trade shares in future events. The Hollywood Stock Exchange is a famous example, where you 'bet' on the audience that films will achieve at the box office. The idea is that people (on average, in large numbers) are quite good at guessing what other people will do and the outcome of future events. Running a survey and asking people if they plan to see an upcoming film at the cinema is - runs the theory - less accurate than asking those same people whether they think lots of other people will watch it.

In one respect, it's easy to see that the theory works. In horseracing, horses become favourites because people bet that they're going to win, and very often the favourite does win. Odds on Betfair are effectively the punters' averaged view of what they think is going to happen in the future. Websites like Political Betting take those market odds and use them as a prediction tool for election outcomes or how long the current Prime Minister will last.
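Turning market odds into that averaged prediction is a one-liner per outcome: the implied probability of each outcome is one over its decimal odds, then you normalise because the raw values sum to slightly more than one (the bookmaker's margin, or overround). The odds below are hypothetical, purely to show the conversion:

```python
def implied_probabilities(decimal_odds):
    # 1/odds gives each outcome's raw implied probability. The raw values
    # sum to more than 1 because of the bookmaker's margin (the overround),
    # so normalise them back into a proper probability distribution.
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [p / total for p in raw]

# Hypothetical three-horse race priced at 1.8, 3.0 and 5.0.
print(implied_probabilities([1.8, 3.0, 5.0]))
```

The shortest-priced horse comes out with the highest implied probability, which is exactly the "favourites usually win" effect described above.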



If you fancy reading a bit more and playing with some toys, then Inkling is a good place to start.

The advertising applications of prediction markets are exciting. Instead of a focus group asking people if they like a new product, you could ask a sample of respondents if they think other people will buy it. Want to know which mobile phone platform will dominate in five years? Get people to bet on it. Don't ask people if they like your creative, ask whether they think it will be popular.

In terms of their output, prediction markets have some similarities to another research technique that I'm excited about: agent-based modelling. It's a bottom-up approach to modelling where you create an artificial simulated market that contains individuals, give them some rules and then see how they behave. You might set up a simulation for a new product launch and then model how shoppers trial and adopt the product as they are exposed to advertising messages. The crucial difference from top-down modelling, where you analyse past sales, is that the simulated individuals in an agent-based model have an element of randomness in their decision making - they don't necessarily do the same thing every time you run the simulation.
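A trivial sketch of that idea in Python - every agent, rule and probability here is invented for illustration: each simulated shopper may see the advertising, may then trial the product, and may then adopt it, with every step decided at random. Run the same launch plan many times and the outcome varies from run to run.

```python
import random

def simulate_launch(n_shoppers=1000, ad_reach=0.6,
                    trial_prob=0.2, adopt_prob=0.5, seed=None):
    # Walk every shopper through see-ad -> trial -> adopt; each step
    # is a random decision, so repeated runs give different outcomes.
    rng = random.Random(seed)
    adopters = 0
    for _ in range(n_shoppers):
        if (rng.random() < ad_reach
                and rng.random() < trial_prob
                and rng.random() < adopt_prob):
            adopters += 1
    return adopters

# The same launch plan, simulated 100 times: a spread of outcomes,
# not a single answer.
runs = [simulate_launch() for _ in range(100)]
print(min(runs), max(runs))
```

From a distribution of runs like this you can report things like "sales exceeded the target in 70% of simulations", rather than a single point forecast.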


These two new techniques are similar in that their output tries to account for randomness. You don't get a single answer and in that, they're much more like reality than a lot of the techniques we use right now. What you get are predicted likelihoods that rank the possible outcomes.

Think about what that means for a minute. An analyst can predict the best strategy for launching a brand, and report that in 70% of simulations sales exceeded the target. It's the best strategy, but even in the simulation it often doesn't work. We can tune the strategy to improve our chances, but in the end, randomness in the model means we might fail even though our strategy was a good one.

Weather forecasters often give us predictions this way - they'll say there's only a 20-30% chance of rain, so you get annoyed when you turn up for your meeting soaking wet, without an umbrella. The forecaster didn't say it wouldn't rain though, so it's your fault really - he said it probably wouldn't rain and you chose to risk it.

I'm incredibly excited about these emerging techniques, but they need some new thinking on both the analyst's and the decision maker's sides. We analysts need to work out how to apply new predictive techniques to marketing. Marketers need to recognise that they're going to get some extra information on which to base a decision, not the perfect answer.

That's actually the way that analytics should always have worked, but both sides too often like to pretend otherwise.

If you ask for randomness to be included in marketing analysis, then you're going to get answers that far more often include the word 'probably'.