Wallpapering Fog: September 2013

Thursday 26 September 2013

Off the corporate grid part 1: Cheerio Windows

I wrote a post a while ago about how I planned to try a little geek project of dropping myself off the corporate web. No more free Gmail, other Google tools, or Microsoft Windows... and see how much I miss them. Is 'free' worth the price?

This post is part one - dumping Windows for Linux on an old laptop.

Tech skills needed: 4/10
Worth the effort: 8/10
Value for money: 10/10

I'm very much writing here from the point of view of somebody who'd like to give Linux a try, but ease of use is a major priority. I know that for many people, Linux is a rewarding investment of their time and some enjoy battling to make a piece of software work properly. Screw that. Day to day, using a computer should be easy. I've probably got some terminology wrong below and been floundering around with issues that an expert wouldn't even have noticed, but that's part of the point - can an average user get by without Windows?

Before starting this little exercise, I knew what Linux was - an Open Source (free) alternative to Windows - but I'd never used it, beyond a quick hack to get some files off a dead PC (of which more later). Of the few people I'd talked to about Linux, a couple are serious tech experts who are definitely up for some hardcore IT fettling, and one was annoyed that he'd bought a netbook with Linux on it, rather than Windows, and it wouldn't talk to a few of his other gadgets.

I suspect that along with a lot of other people, I had a vague feeling using Linux would involve battling with screens like this:

But don't worry, it doesn't.

As a new user, the biggest problem you face after deciding to maybe give it a go, is in working out what to install.

Google "install Linux" and the top link is to "Red Hat Enterprise Linux". This looks complicated. It's not what you want.

Then after a bit more reading, you find out Linux comes in distributions and that there are quite a few. Distributions are like different skins; they all use the Linux kernel as the central architecture that makes them work, but they behave in different ways, with alternative looks. You need to choose which distribution to install and now you find yourself on a website like this.

Oh, for Pete's sake.

I strongly suspect at this stage a lot of people say, "oh, sod it, I can't be bothered" and go back to Windows. I've certainly done that once before. This time I'd made a promise on my blog though, so pick a card... any card...

Linux Mint is top of the popularity list. Let's install that.

Wikipedia says Linux Mint

"is a Linux distribution for desktop computers, based on Ubuntu or Debian."

and that Ubuntu

"is an operating system based on the Linux kernel and the Linux distribution Debian, with Unity as its default desktop environment."

And I say if Linux wants more mainstream adoption (which I assume it does) then it needs to stop making things so bloody difficult.

Mint is easy to install and easy to use. You don't need to know what Debian or Unity are. I've only got a vague idea what they are. I might find out at some point but you really don't need to.

Mint has a nice clean homepage, with a prominent download button.

Unfortunately, when you click through to the downloads page, there are too many options for an IT novice. You thought you'd solved the problem of which distribution you want? Well Mint comes in several different flavours too. Somebody's doing this on purpose.

This page badly needs a big, fat, "Don't know what to install? You want this!" button. At least the one you want is at the top of the list: Linux Mint Cinnamon.

You've just done the hardest bit. I'm not kidding, the hardest bit of using Linux is working out what on earth to install in the first place.

Download Cinnamon (32 bit or 64 bit depending on your PC) and then you'll need to make a disk to install it with. It comes as an ISO file, which is the contents of a DVD, ripped into a single file. If you don't already know how to turn that into a DVD, then a quick Google will turn up lots of software, or you can follow this easy guide.

Congratulations, you've got an installation disk! That was definitely harder than buying a Windows installation disk, but it was quite a bit cheaper too.

You can actually use your disk straight away by putting it in the drive and then rebooting your PC. It should boot straight to the Mint desktop, which is bloody handy if your copy of Windows ever takes a dive and you need to get all of your files back. I've rescued videos and music from a laptop like this in the past, when Windows flatly refused to either boot, or reinstall.

You won't want to run Mint from the DVD all the time though, because it's slow and doesn't remember anything when you turn off your computer.

To install properly, you click on the desktop icon and it's easy from there, but you have a choice to make. Do you want to clear Windows off your laptop completely (copy your files somewhere first!), or do you want to 'dual boot' so that you'll be asked whether to start Windows or Mint when you turn on the PC?

We're just experimenting here and clearing Windows altogether feels a bit rash, so dual boot is probably best.

You'll need somewhere to put Mint, so you have to create a partition on your hard drive, using Windows. This guide explains how.

And now you can install it from the DVD you made.

And we're done. So what's it like?

Well it's like Windows. It looks like Windows and it acts pretty much like Windows too. It's even got a Start menu that pops up when you press the Start button. I'd be willing to bet that if I put it on my grandmother's laptop, she'd barely notice.

(don't worry, you can change the background image, just like Windows)

On an old laptop, you'll find it starts up faster and it doesn't do the special Windows boot thing of looking like it's ready, but keeping you waiting for another five minutes before you can actually open a program.

The web browser runs faster and doesn't hang all the time. (I know you can put Firefox on Windows too but it's an old laptop and it never really ran Windows Vista properly.)

I swear the laptop battery lasts longer.

These are all big ticks. Linux is a much lighter load on the PC, so if you've got a laptop that's getting on a bit and that is only really used for web browsing, you'll find it's a much faster, slicker experience.

Mint comes with LibreOffice, which is a more than passable alternative to Microsoft Office and can still open all of your .doc and .xls files. If you don't need VBA macros and heavyweight Excel workbooks, it's great.

The best recommendation I can make for Mint is that since making our old laptop dual boot about a month ago, I've only touched Windows once, because I needed it to talk to a GPS and it has the drivers built in. My wife hasn't used Windows at all. Why would you? For simple tasks, Linux just works better. It's only more obscure gadgets that are an issue too - plugging in a digital camera or a USB thumb drive is fine.

I have found myself on a few forums, learning some more complicated bits and pieces when I wanted to push beyond simple web browsing and admittedly I wasn't able to make Google Picasa work properly, even though it's supposed to. All in all though, if you want basic features, Linux is brilliant. If you want more than basic features you can certainly have them, but you'll need to get your hands dirty.

The only real difficulty I found for Mint was that when it first started up, everything worked perfectly except wifi. This is because the wifi card needs a proprietary driver, which Mint had found, but didn't activate automatically. Easily fixed through a simple menu, but I'd have liked a pop up on the first boot, prompting me that Mint already knew how to make the card work and asking if I wanted it switched on. As it is, it's possible a less curious user would have just assumed Mint didn't work with their laptop.

In terms of dumping the corporate web, this one's a 'not quite', but well worth doing all the same. There's no way I could survive without Windows at work (no Excel, Tableau, SQL Server...? Not going to happen) and at home it would probably wind me up about once a month that something Windows is able to do, was difficult or impossible with Linux.

You can have the best of both worlds though. A slick, fast experience that's not beholden to Microsoft for 90% of the time and a quick boot into Windows when you have to. I'm impressed. This has been a really worthwhile little experiment and I'd thoroughly recommend it.

Wednesday 18 September 2013

Luck in football part 2. Can you have a lucky season?

"It evens itself out over a season and that will never change. You get breaks here and there. Every club gets good breaks, bad breaks."

Sir Alex Ferguson

Does it though? Does luck even itself out across a season? This is a follow up to my post a couple of weeks ago, looking at how much luck there is in a single English Premier League result. The obvious next step is to try to extend that analysis to a thirty eight game season and see whether, over a larger number of games, most of the random chance in football then disappears.

Before we dive into the analysis, it's useful to think about what level of luck might feel right for an average team, in terms of the number of points that team finishes with, compared to how many points they 'should' get. Plus or minus a point across a season? That obviously could happen - you sneak one extra draw, or rattle the bar in the 90th minute at 1-0 down just once in the season and there's your extra point either way. One lucky won or lost game is also pretty easy to envisage, or maybe even two lucky wins or losses. Three lucky wins and nine points? For me that's within the bounds of possibility, but starting to feel more unlikely.

From a statistical point of view, thirty eight games per team isn't all that many, so some level of randomness is definitely going to creep in. The challenge is to work out how much randomness and whether it matters in the grand scheme of things. Whether the league table is random enough that the best team won't always win the title, or if an unlucky mid-table standard team can get relegated.

Working out random chance across a whole season is more difficult than working it out for a single game against the hypothetical 'average opponent' that we saw in my last blog post. In response to that previous post, a few people asked how you'd set up a team's chances against a specific opponent, rather than a generalised 'average' one, which illustrates one of the key problems. If you could predict goal scoring and concession rates against a specific opponent, then you could predict the final score. Then you'd be able to beat the bookies and make a lot of money. Which is hard. In essence this is what my prediction model tries to do, but it's too complex to form a base for this analysis.

For this post, I'm going to assume that bookmakers' odds are a fair representation of each team's chances in a game and use that as the basis to simulate a season. You can argue with that approach, but I have a suspicion it's going to cause fewer arguments than any other results prediction method that I might use.

At least everybody can see where these numbers have come from and, as an added defence for this method, if the bookies' odds were consistently wrong across a lot of teams, across the whole season, they'd be losing a fortune.

"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know."

Donald Rumsfeld

OK, that's probably not very helpful. The reason for the quote is that the bookies' odds aren't perfect. It is possible to consistently predict better than they do, which might reduce the amount of 'luck' (can we call it random variation instead?) that we're about to measure, and there are also all of the factors about a game that neither we nor the bookies know. If a manager plays his best striker - who's nursing an injury - and loses due to a poor performance, that's not bad luck, it's a bad call. But if we didn't know about the injury then it won't have been priced into the market odds.

What really matters for this post isn't that we have brilliant odds for each individual game, but that we have a fair representation of what a season looks like as a whole, so that we can run simulations. It's not about the individual teams, it's about having a realistic spread of probabilities across a season's 380 games and for this, betting odds should do a good job.

I'll come back to the definition of luck and the implications of using betting odds at the end of this post, but let's get stuck into some numbers. Here are Arsenal's odds for each game last season, taken from Bet365 and re-based to remove the bookmaker's margin so they sum to 100%. (data from www.football-data.co.uk/englandm.php)
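For anybody who wants to try the re-basing step themselves, here's a minimal sketch. It assumes decimal odds and removes the margin by scaling the three implied probabilities proportionally so that they sum to 100%. That's one common way of doing it and not necessarily the exact method used for the numbers in this post. The function name and the example odds are made up for illustration.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Turn decimal odds into probabilities that sum to 1.

    Assumption: the bookmaker's margin is removed by proportional
    scaling, which is one common approach among several.
    """
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    overround = sum(raw)  # a bit more than 1 because of the margin
    return [r / overround for r in raw]


# Hypothetical odds for a home win, a draw and an away win.
probs = implied_probabilities(1.50, 4.20, 7.00)
print([round(p, 3) for p in probs])
```

The raw implied probabilities for those made-up odds add up to a little under 105%, so each one gets scaled down slightly.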

If you run that fixture list ten thousand times with those odds, on average Arsenal will finish with seventy points. They scored seventy three points in 2012/13, so on that (huge!) sample of one season at least, the odds-based method seems sensible.

We get a distribution of points for Arsenal using our ten thousand simulated seasons, which looks like this.

Sometimes Arsenal will score fewer than seventy points and sometimes they'll score more, purely through random variation, or 'luck'. The standard deviation of Arsenal's final total is 7.5 points, which means that although the average is seventy, in any given season Arsenal are likely to do seven points better or worse than that. In 66% of seasons, their final total would fall between sixty five and seventy nine points.
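The simulation itself is simple enough to sketch in a few lines. This is a rough illustration rather than the code behind the numbers above: the fixture probabilities are invented (nineteen identical home games and nineteen identical away games), and the function names are my own.

```python
import random
import statistics


def simulate_season(fixtures, rng):
    """Play one season. fixtures is a list of (p_win, p_draw) pairs."""
    points = 0
    for p_win, p_draw in fixtures:
        roll = rng.random()
        if roll < p_win:
            points += 3  # win
        elif roll < p_win + p_draw:
            points += 1  # draw
        # otherwise a loss, which adds nothing
    return points


def simulate_many(fixtures, n_seasons=10_000, seed=1):
    """Replay the same fixture list n_seasons times."""
    rng = random.Random(seed)
    return [simulate_season(fixtures, rng) for _ in range(n_seasons)]


# Hypothetical fixture list: these probabilities are invented, not
# Arsenal's real odds.
fixtures = [(0.60, 0.23)] * 19 + [(0.45, 0.27)] * 19
totals = simulate_many(fixtures)
mean_points = statistics.mean(totals)
sd_points = statistics.pstdev(totals)
print(round(mean_points, 1), round(sd_points, 1))
```

With those invented probabilities the average comes out somewhere near seventy points and the standard deviation somewhere between seven and eight, which is the same ballpark as the real run.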

In one simulation out of ten thousand, last season's Arsenal squad is ridiculously unlucky and gets relegated with fewer than the magic forty points. This might sound far fetched until you consider that it's a one in ten thousand chance and there have only been 126 seasons of professional football in England, in total, ever. The chances of relegation happening to last season's Arsenal squad are vanishingly small.

Running the same ten thousand season simulation exercise for each team in last year's Premier League, gives you the following points distributions.

We've got a clear top two, a fairly well ordered top seven including Everton and then feasibly any team from eighth downwards could be relegated. Ouch.

We can translate those points totals into finishing positions for each team in each of the ten thousand simulated seasons, to get a likelihood of achieving different positions. Some of the randomness starts to disappear here, because for a team like Newcastle or Fulham to be relegated doesn't just need them to be unlucky. Other teams below them need to be lucky too.
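To get from points to positions you have to simulate every match in the league at once, so that one team's lucky win is another team's unlucky defeat. Here's a hedged sketch using a made-up four team league. The team names, the ratings and the rule that turns ratings into match probabilities are all invented for illustration, and ties on points are broken at random, whereas the real league uses goal difference.

```python
import random
from collections import Counter


def simulate_league(matches, teams, rng):
    """Play every match once and return {team: finishing position}."""
    points = {team: 0 for team in teams}
    for home, away, p_home, p_draw in matches:
        roll = rng.random()
        if roll < p_home:
            points[home] += 3
        elif roll < p_home + p_draw:
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3
    # Assumption: ties on points are broken randomly.
    order = sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)
    return {team: pos for pos, team in enumerate(order, start=1)}


def position_distribution(matches, teams, n_seasons=10_000, seed=1):
    """Share of simulated seasons in which each team finishes in each position."""
    rng = random.Random(seed)
    counts = {team: Counter() for team in teams}
    for _ in range(n_seasons):
        for team, pos in simulate_league(matches, teams, rng).items():
            counts[team][pos] += 1
    return {
        team: {pos: n / n_seasons for pos, n in sorted(counts[team].items())}
        for team in teams
    }


# Hypothetical league: ratings and the probability rule are invented.
ratings = {"A": 0.4, "B": 0.2, "C": 0.0, "D": -0.2}
teams = list(ratings)
matches = [
    (home, away, 0.42 + 0.5 * (ratings[home] - ratings[away]), 0.26)
    for home in teams
    for away in teams
    if home != away
]
dist = position_distribution(matches, teams)
for team in teams:
    print(team, dist[team])
```

Swap in the real fixture list and odds-based probabilities for all twenty teams and the same approach gives a table of finishing position likelihoods.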

And because I know you're going to want the raw percentages for those...

Were Newcastle unlucky to finish 16th last season? I'll leave that as a rhetorical question.

Picking on Liverpool, a team with their 2012/13 chances in each game (and please note how careful I'm being with my words here; not Liverpool, but a hypothetical team whose chances were exactly represented by the odds Bet365 gave Liverpool last season) would win the league by 'luck' one year in twenty (5%).

The analysis has turned up one more result and it's a result I found quite surprising. Before running any numbers, I'd hypothesised in an email exchange with @SimonGleave that good teams and bad teams would have less randomness in their points total, with average quality teams seeing the most random variation. The reasoning for this crudely being that good teams win a lot and bad teams get beaten a lot and both of those things reduce the space for luck to play a role.

I was wrong.

Here are the standard deviations of season points totals for each team in 2012/13.

Everybody scores plus or minus seven points. Weird.

What that does mean in effect though, is that teams higher up the table have less random chance as a proportion of their total points, than teams lower down, since seven is obviously a much larger proportion of forty points, than of ninety points.

We have a standard deviation 'luck' (I still don't like that word) measure varying from 9% of Man City and Man U's most likely points totals, through to 20% of Reading's.

My hypothesis about why seven appears to be the magic number for all teams is that every team has a number of peers - teams they're similar to and will share points with - and a number of teams that they're either much better or much worse than. This gives every team a similar number of games with fairly uncertain results, where points will be shared, compared with fairly certain ones, whether that's fairly certain to win or to lose.

I've linked to this piece of work before, but it's reassuring how close this result is to the key finding of plus or minus eight points in a post by James Grayson, which kicked off this whole thought process for me. My initial reaction to that number was that it felt too big, but now I'm coming to a very similar conclusion.

I do think we should be treating these numbers as a maximum level of random variation though, because in reality teams will react to their league position and try to change their chances. Teams with more financial resources will be able to react to an unlucky first half of the season by signing players and improving their odds in the second half.

Better predictions than bookmakers manage could also reduce the amount of measured luck, because we'd be more certain about which team should win each game, reducing the level of random variation in results. As I said earlier, the bookies aren't perfect (just annoyingly good) so this should definitely play a role in reducing the true standard deviation below seven.

Finally we're also back to the core question of what is luck? I'm not at all sure when most people say 'luck' that they intuitively mean 'random variation'. It's more nuanced than that. Steve Fenn (@SoccerStatHunt) tweeted a nice definition yesterday:

The key word here being 'unearned'. This got me thinking that you're lucky if:

So what's 'consistently'? I think this goes to the heart of what we'd intuitively define as lucky. In the simulations above, Manchester United had a 36% chance of winning the title, just about equal with Manchester City. If either of those teams win, are they lucky? They've got the best chances out of any team, but 36% isn't huge - it leaves a 64% chance that some other team wins it, which is much more likely.

By those percentages alone, you'd always need luck to win the title.

If you win a single game that you had a 49% chance of winning, were you lucky? After all, there was slightly more chance that you wouldn't win it.

What we mean by luck, seems to be lurking somewhere in an area of 'beating a team that played significantly better than we did'. For me, 'lucky' has an intrinsic element of fairness in its definition. Lucky wins are unfair. Things shouldn't have happened that way. Somebody else deserved the win.

That's why luck is so hard to define - because it's subjective. Your definition and mine could well be different. As a neutral, I'd say a non-league team with a 5% chance of beating a Premier League opponent in the FA Cup third round deserves everything they manage to achieve. A fan of the losing Premier League team, who's going to take a pasting in work on Monday, would probably call them lucky bastards.

What we have shown here, is that bookmakers' odds suggest we're watching a league where in any one single season, each team's points will swing plus or minus seven from their most likely total. That could well be enough to relegate an undeserving team. Of course, that's undeserving depending on whether you support that particular team and depending on where you, personally, draw the percentage line on 'unlucky'.

Wednesday 4 September 2013

In my experience, there's no such thing as luck... Except in football.

Football's back! Hurrah!

There'll be an update on the predictive model in the near future, but suffice to say for the moment that it's not dead, which is a huge relief. The predictions will be moving off Wallpapering Fog so that I can keep accessing Opta's data, but we'll be giving the bookies another good pasting from a new home this season, starting in a month or so's time. Stay tuned.

In the meantime, I've been thinking about luck.

There's obviously a lot of luck in football. Games are low scoring and that means one extra goal is very valuable, which is why we celebrate scoring so much vs. a sport like basketball, where it almost makes more sense to celebrate thwarted opposition attacks than your own successful ones.

The margins in football are small. Overall, in the past five seasons, 9.2% of shots were goals and 1-1 is the most likely scoreline in a single game. In that context, the possibility of an additional 'lucky' shot leading to a 2-1 win, rather than a 1-1 draw, is very real.

I'm far from the first person to work this out.

James Grayson has posted an excellent analysis, which concludes that the average team in the English Premier League has a random points variation of +/- 8 points. That is, whatever your 'correct' league placing, in any given season you'll do 8 points better or worse than that, just through pure chance.

Zach Slaton follows up this work on Forbes, extending it into Arsenal's chances of a top four finish and speculating about whether they have been lucky in recent years with Champions League qualification.

And The Numbers Game concludes that football results are 50% luck. (Say the reviews. Hands up, I haven't read it yet, so I can't treat this claim fairly. I've ordered a copy.)

I don't like luck. Or rather, I feel strongly that if a large proportion of football results are down to luck, then we need to be very sure we measure that proportion correctly. If nothing else, how do you tell a manager whose job hinges on avoiding relegation, that he might as well flip a coin, because it's out of his control? Or tell a mid table side that they'll finish anywhere between eighth and sixteenth and there's nothing they can do about it? I know that's not exactly how probability works, but it may well be how the analysis is perceived. Which is important.

I want to have a crack at this question myself using a bottom-up approach rather than looking at points variances across a league. It might come out with exactly the same answer, but I like building analyses from the ground up because you can see the assumptions more easily and it's often easier to communicate what you did, rather than asking for an audience to trust in a complex formula.

So how do we build a ground up analysis of luck?

I'm going to begin with shooting, which is an assumption running through this whole analysis. You'll see why in a second.

Across the past five seasons, the average EPL team had 14.3 shots per game and scored with 9.2% of those. We can take the 14.3 shots and do some basic (ahem, I didn't have to revise these methods at all, honest) probabilities...

In simple terms, a shot outcome is 1 or 0. Either the shot is a goal (1) or it isn't (0).

We've just seen that the chance of a single shot going in is 9.2%, so the chance of it not going in, is 90.8%.

The chance of a team having 14 shots and not scoring with any of them, is 0.908 * 0.908 * 0.908...

= 0.908^14

= 0.26

So for an average team, taking an average number of average quality shots, there's a 26% chance they don't manage to score at all in any one game.

If you want to see what the chances are that they score once, or twice, or more, then you need the binomial distribution. It gets a bit complicated and you need to use a distribution, because any combination of the 14 shots might go in, or not, and that's a lot of different combinations.

Here are the chances for a team that takes 14 shots in a game, of scoring different numbers of goals.
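If you'd like to reproduce those chances, a minimal sketch follows. It treats every shot as an independent event with the same 9.2% chance of going in, which is the assumption running through this post. The function name is my own.

```python
from math import comb


def goal_probabilities(shots, conversion_rate):
    """Binomial chance of scoring 0, 1, 2, ... goals from a number of shots."""
    miss_rate = 1 - conversion_rate
    return [
        comb(shots, goals) * conversion_rate**goals * miss_rate ** (shots - goals)
        for goals in range(shots + 1)
    ]


probs = goal_probabilities(14, 0.092)
for goals, p in enumerate(probs[:6]):
    print(goals, round(p, 3))
```

The first entry is the same 0.908^14 calculation from above, so it comes out at roughly 26%, and one goal turns out to be the single most likely outcome.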

And here comes a big assumption. In reality, there are 'score effects' in football, which this analysis isn't going to consider. Score effects mean that certain scores are 'sticky' because they encourage teams to sit back and defend, while other scorelines let a team relax and attack. Think about what happens when an important game is at 1-0 going into the last 10 minutes, compared to if one team is already 4-0 up.

If we ignore score effects, what might the result look like, for two exactly equal and average teams playing against each other? They'll draw, right? Well no, not usually...

They'll have 14.3 shots each and score 9.2% of those.

Here are the possibilities for a draw:

So two hypothetical teams which are exactly identical in every way and take an average number of shots each, will only draw 27% of the time.

The other 73% of the time, they share the wins: 36.5% each.
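The draw calculation can be sketched like this, again assuming independent shots and no score effects. A draw happens when both teams score the same number of goals, so for two identical teams you square each goal probability and add them up. Function names are my own.

```python
from math import comb


def goal_probabilities(shots, conversion_rate):
    """Binomial chance of scoring 0, 1, 2, ... goals from a number of shots."""
    miss_rate = 1 - conversion_rate
    return [
        comb(shots, goals) * conversion_rate**goals * miss_rate ** (shots - goals)
        for goals in range(shots + 1)
    ]


def draw_probability(shots, conversion_rate):
    """Chance that two identical teams finish level (0-0, 1-1, 2-2, ...)."""
    probs = goal_probabilities(shots, conversion_rate)
    return sum(p * p for p in probs)


p_draw = draw_probability(14, 0.092)
p_win_each = (1 - p_draw) / 2  # identical teams split the remainder
print(round(p_draw, 2), round(p_win_each, 3))
```

Changing the two arguments lets you see how the draw rate moves with the number of shots and the conversion rate.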

This is, I think, the extreme example of luck in football. The game should be a draw, but 73% of the time it won't be. You could say that in 73% of these games, the result is being determined by chance. Overall of course, it evens out over a large number of games, but it doesn't in a single game and these teams will only play each other twice per season.

I was drawn into this topic because I didn't intuitively like the levels of luck that were being suggested and now I'm saying some results are 73% luck. Damn.

If you double the shot conversion rate, you get even fewer draws, coming out at 20% of games. Teams draw most often when either they don't shoot, or when the shot conversion rate is very low, which makes sense. At only five shots per team, the chances of a draw increase to 48%, with a lot more 0-0 and 1-1 results.

That's enough about draws. What if we look at wins?

We should pause here for a moment and define what we mean by 'luck'. I'm taking it to mean that the best team doesn't win, even if that best team is only very slightly better than their opponent. If the lower skilled team wins or draws, purely through how the dice rolls on shot conversion, then they were lucky.

This isn't totally satisfactory and goes to the heart of why I don't really like talking about luck. It's easier to call the random variation in a team's total season points 'luck', than to label a single game 'lucky' but the concept is the same - it's about winning points you don't deserve, or losing points that you do. If a non-league underdog beats a Premier League team 1-0, having taken only one shot to the superior team's twenty shots, were they lucky to win? I think most people would agree that yes, they got lucky. The newspaper sports pages wouldn't, they'd call it a giant killing, but then, if Goliath should squash David nine times out of ten, David got lucky.

Luck kills much of the narrative that we love about football. I watched Exeter City get a 0-0 draw away at Old Trafford in 2005 and it's the best game I've ever been to. Were we lucky? Of course we were. On a different day, Scholes scores and Man U win. On most days, Exeter would get battered.

But to just label that result 'lucky' and dismiss it, is to dismiss a lot of what makes football great. If a manager sets up a team to have a 1% chance of winning and wins, he's lucky. But 20% chance? 30%? I'd prefer to say you're giving yourself more chance, than that you're lucky.

However, for this analysis we need a dividing line. For the rest of this post, if you beat or draw with a superior team, purely through the way the dice rolls on shot conversion, then you're lucky.

The draw example above was a fun starting point but it's not really sensible because you'll never really have two exactly matched teams. It can give us a good idea of maximum luck though because if we assume that Team A is very, very slightly better than Team B, then we get:

Team A win: 36.5% (plus a tiny marginal amount)
Team B win: 36.5% (minus a tiny marginal amount)
Draw: 27%

Team A should win. They're the marginally better team and Team B can only beat them by being lucky. The chances Team A don't win are 63.5% (the chance that Team B wins, plus the chance of a draw).

So in a very evenly matched game between two statistically average teams, we've got 63.5% of the result being down to luck.

Let's ditch the hypothetical average team and bring in some real ones. Here are average shots per game and goal conversion rates for each EPL team, averaged across the past five years.

Properly calculating the chances that a team will win a game instead of draw it, is slightly difficult, because there are many combinations of scores that will win you a game. We work out the chance that a team will score 1 goal, with the opposition only getting 0, then of the team getting 2 goals, with the opposition only getting 1 or 0, then of getting 3 goals, with the opposition scoring fewer than that... and so on. Then we add up all those different combinations that would give you a win and you have the overall chances of winning.
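That adding up of score combinations is easier to see in code. The sketch below loops over every possible scoreline and sorts each one into win, draw or loss. It assumes independent shots, and the 'strong' team's numbers are invented for illustration rather than taken from any real club.

```python
from math import comb


def goal_probabilities(shots, conversion_rate):
    """Binomial chance of scoring 0, 1, 2, ... goals from a number of shots."""
    miss_rate = 1 - conversion_rate
    return [
        comb(shots, goals) * conversion_rate**goals * miss_rate ** (shots - goals)
        for goals in range(shots + 1)
    ]


def match_probabilities(team, opponent):
    """Return (win, draw, loss) for team; each side is (shots, conversion_rate)."""
    ours = goal_probabilities(*team)
    theirs = goal_probabilities(*opponent)
    win = draw = 0.0
    for our_goals, p_ours in enumerate(ours):
        for their_goals, p_theirs in enumerate(theirs):
            if our_goals > their_goals:
                win += p_ours * p_theirs
            elif our_goals == their_goals:
                draw += p_ours * p_theirs
    return win, draw, 1.0 - win - draw


average = (14, 0.092)
strong = (18, 0.12)  # hypothetical better team, numbers made up

win, draw, loss = match_probabilities(strong, average)
luck = draw + loss  # the better team failing to win, as defined in this post
print(round(win, 2), round(draw, 2), round(loss, 2), round(luck, 2))
```

Feeding in the average team on both sides gets you back to the evenly matched example from earlier, with the wins split equally.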

To see what that looks like, here are Manchester United's goal chances - based on the table above - vs. the average team that we saw earlier.

Based on these chances, Manchester United would be expected to beat a statistically average team 55% of the time, with 22% of games drawn and 23% lost.

(I know the opposition are unlikely to get their regulation 14 shots against Man U. We'll get to that in a minute...)

If we say that Man U are the better team and that the fairest result is a Man U win, then the opposition are 'lucky' 45% of the time, when the game ends in a draw or a loss for United.

Now we've got a spread of luck, from 63.5% when two teams are almost equally matched, to 45% when Man U are playing. That feels better. Man U have historically dominated games and left less space for random chance.

We can do this exercise for the whole list of teams above. How much luck is there in the result, for each EPL team, when playing against an average opponent who gets off 14 shots, with a conversion rate of 9.2%?

Luck works both ways. Manchester United have a low luck coefficient because they're likely to win; Middlesbrough also do, but because they're likely to lose.

Following me so far? There's still far too much luck here though, because we haven't introduced defence yet.

As I mentioned earlier, Manchester United don't let the average team take 14 shots, or let them convert those shots at 9.2%. They defend much better than that.

Here's each team's attacking performance - that we saw earlier - and now also average defensive performance. The extra numbers show us how many shots the average opponent manages to hit against each team and how many of those go in. Again this is all summarising over the past five years of the English Premier League.

The final piece of the puzzle is to set up each team against their own average opponent, instead of always using the general average opponent across the whole league.

Manchester United's own and opposition scoring chances now look like this, which feels a lot more like a real picture of Sir Alex on a Saturday afternoon, against a mid-table side:

Here's what that does to the luck coefficients for each team:

We end with a spread of results that are happening by chance, from Manchester United at the bottom end, where their average opponent can achieve a result through luck only 32% of the time, to Fulham where a very high proportion of the result - 63% - is being governed by chance.

It's worth stating that this doesn't mean Fulham are particularly 'lucky', or even that they're a bad team. It just means that they're incredibly average, so when pitted against an average opponent, the result could be anything.

That's it for single game probabilities and if you've got any feedback on the methodology, or you want to tell me I've got my sums wrong, please jump on the comments below. The next step is to take these and apply them to a full season, to see if Arsenal really are fortunate to keep making that top four and also maybe try to work out whether relegation really is a lottery.

Luck in football part 2. Can you have a lucky season?

Stats from EPL Index, which unfortunately doesn't do public data any more, but is still a cracking site and deserves acknowledgement!
