Friday 14 February 2014

Premier League attack patterns visualised

Yesterday, I posted some visualisations of approach play in the Premier League. They describe how passes into a 'shooting zone' in front of the goal tend to be more successful when they come directly, rather than from wide areas.

I've started to play with these visualisations for individual teams and a few people have asked how they look, so today I'm posting attack patterns for the current Premier League top seven. We're looking at the number and success rate of passes played into a boxed-out 'shooting zone'. Data covers the first half of the current Premier League season, up to the end of January.

For the following heat maps...

Size of square = number of passes
Colour of square = pass success rate

Large and green is good; large and red is not! Look for clusters of colour rather than concentrating on individual squares: when we're looking at only one team, each square contains fewer passes, so its success rate is less reliable on its own.
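A rough way to see why single-team squares are noisier is to treat each square's success rate as a binomial proportion and compute its standard error, sqrt(p(1 - p) / n). This is a minimal illustration with made-up pass counts, not part of the original analysis.

```python
import math


def success_rate_with_error(completed: int, attempted: int) -> tuple[float, float]:
    """Return (success rate, approximate standard error) for one square.

    Uses the normal approximation to the binomial: SE = sqrt(p * (1 - p) / n).
    """
    if attempted <= 0:
        raise ValueError("square has no passes")
    rate = completed / attempted
    std_err = math.sqrt(rate * (1 - rate) / attempted)
    return rate, std_err


# Made-up counts: the same 60% completion rate at league and single-team volume.
league_rate, league_se = success_rate_with_error(600, 1000)
team_rate, team_se = success_rate_with_error(12, 20)
print(f"league: {league_rate:.0%} +/- {league_se:.1%}")
print(f"team:   {team_rate:.0%} +/- {team_se:.1%}")
```

With twenty passes the uncertainty is roughly seven times larger than with a thousand, which is why a lone green or red square in a team map says little by itself.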









Teams are attacking the goal on the right and are listed in order of current league position. Yes, I picked the top seven because everybody wants to see how the Man United one looks.


Chelsea
Mixed approach with occasional long passes from deep. A larger number of incomplete passes from wide on the right.


Arsenal
High success rates with close, central passes and very rarely played long from deep. Significant volume of passes from advanced wide positions, but with low success rates.


Manchester City
Varied approach with good success rates from almost all areas.


Liverpool
Mixed approach with a low volume of passes from very wide touchline positions. Attacks from the right wing weaker than from the left.


Tottenham Hotspur
Greater success rates through the centre than from either wing, but high volumes of unsuccessful passes played from advanced and wide.


Everton
The Leighton Baines effect. High volume of passes from wide left but with low completion rates. Passes from advanced right also with low completion. Very few attempts through the centre and occasional long balls from deep.


Manchester United
Some approaches through the centre but attacks weighted towards wings. High volume of longer diagonal balls from the right, with low success rates.


Thursday 13 February 2014

How can an attacking team get close enough to expect a goal?

There's been some great work done in football analytics recently, looking at a team's scoring chances from different positions on the pitch, which has led to the calculation of various Expected Goals (ExpG) metrics. However it's calculated, in essence ExpG gives a player's chance of scoring from a shot, given his position on the pitch. Add up the probabilities for a group of shots and you can work out how many goals a team 'should' have scored from them. Have a look at Statsbomb if you'd like to read up on what's been available up to now.
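As a worked illustration of adding up the probabilities, here is a minimal Python sketch. The per-shot probabilities are invented for the example; in practice they would come from whichever ExpG model you use.

```python
def expected_goals(shot_probabilities: list[float]) -> float:
    """Expected goals for a set of shots: the sum of each shot's scoring probability."""
    for p in shot_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return sum(shot_probabilities)


# Invented probabilities for five shots in one match.
shots = [0.31, 0.08, 0.05, 0.12, 0.03]
print(f"Expected goals: {expected_goals(shots):.2f}")
```

Five shots worth 0.59 between them suggest that a little over half a goal 'should' have come from them, whatever the scoreline says.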

I've managed to assemble a decent-sized database of pass and shot locations from across the first half of the 2013-14 Premier League season and wanted to see if I could take Expected Goals a step further. As an indicator of shot success, Expected Goals typically paints a picture of the penalty area, with the six yard box as a hotspot that becomes colder the further out you move from goal. To a certain extent, its outputs are relatively obvious; if you shoot from closer in, you have a higher chance of scoring and shots from further out are less likely to be converted.

That's not to say Expected Goals isn't a useful metric - far from it - but it doesn't do a great deal for our understanding of how to create goals. We can quantify how much better it is to shoot from closer to the goal, but how do you get closer to the goal in the first place? If your attacks break down trying to reach the shot conversion hotspot, should you even try to get there, or just take your chances from range?

A couple of days ago, I tweeted an image of pass completion data, which we'll be building on in this post.


Pass success rate by destination


The image shows the probability of completing a pass into different areas of the pitch. We're not worried about where the ball is coming from for the moment, but are looking at the chances of passes into different areas being successful.
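A map like this boils down to grouping passes by where they ended and dividing completions by attempts. The sketch below assumes a hypothetical record format of (end_x, end_y, completed) on a pitch scaled 0-100 in both directions, 5-unit grid cells, and that a pass's recorded end point stands in for its destination; none of those details come from the original dataset.

```python
from collections import defaultdict

Cell = tuple[int, int]


def cell_of(x: float, y: float, cell_size: float = 5.0) -> Cell:
    """Grid cell for a point on a pitch scaled 0-100 in both directions."""
    last = int(100 // cell_size) - 1
    # min() keeps points on the far touchline or goal line inside the last cell.
    return (min(int(x // cell_size), last), min(int(y // cell_size), last))


def success_by_destination(passes: list[tuple[float, float, bool]],
                           cell_size: float = 5.0) -> dict[Cell, float]:
    """Completion rate of passes grouped by the cell where they ended.

    Each pass is a hypothetical (end_x, end_y, completed) tuple.
    """
    attempts: dict[Cell, int] = defaultdict(int)
    completions: dict[Cell, int] = defaultdict(int)
    for end_x, end_y, completed in passes:
        cell = cell_of(end_x, end_y, cell_size)
        attempts[cell] += 1
        completions[cell] += int(completed)
    return {cell: completions[cell] / attempts[cell] for cell in attempts}


passes = [(92.0, 51.0, False), (93.5, 52.0, True), (60.0, 30.0, True), (61.0, 33.0, True)]
print(success_by_destination(passes))
```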

It's clear to see how - playing from left to right - passing accuracy starts to break down in the opposition half and then drops dramatically at the boundaries of their penalty area.

Even with half a season's worth of passes and shots, we're going to struggle with the number of data points available as this analysis progresses, so let's merge the fine grid of that first image into some larger pitch areas.
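One way to merge the grid, sketched under the assumption that attempts and completions are kept as counts per fine cell: pool the counts into bigger areas and only then divide. Averaging the fine-cell rates directly would give a sparse cell as much weight as a busy one. The `factor` parameter (how many fine cells per side make one big area) is illustrative.

```python
from collections import Counter

Cell = tuple[int, int]


def merge_cells(attempts: Counter, completions: Counter, factor: int) -> dict[Cell, float]:
    """Pool fine-grid pass counts into areas of factor x factor cells, then divide."""
    area_attempts: Counter = Counter()
    area_completions: Counter = Counter()
    for (cx, cy), n in attempts.items():
        area = (cx // factor, cy // factor)
        area_attempts[area] += n
        area_completions[area] += completions[(cx, cy)]
    return {area: area_completions[area] / area_attempts[area] for area in area_attempts}


# A busy cell (90 passes, 90% complete) beside a sparse one (10 passes, 10% complete).
attempts = Counter({(0, 0): 90, (1, 0): 10})
completions = Counter({(0, 0): 81, (1, 0): 1})
merged = merge_cells(attempts, completions, factor=2)
print(merged)
```

The pooled rate for the merged area is 82%, whereas a plain average of the two cell rates would report 50%.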


Pass success rate by destination



We now have a picture of how difficult it is to pass into each area of a football pitch. What about shots?

From the same dataset, here's an average player's probability of scoring with shots from different pitch locations. Penalties are excluded and I've hidden squares with fewer than twenty shots to clean the data up a little.
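The two filters described here, dropping penalties and hiding squares with fewer than twenty shots, can be sketched as follows. The `Shot` record (grid cell, scored flag, penalty flag) is a hypothetical format, not the structure of the original database.

```python
from collections import defaultdict
from typing import NamedTuple

Cell = tuple[int, int]


class Shot(NamedTuple):
    """Hypothetical shot record: grid cell, whether it scored, whether a penalty."""
    cell: Cell
    goal: bool
    penalty: bool


def conversion_by_location(shots: list[Shot], min_shots: int = 20) -> dict[Cell, float]:
    """Goals per shot in each cell, excluding penalties and low-volume cells."""
    taken: dict[Cell, int] = defaultdict(int)
    scored: dict[Cell, int] = defaultdict(int)
    for shot in shots:
        if shot.penalty:
            continue  # penalties are left out entirely
        taken[shot.cell] += 1
        scored[shot.cell] += int(shot.goal)
    # Hide cells with fewer than min_shots non-penalty shots.
    return {cell: scored[cell] / taken[cell]
            for cell in taken if taken[cell] >= min_shots}


shots = ([Shot((18, 10), i < 7, False) for i in range(20)]
         + [Shot((18, 10), True, True)]         # a penalty: ignored
         + [Shot((15, 3), False, False)] * 5)   # too few shots: hidden
print(conversion_by_location(shots))
```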


Shot conversion rate by shot location



As a manager, you're on the horns of a dilemma. Scoring probability climbs to over 30% in the centre of the six yard box, but your chances of passing the ball into that location are slim.

What if we combine the two visualisations?

Multiplying pass success rate by scoring probability gives an indication of the likely success of an attacking strategy. Pass to an easier area outside the box and shoot from there? Or attempt to work the ball closer, at the risk of losing possession?
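Cell by cell, the combination is a single multiplication. A minimal sketch with invented probabilities; note that multiplying the two maps treats "get the ball there" and "score from there" as independent, which is a simplification.

```python
Cell = tuple[int, int]


def combined_probability(pass_success: dict[Cell, float],
                         shot_conversion: dict[Cell, float]) -> dict[Cell, float]:
    """Pass success probability * shot conversion rate for each cell.

    Cells missing from either map (for example, hidden for low volume) are dropped.
    Multiplying treats reaching the cell and scoring from it as independent.
    """
    common = pass_success.keys() & shot_conversion.keys()
    return {cell: pass_success[cell] * shot_conversion[cell] for cell in common}


pass_success = {(19, 10): 0.10, (17, 10): 0.25, (10, 10): 0.90}
shot_conversion = {(19, 10): 0.30, (17, 10): 0.10}
values = combined_probability(pass_success, shot_conversion)
print(values)
```

Here a hard-to-reach cell with 30% conversion and an easier cell with 10% conversion end up close together, at about 3% and 2.5%.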


Pass success probability * shot conversion rate


It turns out to be far from a clear-cut choice. There's a relatively large area, stretching from the edge of the six yard box to well outside the penalty area, where the combined probability of getting the ball into a square and then scoring from it is fairly evenly balanced, at 2-3%. It's not as simple as 'closer to the goal is better', and the balance in any one game almost certainly depends on the passing quality of the individual teams and how well their opponents defend.

If we box out that 2-3% conversion area, we can move the analysis on another step.
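The 2% and 3% limits come from the map; reading "box out" as the smallest rectangle of cells covering every cell in that band is an assumption, and such a rectangle can include a few cells outside the band. A minimal sketch with invented values:

```python
from typing import Optional

Cell = tuple[int, int]


def box_out(values: dict[Cell, float], low: float = 0.02,
            high: float = 0.03) -> Optional[tuple[int, int, int, int]]:
    """Bounding rectangle (x_min, y_min, x_max, y_max) of the cells whose
    combined probability lies between low and high, or None if there are none."""
    band = [cell for cell, value in values.items() if low <= value <= high]
    if not band:
        return None
    xs = [x for x, _ in band]
    ys = [y for _, y in band]
    return (min(xs), min(ys), max(xs), max(ys))


values = {(17, 9): 0.021, (18, 11): 0.028, (19, 10): 0.034, (10, 10): 0.012}
print(box_out(values))
```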


Pass success probability * shot conversion rate


How should a team attempt to move the ball into that boxed-out shooting zone? There are three broad choices: Directly from the direction of the centre circle, diagonally, or from the wings.

David Moyes has come in for a lot of criticism this week following Manchester United's draw with Fulham, where his players hit over eighty crosses in ninety minutes. We should be able to show here whether crossing, or a direct approach, is the more successful strategy.


Probability of achieving a successful pass into shooting area


Note that I've changed the colour scale on the above image to peak at 75% rather than 100%, since the average success rate of these passes is lower than when considering the whole pitch. Squares are only shown if they've been the origin of at least twenty passes.
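The origin map can be sketched as: keep passes that end in the shooting zone, group them by the cell they started from, and hide origins with fewer than twenty attempts. The zone coordinates, the 0-100 pitch scale and the choice to ignore passes that start inside the zone are all assumptions for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

Cell = tuple[int, int]
Zone = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Hypothetical zone bounds on a pitch scaled 0-100, attacking towards x = 100.
SHOOTING_ZONE: Zone = (83.0, 30.0, 100.0, 70.0)


class ZonePass(NamedTuple):
    start_x: float
    start_y: float
    end_x: float
    end_y: float
    completed: bool


def in_zone(x: float, y: float, zone: Zone) -> bool:
    x_min, y_min, x_max, y_max = zone
    return x_min <= x <= x_max and y_min <= y <= y_max


def zone_entry_success(passes: list[ZonePass], zone: Zone = SHOOTING_ZONE,
                       cell_size: float = 5.0, min_passes: int = 20) -> dict[Cell, float]:
    """Success rate of passes into the zone, grouped by the cell they started in."""
    attempts: dict[Cell, int] = defaultdict(int)
    completions: dict[Cell, int] = defaultdict(int)
    for p in passes:
        # Assumption: only passes played into the zone from outside it count.
        if not in_zone(p.end_x, p.end_y, zone) or in_zone(p.start_x, p.start_y, zone):
            continue
        origin = (int(p.start_x // cell_size), int(p.start_y // cell_size))
        attempts[origin] += 1
        completions[origin] += int(p.completed)
    # Only keep origin cells with at least min_passes attempts.
    return {cell: completions[cell] / attempts[cell]
            for cell in attempts if attempts[cell] >= min_passes}


passes = ([ZonePass(60.0, 50.0, 90.0, 50.0, i < 12) for i in range(20)]
          + [ZonePass(10.0, 10.0, 90.0, 50.0, True)] * 5     # too few: hidden
          + [ZonePass(60.0, 50.0, 70.0, 50.0, True)])        # ends outside zone
print(zone_entry_success(passes))
```

Tightening the zone around the penalty spot only needs a smaller `zone` rectangle; the grouping logic stays the same.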

Once you move beyond the eighteen yard line, pass success probability drops off quickly. Touchline crosses from a 'chalk on his boots' classic winger have success rates as low as 30%. Other things being equal, the best chance of passing the ball into our key zone comes from a direct or diagonal move.

If you're thinking "but that's not fair, most of the passes included here will be targeted at locations outside the box", then you're right. Let's tighten up our key shooting zone to a central area of the eighteen yard box surrounding the penalty spot.


Probability of achieving a successful pass into close shooting area


Still want to hit crosses all day?

The probability of a pass from the wings finding a team mate in the shooting zone is 30-40%, while moving through the central area has a success rate of 40-50%.

This isn't the end of the story, but it's where I'll stop for now. There are many more factors to be considered, including absolute volume of passes and the fact that a successful pass isn't the same as creating a shooting chance. This analysis will provide a base to work from though and one that I'd like to extend next into different types of teams.

Ultimately, I hope that this type of analysis could answer questions such as...

Should teams with worse passing shoot more often from long range? And vice versa, where is the optimal shooting area for a team that passes with a very high success rate?

How do optimal strategies change, based on specific opponents?

(using significantly more data) Can we identify hotspots where passes into the shooting zone have higher success rates? Versus specific opponents? When specific defenders are on the pitch?

Eventually, I believe an approach like this might be able to identify defensive weaknesses in a specific team and optimal attack strategies for their opponents.

Friday 7 February 2014

24,000 tweets about #Sochi

Who's excited about the Winter Olympics? Happy about the games? Angry about their location?

Let's find out...

Searching Twitter for #Sochi yields 29,800 individual tweets.

Running those through TextBlob yields 24,000 tweets that can be analysed for sentiment - positive, negative, or neutral*.
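TextBlob exposes a polarity score through `TextBlob(text).sentiment.polarity`, a float from -1.0 (negative) to 1.0 (positive). The post doesn't say how scores were split into three classes, so the neutral cut-off below is an assumption, and the function takes a precomputed score so that it runs without TextBlob installed.

```python
def label_sentiment(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to 'positive', 'negative' or 'neutral'.

    Scores within +/- neutral_band of zero count as neutral (hypothetical cut-off).
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# With TextBlob installed, a score would come from, for example:
#   from textblob import TextBlob
#   polarity = TextBlob("Loving the opening ceremony").sentiment.polarity
print([label_sentiment(p) for p in (0.5, -0.2, 0.0)])
```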

And throwing the whole lot at Google Fusion Tables lets us map them.

Here they all are. Blue for neutral, green for positive and red for negative.
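Mapping tools such as Fusion Tables accept tabular uploads like CSV, so a simple route is one row per tweet with its coordinates, its sentiment label and the colour used in these maps. A sketch assuming each tweet already has a latitude and longitude; the column names are hypothetical.

```python
import csv
import io

# Colours used in the maps: blue neutral, green positive, red negative.
COLOURS = {"neutral": "blue", "positive": "green", "negative": "red"}


def tweets_to_csv(rows: list[tuple[float, float, str]]) -> str:
    """CSV text with one row per tweet: latitude, longitude, label, colour.

    Each input row is a hypothetical (latitude, longitude, sentiment label) tuple.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["latitude", "longitude", "sentiment", "colour"])
    for latitude, longitude, label in rows:
        writer.writerow([latitude, longitude, label, COLOURS[label]])
    return buffer.getvalue()


print(tweets_to_csv([(43.6, 39.7, "positive"), (51.5, -0.1, "negative")]))
```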



Or, for a bigger version, go here.

Just the happy people?



And just the angry people.




That was fun.

Thanks to some brilliant people who make brilliant tools: Google for Fusion Tables, and the development teams behind TextBlob and Tweepy for their Python modules.



* Please note that automated sentiment analysis is far from perfect. Especially the way I've implemented it.

The three rules of business data visualisation

I love data visualisation; sometimes just for its own sake, but mostly when it makes life easier.

The Earth Wind Map is an example of the former. It's hypnotically beautiful.



This type of data visualisation isn't so good in business though, except to use as marketing material. If you want to build a stunning animation of your customers' behaviour to put on a big screen in the office, that's great, but watching it for five minutes every Monday morning is unlikely to help you identify problems with your website. If we want to gawp at something beautiful, we call up the Earth Wind Map; if we want to know whether to take an umbrella tomorrow, we go for a simpler forecast.

In business - and I count non-traditional businesses like sport within this too - data visualisation has two main purposes.
  • To help you understand the best strategy to adopt.
  • To get you to that strategy faster than you otherwise could have.

In order to achieve those ends, I work to three simple rules when visualising business data. The ideal business report (visualisation, dashboard, call it whatever you like) should achieve these three things as quickly and as simply as possible. The higher up the management chain the report's audience, the simpler it needs to be and the more 'added extras' become a distraction rather than useful additions.

I'm not dismissing visualisations for inspiration, or for investigation, but in business the aim of communicating data is to make the right decision and to make it quickly. This is what reports are for and so I try to design reports to communicate these three things.


1. Where am I right now?

For the metrics that you know are important (you have identified those metrics, haven't you?), where are they right now? This could mean yesterday, a total for the past seven days, a summary of the last fixture your sports team played, or any other - relatively short - time period that works for the business.

It's absolutely vital that you don't get carried away with which metrics you visualise here. I've written before:

"As analysts, we're often the ones selling dashboards, so let's be honest about what they do well. They show data. So to be useful, you have to be someone who needs to see that data - and I mean really needs to see it. Just the number. Not why the number, or where it came from, or what you might want to do about it."

Only visualise metrics where you fully understand what they mean and know at least some of the levers that you can pull to make them change. If sales drop, you know what that means. If some single number that's a complex blend of customer values, retention, acquisition, marketing ROI and God knows what else changes, what are you going to do about it? Simplicity is good. It's also much harder than complexity.

To divert into my football analytics sideline for a moment, this is why I'm not a big fan of numbers like PDO. The definition is complicated, the name is confusing and, as a manager, it's hard to know what to do about it when it's not where you'd like it to be.

That's not to say there shouldn't be complicated metrics (for example to use as predictive tools) but I don't want them on my management visualisation.

Very often, the best way to communicate some simple KPI numbers is a simple table. Who says a data visualisation can't be 'just' a table? In the right place, tables are awesome.
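Even a 'just a table' report can be generated in a few lines. A minimal sketch, assuming daily values per metric are already to hand; the metric names, figures and seven-day window are illustrative.

```python
def kpi_table(daily: dict[str, list[float]], days: int = 7) -> str:
    """Plain-text table of each metric's total over the most recent `days` days."""
    width = max([len("Metric")] + [len(name) for name in daily])
    header = f"Last {days} days"
    lines = [f"{'Metric':<{width}}  {header:>12}"]
    for name, values in daily.items():
        lines.append(f"{name:<{width}}  {sum(values[-days:]):>12,.0f}")
    return "\n".join(lines)


daily = {"Visits": [100.0] * 10, "Orders": [5.0] * 10}
print(kpi_table(daily))
```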

Here's a visualisation of website metrics that will work well, provided you already know a bit about your website.




2. Is that good?

So now I know how much I sold last week and how much traffic we got to the website. But is that good? Put each number in context.

Context can mean a comparison with the past, or with a fixed target, or even vs. key competitors.

It doesn't matter how you do this - colour coding, text flags, Harvey Balls - as long as it communicates quickly and clearly. Personally, I'm quite partial to an old school traffic light, if only because even in the marketing industry, it's hard to find somebody who can get 'green is good' wrong.

Our weekly table of web traffic stats gains week-on-week or year-on-year comparisons and a set of traffic lights. Now you can instantly see if any of these numbers need attention.
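A traffic light is just a rule applied to the comparison. This sketch uses a week-on-week change with a hypothetical 5% amber band, and a flag for metrics where lower is better (a bounce rate, say); the thresholds are assumptions to be set per metric.

```python
def traffic_light(current: float, previous: float, amber_band: float = 0.05,
                  higher_is_better: bool = True) -> str:
    """Green / amber / red flag from the week-on-week change.

    Green: no worse than last week. Amber: worse by up to amber_band (a
    hypothetical 5% by default). Red: worse by more than that.
    """
    if previous == 0:
        raise ValueError("no baseline to compare against")
    change = (current - previous) / previous
    if not higher_is_better:
        change = -change  # a rise in, say, bounce rate counts against you
    if change >= 0:
        return "green"
    if change >= -amber_band:
        return "amber"
    return "red"


print(traffic_light(1050, 1000), traffic_light(970, 1000), traffic_light(900, 1000))
```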



3. Is it changing?

The last piece of the puzzle is to know if your metrics (otherwise known as KPIs - this isn't revolutionary stuff!) are changing.

Part one told us where we are.

Part two told us if where we are is good.

Part three tells us if the position is becoming better, or worse.

This section is where things can get overcomplicated if you're not careful. If you've got eight KPIs and you want to show a twelve week trend, then you've now got ninety six numbers to communicate. Tables just became a really bad idea.

Sparklines, however, are fabulous.

Sparklines are mini charts designed only to communicate spikes and trends in data. All this section of the report is designed to do is give a manager a quick visual representation of 'going up' or 'going down', and how fast.
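Sparklines don't need a charting library; a text version built from the eight Unicode block characters (U+2581 to U+2588) is enough to show direction and speed. A minimal sketch; the twelve weekly figures are invented.

```python
# Eight block heights, U+2581 (lowest) to U+2588 (full block).
BARS = "\u2581\u2582\u2583\u2584\u2585\u2586\u2587\u2588"


def sparkline(values: list[float]) -> str:
    """Scale a series onto eight block heights: minimum -> lowest bar, maximum -> full bar."""
    if not values:
        return ""
    low, high = min(values), max(values)
    if high == low:
        return BARS[0] * len(values)  # a flat series has no trend to show
    scale = (len(BARS) - 1) / (high - low)
    return "".join(BARS[round((v - low) * scale)] for v in values)


# Invented twelve weeks of website visits.
weekly_visits = [980, 1010, 995, 1040, 1100, 1080, 1150, 1120, 1190, 1230, 1210, 1280]
print(sparkline(weekly_visits))
```

Eight KPIs then take eight short strings rather than ninety-six numbers.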

Our website report gains a set of twelve week sparklines and we're done. In one small report, we can see at a glance where we are, whether that's good and whether it's getting better, or worse.




I love data visualisation, but in business, we need to drop the pretty pictures and understand why we're visualising in the first place. Infographics are awful for communication. They're actually worse than writing down your report as long-hand text.

If a business visualisation doesn't help you to understand the best strategy to adopt and do it faster than a table of numbers would, then it's not worth having. Build infographics (if you must), learn D3 and build beautiful animations, but recognise that they're marketing collateral, not serious business tools.

In business, we need to know where we are, if that's good and if it's getting better or worse. The longer it takes to communicate that, the further behind your competitors you'll be.