Friday, 14 September 2012

Planning a Big Data holiday

Friday analogy time. Bear with me...

Imagine you're planning a holiday. Or rather, you're deliberately not planning a holiday. You know you want a break from work but you're not sure where you'd like to go, or what you'd like to do.


Imagine also that you're a marketer who's got very over-excited about the concept of Big Data.

So here's what you do.

You buy some really big suitcases and pack all the clothes you own into them. After all, you don't know if you're going to the beach or the Arctic yet, so you'd better pack everything from Speedos to ski gear.

Will there be accommodation when you get there? We don't know yet. Best put in a tent. Or two, one for summer and one for winter.

And off to the airport, to investigate flights!

In the airport, you realise you also need lots of toiletries and medical supplies for your trip, and another suitcase to store them in. You buy most of the stock in Boots and store it in your new suitcase, because you still don't know where you're going or what you might need. The mosquito repellent bottle leaks everywhere and DEET is nasty, smelly stuff, so you have to buy lots of things twice.

You pick a flight and head off on holiday. The flight was expensive, because you bought the ticket at the airport, rather than in advance. You'll be paying off your excess baggage fees for the next ten years.

Your hotel at the other end is expensive too because you didn't arrange a cheap deal before you left.

Finally, despite your hotel room being stuffed full of suitcases that you didn't need to bring and your bank balance having taken a hammering, you have a really fantastic holiday. A job well done.

You've probably guessed where I'm going with this one, but (not) planning a holiday like that is pretty much the approach we're taking when we say "let's assemble loads of data and wonderful things will happen."

They might. If you don't run out of money along the way.

But if you decide where you want to go first and then build what you need to get there, you'll build something faster, better, more useful and for a hell of a lot less money.

Big Data is not an end in itself, it's a means to an end. If you don't know where you're going yet, then stop, work that out, and then go looking for what you'll need to get there.

Monday, 30 July 2012

Does the marketing industry bury bad news?

This article turned up on Adage last week. It's a proper, well thought out, scientific piece of marketing research, with an extremely important conclusion.

So why haven't you seen it anywhere?

Well, unfortunately, it strongly suggests that most of the clicks we see on display advertising may be just noise. Accidents. Slips of the mouse. There aren't that many clicks, you see - click rates on display ads are around 0.02% to 0.04% - and with a click rate that low, a lot of them could easily be flukes.
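To put rough numbers on that, here's a quick sketch in Python. The 0.02% and 0.04% click rates are the ones quoted above. The one million impressions and the 0.01% accidental click rate are made up purely for illustration and don't come from the Adage study, and the function names are hypothetical too.

```python
def expected_clicks(impressions: int, click_rate_percent: float) -> float:
    """Clicks you'd expect from a number of impressions at a given click rate (in %)."""
    return impressions * click_rate_percent / 100


def accidental_share(observed_rate_percent: float, accidental_rate_percent: float) -> float:
    """Share of observed clicks that would be accidents, capped at 100%."""
    return min(1.0, accidental_rate_percent / observed_rate_percent)


# Click rates quoted in the post; the impression count is an assumption.
for rate in (0.02, 0.04):
    clicks = expected_clicks(1_000_000, rate)
    print(f"{rate}% of 1,000,000 impressions is about {clicks:.0f} clicks")

# ASSUMPTION: a made-up accidental click rate of 0.01%, not a measured figure.
share = accidental_share(observed_rate_percent=0.02, accidental_rate_percent=0.01)
print(f"Share of clicks that would be accidents: {share:.0%}")
```

The point is only that when the total is a few hundred clicks per million impressions, it doesn't take many slips of the mouse to account for a large chunk of them.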

Read the Adage article. It's important.


You didn't read it, did you?

OK, quick summary. The authors ran blank display ads and got click-through rates on them that were significantly better than industry benchmarks for branded display ads that don't carry a call to action.

Stop trying to think up reasons why that might happen that wouldn't cast any doubt on the effectiveness of many display campaigns (which is exactly what most commenters on the Adage article immediately did). The authors even covered the possibility that people might click a blank space out of curiosity, by asking those who clicked why they clicked. Like I said, it's a proper, well thought out, scientific study.

Adage ran the story, which is great. Adage do like a bit of controversy. As far as I can tell, the industry has buried it. If it had definitively proved that display ads made significant numbers of people go and search for products on Google, you can bet your life you'd have heard about it - it would be all over Twitter. Or if it had proved that Facebook advertising had a massive ROI? We've all seen those studies (and they're not very scientific...)

If we're going to take marketing measurement seriously, we need to accept that sometimes ads will be shown not to work as hard as we hoped and the studies that return those results shouldn't be buried without a trace. The authors here are also very careful not to be entirely negative. They're most interested in the fact that we use clicks to tune display placements, but those clicks look largely random. They don't try to make the step to any ROI implications.

Going back to item #3 on last week's top ten, we're going to see this result again. Third time around, maybe the industry will decide it can't just be ignored.

(Adage article originally found via @AdContrarian)

Tuesday, 17 July 2012

Ten rules of marketing analysis

It's been a while since we had a top ten on Wallpapering Fog. Number one on this list came up (again) today, so let's have Wallpapering Fog's top ten rules of marketing analysis.
  1. If you think you've discovered a radical, unexpected, new result that nobody's ever noticed before, your data is wrong.

  2. More complicated analysis can help you measure your marketing much more accurately. But if simple analysis can't find any impact at all from a marketing campaign, then there probably wasn't one.

  3. Nobody ever abandons a campaign that doesn't work the first time that you prove it doesn't work. Three is the magic number.

  4. ROI means return on investment and it's measured in money. Not clicks, likes, web traffic or re-tweets.

  5. If you're not selling ice-cream, then the weather isn't responsible for your 50% year on year sales decline. Even Noah needed food and clothes.

  6. Never trust a piece of research that was funded by a media owner.

  7. Ten thousand respondents is plenty. A million is very rarely necessary - it just takes much longer to open the spreadsheet. You only need a spoonful of soup to know what the whole bowl tastes like.

  8. That means the BARB TV ratings panel is fine. Leave it alone, online people.

  9. When forecasting next year's sales, assume that your new adverts aren't any better than your old adverts. I'm sorry if that's depressing, but it's almost always true.

  10. The world is never changing so fast that you can't learn something from the past couple of years. People's basic motivations haven't changed since the dark ages.
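Rule 7 is easy to check with a bit of arithmetic. Here's a rough sketch in Python using the standard margin of error formula for a proportion at roughly 95% confidence. It assumes a simple random sample and the worst case of a 50/50 split; the function name is made up for illustration, and real panels are weighted and more complicated than this.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion.

    ASSUMPTION: a simple random sample; proportion=0.5 is the worst case.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (1_000, 10_000, 1_000_000):
    points = margin_of_error(n) * 100
    print(f"{n:>9,} respondents: +/- {points:.2f} percentage points")
```

Ten thousand respondents already gets you to about one percentage point either way. A hundred times the sample only shrinks that by a factor of ten, because the error falls with the square root of the sample size.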

Monday, 25 June 2012

Joe Hart officially named Twitter's man of the match.

England vs. Italy, 24th June 2012...

88,142 tweets mentioning "England"...

Analysed for positive or negative sentiment and then used to rate each player's performance.

The result? Joe Hart was England's man of the match based on tweets that mentioned player names. Ashley Young was, erm, less good.

Instead of the usual static infographic, here's a Tableau dashboard! Don't forget to click on the different pages across the top. Go here for overall England ratings, player scores and interactive player performance over time.



A few interesting bits that popped out for me...
  • Rooney's performance was nowhere near his pre-match expectation (check his timeline).

  • We all got progressively more depressed about England as the game went on. Have a look at sentiment over time and compare the pre-game level with the decline over the next two hours.

  • We were happy to make half time and greeted the second half with a big COME ON ENGLAND! Then went back to getting steadily more depressed again.

  • Cole's been harshly treated for that penalty miss. He scores a low rating due to the large volume of negatives as England exit on penalties.

  • Nobody tweets about poor old Lescott! That probably means as a centre back that you're getting the job done. I thought he had a good game.

If you want to see some methodology, it's the same as I did for England vs. Sweden.

Monday, 18 June 2012

Rating England vs. Sweden using Twitter

If you follow me on Twitter (why would you not? Don't answer that) you'll know I've been playing with R a lot recently. First attempts at pulling data from Twitter resulted in a word cloud I quite liked, but which an ex-colleague dubbed the "mullet of the internet". Thanks Mark.

This time, I've pointed R at Euro 2012. Specifically, I set R running from half an hour before kick off in the Group D England vs. Sweden game - 19.30 last Friday - with instructions to pull every tweet it could that contained the word "England".

The results? 78,045 England related tweets (excluding re-tweets), running from 19.30 to 21.15.

Let's see what we got. Grouping up the tweets into 5 minute intervals, here's overall volume.


We're averaging just under 2,300 tweets every 5 minutes. That's got to be enough to do something interesting with!
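The grouping here was done with R and Tableau, but the idea is simple enough to sketch in a few lines. This is a minimal Python illustration, not the code used for the charts: round each tweet's timestamp down to the start of its 5 minute slot and count. The timestamps below are made up and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def five_minute_slot(timestamp: datetime) -> datetime:
    """Round a timestamp down to the start of its 5 minute interval."""
    floored_minute = timestamp.minute - timestamp.minute % 5
    return timestamp.replace(minute=floored_minute, second=0, microsecond=0)


def volume_by_slot(timestamps) -> Counter:
    """Count how many timestamps fall into each 5 minute interval."""
    return Counter(five_minute_slot(t) for t in timestamps)


# Made-up timestamps for illustration, not real tweets from the database.
sample = [
    datetime(2012, 6, 15, 19, 31, 10),
    datetime(2012, 6, 15, 19, 34, 59),
    datetime(2012, 6, 15, 19, 35, 0),
    datetime(2012, 6, 15, 19, 42, 30),
]

for slot, count in sorted(volume_by_slot(sample).items()):
    print(slot.strftime("%H:%M"), count)
```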

It's a bit easier to read if you colour the first and second half in red, with pre and post game and half time in grey.



OK, so lots of tweets then. One of the cool things we can do with them is to split the tweets by sentiment: positive, negative or neutral. An example of a strong positive from the database would be:

"Well done and very proud of you. England may not have the most talented players but they played with guts, passion and heart #England" @ozzy_kopite

And negative (no points for grammar here either):

"Now lets watch england lose bcoz they use caroll!!! N the game will b bored!!! #damn" @Anomoshie

The sentiment algorithm isn't perfect so we're not going to push it too hard. I'm dumping any data about the strength of sentiment: tweets are either positive, negative or neutral and that's it.

If you'd like to know what kit I used to do all of this, please see the bottom of the post. I'm assuming most readers just want to jump to results, so here we go.

Keep the five minute time-slots and divide the number of positive tweets by the number of negative, to get a view on how cheerful Twitter was feeling about England during the game.


On average, there are 2.8 times as many positive tweets as negative. That will partly be down to the settings on the sentiment algorithm though and it's the movements we're really interested in.
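As a rough sketch of that calculation (in Python for illustration; the charts themselves came out of R and Tableau), count the positive and negative tweets in each slot and divide one by the other. The labels, slot names and sample data below are all made up. A ratio above 1 means more positive than negative tweets, and below 1 means the reverse.

```python
from collections import defaultdict


def sentiment_ratio_by_slot(tweets):
    """Positive / negative tweet ratio for each time slot.

    tweets: iterable of (slot, label) pairs, where label is
    'positive', 'negative' or 'neutral'. Neutral tweets are counted
    but don't affect the ratio. Slots with no negative tweets get None.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for slot, label in tweets:
        counts[slot][label] += 1

    ratios = {}
    for slot, tally in counts.items():
        if tally["negative"]:
            ratios[slot] = tally["positive"] / tally["negative"]
        else:
            ratios[slot] = None
    return ratios


# Made-up labelled tweets for illustration only.
sample = [
    ("19:30", "positive"), ("19:30", "positive"), ("19:30", "positive"),
    ("19:30", "negative"), ("19:30", "neutral"),
    ("19:35", "positive"), ("19:35", "negative"), ("19:35", "negative"),
]

for slot, ratio in sorted(sentiment_ratio_by_slot(sample).items()):
    print(slot, ratio)
```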

Twitter was very positive in the lead up to kick off, but that didn't last long. Twenty minutes in, the balance of positive over negative had dropped from 4.1 to 2.2 as Sweden failed to roll over and let England hammer them. Then Carroll scored the opener...

In the second half, we can see a trough all the way down to 2.0 as Sweden take the lead and then a positive swing via England goals from Walcott and Welbeck. The game ends on a positive / negative sentiment value of 2.9. Well played lads.

Come to think of it, well played which lads? We've got loads of mentions of the players in this database too, so let's see who Twitter thinks had a good game.

Height of the bars is positive / negative sentiment and width is volume of tweets (some players like Lescott generate really low volumes so don't take their rating too seriously). I've restricted the database just to tweets that took place during the first or second half. If you were slating Carroll before the game, we're not interested in your opinion here!


Carroll comes out man of the match, both in terms of sentiment and volume of tweets. There's a definite break between the players who did best - Carroll, Welbeck, Gerrard, Hart and Walcott - and everyone else. The overall England rating never goes negative (below 1), and none of the players' ratings do either, although Johnson tries hardest, which may be a reflection of his own goal.
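For the curious, a player rating like this can be sketched as follows. This is a simplified Python illustration under my own assumptions, not the R and Tableau workflow behind the chart: keep only tweets that mention exactly one player, then work out positive / negative sentiment and tweet volume per player. The short player list, the whole-word matching rule and the sample tweets are all made up for the example, and misspelt names would be missed.

```python
import re

# ASSUMPTION: a short illustrative list, not the full squad.
PLAYERS = ["Hart", "Carroll", "Gerrard", "Johnson"]


def single_player_mentioned(text, players=PLAYERS):
    """Return the player's name if the tweet mentions exactly one player, else None."""
    mentioned = [
        player for player in players
        if re.search(rf"\b{re.escape(player)}\b", text, re.IGNORECASE)
    ]
    return mentioned[0] if len(mentioned) == 1 else None


def player_ratings(tweets):
    """Map each player to (positive / negative ratio, tweet volume).

    tweets: iterable of (text, label) pairs, label being
    'positive', 'negative' or 'neutral'. Ratio is None with no negatives.
    """
    counts = {}
    for text, label in tweets:
        player = single_player_mentioned(text)
        if player is None:
            continue  # no player, or more than one: can't attribute the sentiment
        tally = counts.setdefault(player, {"positive": 0, "negative": 0, "neutral": 0})
        tally[label] += 1

    ratings = {}
    for player, tally in counts.items():
        volume = sum(tally.values())
        ratio = tally["positive"] / tally["negative"] if tally["negative"] else None
        ratings[player] = (ratio, volume)
    return ratings


# Made-up tweets for illustration only.
sample = [
    ("Great header from Carroll", "positive"),
    ("Carroll again!", "positive"),
    ("Carroll gave away a silly foul", "negative"),
    ("Hart and Gerrard both superb", "positive"),  # two players, so skipped
    ("Unlucky Johnson", "negative"),
    ("Come on England", "neutral"),  # no player, so skipped
]

for player, (ratio, volume) in player_ratings(sample).items():
    print(player, ratio, volume)
```

Returning the volume alongside the ratio matters: a player with a handful of mentions can swing wildly on one or two tweets, which is why the bar widths are worth a look before trusting the heights.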

Finally, let's see how the player ratings fluctuated during the game. Sentiment on top. Volume of tweets below. This doesn't work so well for players with low numbers of mentions in tweets but you can see it works for Andy Carroll. That huge volume spike is his goal.


One more; here's Gerrard. Game of two halves for the Liverpool midfielder and his rating dropped significantly after half time.



Want to see another player? Here they are - knock yourself out. If you select "False" it will show totals for tweets that either don't mention a player, or mention more than one. The chart is a bit squashed below to fit in with the Wallpapering Fog template. For bigger, go here.



Tools:

Tweet database pulled using R, RStudio and twitteR. Sentiment analysis using the R 'Sentiment' package. Cleaned up a little in Excel and then all the charts are Tableau.

Tuesday, 29 May 2012

The spirit of the EU cookie law

There have been some pretty hysterical over-reactions to the (not so) new EU cookie law, some of them more factually correct than others. There have also been a lot of accusations that this law is fairly vague (which it is), that the guidance has changed (which it sort of has) and that it is unnecessary (which it isn't).

Some of the apparent difficulties with implementing the law stem from the fact that companies which are in a position to make compliance with the law easy - companies like Google and the big advertising agencies - have absolutely no incentive to make compliance easy. I'm not accusing anybody of being deliberately obstructive, but big media hasn't sat around a table and proactively tried to sort out a way of implementing the ICO guidance (PDF warning). They haven't done that because it's in big media's interests to build up cookies and privacy into a huge, insurmountable problem, kick potential solutions into the long grass and continue to track individuals on the web.

I've been thinking about the spirit of the law rather than the letter and in spirit, I think it's very simple. Just pretend your website is an actual, physical store and ask yourself whether what you're doing would be acceptable if it was.

So a customer visits your shop on the high street and starts to browse.

They can put things in their shopping basket, obviously. No question. Then they can take that basket to the till and pay.


Once at the till, you can ask if they have a loyalty card. If they do and hand it over, you can record what they bought and use that information to target advertising and offers, since that's part of the loyalty card deal you make with your customers. All fine so far.

Maybe your customer has brought a voucher with them. They fill in their details on the back in exchange for a discount and that's fine too.

That's a couple of easy ways to collect information about some of your customers - usually in exchange for a discount - and your customer is well aware that they're trading this information with you.

On the internet, cookies are essential to the virtual versions of those physical transactions. You put goods in your basket and the cookie remembers who you are, just so that the basket works. And only so that the basket works. That last bit is important.

All of those cookies are fine. The ICO says so and always has.

Now we come to a few tricks that are easy with cookies, but you might not like to try them in an actual store if you want to keep your customers.

You do a deal with the high street car park to put up posters advertising your store. Instead of just paying a fixed fee for the posters, you make a deal with the car park owner that you'll pay £1 for every person who visits your store straight from the car park and buys something, before they go anywhere else.

You don't tell your customers what you're doing, but you get some students on minimum wage to follow people out of the car park and see where they go. Obviously you need evidence that they're being counted properly, so you snap a quick photo of them on the way out of the car park, another on the way into the store and one at the checkout, time-stamped to prove it's the same person and that they went straight to your store.

That's probably not on, right? Which is why affiliate cookies could well have a problem.

And now inside the store. We already said loyalty cards are fine, but by using cookies you can track the in-store behaviour of something like 95%+ of your customers, without asking permission. That sounds useful. Let's do that.

We'll need a way to identify shoppers when they come back to our physical store though, without them volunteering the information via a loyalty card. Sounds like a job for more students on minimum wage! You pay a few people to walk around the store, surreptitiously dropping RFID chips into any open handbags so that when that customer comes back, you'll be able to invisibly read their ID at the checkout.

A few people notice and complain. You tell them they should keep their bag closed if they don't want ID chips dropped in it. Which is basically the argument that's being deployed when the industry tells users to turn off cookies in their browser if they don't want to be tracked.

You can stretch the analogy further if you want. Physical stores have always known how much of each product they sell. That data is like page hit counters and it doesn't need cookies. Without a loyalty card, physical stores don't know if you, personally, come in three times a week; once for a big shop, plus two short visits for milk and bread. They get along just fine without that information and always have.

Part of the reason they get along fine is my favourite quote about sampling:

"You only need to try a spoonful of soup, to know what the whole bowl tastes like"

Data on a sample of (well informed) customers who have traded that data with you is fine. You don't need to track 100% of visitors to understand your customers and tracking every single click on the web is an unhealthy and expensive obsession.

The EU lawmakers and the ICO evidently understand that most of the complaints coming from the marketing industry are bluster. We should also understand that in the end, treating customers with respect is the way to retain them. If you're confused by cookie laws then ask yourself if you'd do the same thing in a high street store. If you wouldn't, then don't do it on the web.

Friday, 4 May 2012

A happy data visualisation

As a side project this week, I've been learning how to get data out of Twitter using R. How it's done might be a post for another day (it's not really difficult, except for R's usual quirks...)

For now, here's a Wordle picture of Twitter in a happy mood, searched for "Bank Holiday" at 5pm on the Friday before a long weekend. Have a great break everyone.