Tuesday, 14 February 2012

Losing touch... or why Excel and VBA won't cut it any more

Thinking through this post is making me feel old. There's going to be a lot of 'in my day' type reminiscing and I'm only 34. It's all this new fangled technology that's doing it. The world's changing fast. I hate people who say that the world's changing fast, but this time it's true.

I got my first proper job twelve years ago this month, as a junior analyst with a small econometrics consultancy and although the statistical techniques I use are roughly the same as back then, I've started to realise that our software tools are going through a revolution. Hence this post - I'd like to stop and look around for a minute to see what's happened.



Fairly quickly after starting that first job, I discovered that data processing in Excel was a hell of a lot faster and easier if you learned Visual Basic for Applications (VBA), so I did. With the help of our IT department and a lot of practice, I got pretty good and it went a long way to getting me promoted because I could make dull work happen quickly, make other peoples' lives easier and build some nice interactive spreadsheet tools for our clients.

Up until fairly recently, if an aspiring analyst asked what they should do to get ahead at work, I'd say get good in Excel. Really good. And learn VBA. The first bit's still true, but VBA? Not so much.

The trouble is, VBA's getting left behind. It's still worth knowing some, but it's nowhere near as important as it was, because creating tools in Excel is nowhere near as important as it used to be. It's also not a good gateway into other types of programming because as a language, its structure is out of date. Although some programming skills are always transferable, you need to pretty much start again when you want to learn another language after VBA.

There's also a problem for the next generation in that they need to get luckier with where they start work to get exposed to the right kit. Everybody uses Excel, so at some point, every inquisitive analyst ends up in VBA. The new generation of tools probably won't be on your PC unless you decide to put them there.

So, you're ambitious and you're six months into your first analyst's role. What do you learn now? Even if your company doesn't use these, this is where I'd start. It's the kit I'm using (and still learning) and it's free, so you can pick it up as a CV booster without buying expensive software. If you're a junior analyst reading Wallpapering Fog then I hope this list might help. You also have excellent taste in blogs, so well done on that.

Let's look at what you need to be able to achieve, as an ambitious analyst...

Collect data

This is much more important than it used to be. Ten years ago, if you didn't have the dataset and the client didn't have it, then you'd have to buy it. Either way, almost certainly it would turn up on a spreadsheet or csv file. You often needed VBA macros to clean it up and make a tidy spreadsheet.

Now, some of your data will arrive like that (so a few simple macros are still handy) but very often, you'll want to trawl the web for it. Senior staff love it when you tell them you can scrape the data that they want off the web, automatically and for free. It will make you famous.

You could learn a proper programming language, but we're statisticians not programmers, so unless you want to do that for yourself anyway, then you need a tool which is designed specifically to work with statistical data. For analysts, R is the new VBA. It's free and it's well worth the effort that it takes to learn.

Learning R gives you the same head-start that VBA gave ten years ago. You don't need to buy new software (just like VBA, which was always in your copy of Excel anyway) and it will let you do things that are otherwise the preserve of IT, which should be the ambition of any good analyst. If you need IT to sort data out for you, then you've failed.

If you get good in Excel and good in R, you'll be in a promising place from which to get your data assembled, which brings me onto...

Process data

Excel worked well when data came in thousands of rows. It still works well for lots of things and the latest versions have finally broken the 65k row limit, but there's a problem. If you throw lots of data at Excel - properly lots - you'll break it. Or wait forever for it to calculate. Excel isn't designed for processing databases and that's what we're working with now.

R can do it, but you need a good level of SQL too, even if it's just to make Access work properly. SQL turns up everywhere and it's easy to learn.

To be fair, you've needed SQL for ages but I keep coming across analysts who aren't comfortable using it. You can't get away with that any more.

Build your models

Excel for the simple ones if you like - it's still a very powerful bit of software. For more complex statistical models, you need something else. Again, R is good. Some of the older competition like SAS (which is another reason to get a good SQL grounding) is starting to look very dated. It's also hugely expensive, particularly when compared to open source.

There's no way I'd adopt SAS now and it's being kept afloat by a legacy of systems embedded in big firms. If you end up using it, fine, but don't learn it unless you have to.

I'd go with R again. And I have.

Make some output


The days of the interactive Excel workbook, emailed to a client, are over. Or rather, they're not quite but they should be and soon will be.

You need to be able to make good looking charts and output in Excel (start here) so that you can illustrate your PowerPoint decks because unfortunately, PowerPoint is still an essential tool to know.

For interactive output, you want dashboards. There's only one bit of kit to learn for the moment and that's Tableau. If you can't persuade your company to buy you a copy, then get the free version and have some fun publishing to Tableau Public. Give it a couple of years and there are going to be some exciting roles around for people who can do good things with this piece of software.


So there you go. Learn a few macros by all means and definitely get very good with the front end of Excel, but take it from someone who's invested a lot of time in VBA and never uses it any more, there's a new world of software coming and you need to learn it. What worked ten years ago, won't cut it in another five.

The scary thing is, that means old buggers like me need to learn a load of new kit, and quickly. Back to the books...

3 comments:

Anonymous said...

Interesting post (for me anyway!)

As a data analyst for the last 12 years I've found, if anything, Excel becoming more relied upon as a medium.

Granted, most clients I deal with don't want anything more complicated than an average, but on a day to day basis my experience is that Excel is the standard 'language' for businesses and their data, so those that speak it well are always popular.

I've tried presenting data in charts that move away from the standard bar, pie or line, but the response has typically been to revert to something generally more familiar so clients can quickly process the message (typically on a static PowerPoint slide).

SQL is here to stay (agreed, I'm a convert), and R seems to be another 'language' to speak (the more, the merrier), but I wouldn't say Excel's days are numbered in the next five years...

Jon

Anonymous said...

[cont.]

With Excel still going strong, VBA in my mind is a quick way to access and share data.

I use it to reduce the manual chug of data for others, but the real value is in the use of user forms. For example, we use them to create user front ends that others in the business can use to access SQL databases and match to their data, whilst staying in a familiar environment.

Neil C said...

Hi Jon,

Have you tried any of the new BI tools? Tableau in particular? The past 18 months or so have been a revelation for me as I've started to deliver reports in things other than Excel.

The big issue with Excel is that it's very bad at delivering interactive dashboards to any more than just one or two people, while keeping them updated with recent data. Excel services should be a Godsend for this, but it's bloody awful and most people don't know it exists (see http://wallpapering-fog.blogspot.com/2011/03/dashboard-visualisation-software-excel.html)

I don't think Excel's going anywhere for the moment, but I do find that that some of the bigger data processing jobs and presentation of data that I used to use it for are now happening elsewhere.