Thursday, 21 January 2016

If you're sitting comfortably, it's time for storytelling with data.

Like many contemporary disciplines, the world of analytics doesn’t stand still. Organisations are on a constant quest to do more with their data: to get more value, greater insight and understanding, and to do all this at the lowest cost while incorporating innovative technology.

One concept that gained much traction during 2015, and looks set to really peak in 2016, is storytelling with data. So, if you’re sitting comfortably, let me explain.

One of the most common challenges for organisations is converting the insight derived from data analytics into something actionable. This means clearly identifying and explaining the ‘so-what?’ element of a piece of data analysis. However brilliant and clever the analytical techniques may be, it is essential to communicate the outcomes clearly to business leaders: so they understand why the findings matter, so they can validate the recommended action, and so the analysis leads to a definitive business decision with real impact, typically via decisions that touch individual customers, suppliers, employees and so on.

Data storytelling is a technique that is most beneficial when applied to convey what are often complex findings, derived from a multi-step piece of analytics. With a multi-step approach we can take business people on a journey, simplifying complexity, in a way that aligns with their emotional and intellectual awareness and that explains, educates and convinces.

Data storytelling is somewhat different from visualisation, and in particular from infographics; the two are quite distinct, as this article highlights.

Infographics have become a hugely popular approach to summarising statistics and can be found in all kinds of places, not just in business but also in news and media. There are some good explanations of why infographics are so popular and so useful at conveying information; this article does the job particularly well.

A key element of data storytelling is often visual, but it’s more about providing a guided path through findings to show how an analyst has taken some logical steps to arrive at a final result or set of options or outcomes.

It’s not surprising that many software vendors are seizing the momentum around data storytelling. Tableau have added a feature called ‘story points’ and Qlik allow a guided story via ‘pathways’.
There is also plenty of quality educational material to encourage good, if not best, practice. Tom Davenport’s article in the Harvard Business Review, for example, is an excellent summary of the 10 kinds of stories to tell with data, and there is a good article in Computerworld that identifies the trends in storytelling for 2016.

What I haven’t done here is delve into detailed illustrations of storytelling in practice; again, there are plenty of examples out there. Here are four that highlight a range of approaches:
The FT.com: What’s at stake at the Paris Climate Change Conference.
How far can you travel when your petrol / gas warning light comes on.
Gun Deaths in America: making sense of the numbers.
How sunspots impact global weather.

These four illustrate an interesting range of approaches and should give a clearer sense of the art of data storytelling. If you want to know more, there is a compilation of the best resources, including links to some excellent blogs.

And if you’re not convinced by the power of storytelling, do you need reminding what time Cinderella had to leave the ball? Or what Jack swapped for the magic beans on his way to the market? Or what animal made Dick Whittington's fortune? Just make sure your data storytelling enlightens and enchants and doesn't make your audience fall asleep!

Friday, 23 October 2015

Simplify so my grandmother can understand

It’s always interesting to watch as technologies and business strategies gain momentum and then ebb and flow in popularity. Big Data has been a prime example, with journals, blogs and social media displaying an ongoing oscillation between hype and stagnation. The result, especially in the intensive, opinion-rich online world, is a mix of views, from “10 reasons why big data is failing” (e.g. Forbes) to examples portraying “the ten most valuable big data stories” (e.g. Information Age). Extremes always grab attention.

When digging a bit deeper, these conflicting views actually highlight some interesting similarities. Firstly, many of the reasons for failure are not big data or analytics-specific issues, but a reflection of poor corporate or enterprise strategy and focus that could relate to any project. I’d include here factors such as a lack of clear business objectives, working in silos rather than to an enterprise strategy, or a lack of communication and misalignment of business and IT objectives. None of this is a big data problem.

An area that gets much attention is the skills needed for big data, with a specific focus on Data Science. This is much more of an analytical issue, but again digging into some depth reveals that the underlying issues are often communication and terminology. With a clear definition of what an organisation is trying to achieve, it becomes easier to understand what skills are needed. From there, as with any project, it’s straightforward to develop a skills matrix and determine which skills are light in the organisation (or missing), which can be addressed by training, and which by hiring. Organisations that assume the second step in defining a big data project is to hire a data scientist will be on a futile unicorn quest.

A common theme among the success criteria is focus and clarity: a focus on objectives and a clarity of purpose, but without pre-supposing outcomes. This is a tricky balance and requires an open-minded approach – this is where the real skill necessity emerges:
  • How do you keep an open mind and not set out with a preconception of what a piece of analytics will discover?
  • How do you let the data guide you – reading, analysing and interpreting with the flexibility to move in new directions as the story unfolds?
  • How do you try out new ideas and new techniques and incorporate new data, taking unexpected detours on the journey?
  • How do you read the data while avoiding red herrings, making sensible, reasoned observations and sidestepping traps (correlation vs causation being a prime example)?

The most telling sign that this approach has been achieved is in how the final conclusions are stated. Ideally they will simplify and summarise, explaining not just what, but why (and also why not, to show what was tried and found not to be useful). “Simplify, so my grandmother can understand” is how one CEO put it to me.


As a direct consequence the emerging focus is on clearer communication of findings, and on topics like visualisation and storytelling. Many organisations are achieving much more with analytics and big data; their quest now is to expand the horizon through better communication, to ensure that projects become enterprise initiatives.

Monday, 28 September 2015

Summary of Hadoop Use Cases

A summary paper is now available of the recent research project into published use cases of Hadoop adoption. The seven page paper summarises the use cases by identifying the key uses to which Hadoop is being put, the major industries that are being used as case studies by the Hadoop distributors and the key benefits identified by organisations that have implemented Hadoop. The paper is available here and requires no registration.

Friday, 18 September 2015

Hadoop use cases: research underway to identify common themes

Within the next week or so I should conclude a piece of research that summarises the published customer case studies for Hadoop adoption. It's been a fascinating project to work on. With all the hype around Big Data and technologies like Hadoop, it's often difficult to get a clear, objective view of real usage. The research has examined close to 200 customer case studies to identify common themes in adoption, usage and benefits, the aim being to provide a reference for those looking to adopt Hadoop. The initial findings have shown some interesting insights:

  • whilst much of the focus on Hadoop has been around the perception that it provides a low-cost analytical platform, driven by a combination of its open source foundation and use of commodity hardware, this is not the most referenced benefit
  • instead it is scalability that is most commonly quoted, with almost two-thirds (65%) of documented customer stories highlighting this as a key factor in adopting Hadoop; a common driver here is that organisations identify a need to retain more history than they could previously handle, or to explore new data sources with inherently high volumes of data
  • the next most reported benefit, directly related to scalability, was speed of analytics (the time taken to run queries), with 57% of descriptive case studies highlighting this advantage
  • the 'cost driver' comes in third place, with 39% of customers specifically highlighting the savings in adopting Hadoop
What's always interesting when looking at factors like scalability of analytics and speed of queries is to understand 'in comparison to what?'. Many of the use cases undertook comparative benchmarks against other technologies, but many have migrated up from other platforms, most commonly MySQL or SQL Server (by number of customers). In these latter cases the 'faster' argument is always a bit thin: new commodity hardware is always going to be more performant than older solutions. This is another reason why it's so difficult to get objective information on user adoption of technologies like Hadoop.
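The tallying behind percentages like these is straightforward to reproduce. A minimal sketch, using a handful of made-up case-study records (illustrative only, not the actual research data set):

```python
from collections import Counter

# Hypothetical, simplified case-study records: each lists the benefits
# the customer story cited. Names and data are invented for illustration.
case_studies = [
    {"customer": "A", "benefits": ["scalability", "query speed"]},
    {"customer": "B", "benefits": ["scalability", "cost"]},
    {"customer": "C", "benefits": ["query speed"]},
]

# Count how many case studies mention each benefit
tally = Counter(b for cs in case_studies for b in cs["benefits"])
total = len(case_studies)

for benefit, count in tally.most_common():
    print(f"{benefit}: {count}/{total} ({count / total:.0%})")
```

The same pattern scales directly to the ~200 real case studies: one `Counter` pass gives the ranked list of benefits and their percentages.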

I'll add another blog post here once the research summary is available, with a link for download. Alternatively drop me an email kevin [at] datafit.co.uk and I'll email it when ready (your email address won't get added to a mailing list or distributed further).

Friday, 11 September 2015

The return of Arthur Andersen. What’s that about?

An interesting mix of news this week, with an announcement that Arthur Andersen will, according to The Times, ‘rise from the ashes’ in the form of a new French entity providing a "unique international business services network integrating an authentic inter-professional dimension". Wow! After the dramatic demise of Arthur Andersen through the Enron scandal, that's a pretty bold move. What's even more interesting is that there is now a dispute between this entity and another in the US over the use of the Andersen brand.

Getting attention is an increasing challenge in a busy, complex, dynamic world, and doing something counter-intuitive is one way. Given its high profile, Apple takes a fairly mainstream approach to getting attention. This week, with typical drama, they put on a slick presentation announcing a range of product updates. These included the launch of the iPad Pro, offering 'desktop PC performance' in a bigger-format iPad (with a 12" screen) that is 22x faster than the original iPad and weighs pretty much the same: further compression of compute power. This has the potential for another step-change in the way people work, creating further enhancements to personal productivity; it was interesting to see Microsoft present on stage at the iPad Pro launch.

Adding more and more power, capability and portability to tools like the iPad enables consumers and corporations to get wider access to information, and the simplicity of those tools further enhances the spread of that access. Most people have looked on in awe as a toddler easily picks up the iPad interface, then puzzles over why 'swiping' a flat-screen TV doesn't have the same effect.

However, adding performance, capability and easier access to more information can also create complexity; as with many information or analytics problems, getting to a detailed, helpful piece of information is an increasing challenge. This is probably why Gartner positions 'self-service delivery of analytics' at the top of its Hype Cycle for Emerging Technologies [what Gartner calls the 'peak of inflated expectations']. An increasing number of vendors are entering the crowded analytics market claiming to offer the best thing since sliced bread: letting users intuitively gain simplified access to information that allows them to make real decisions. There are a few good technologies out there, but there is also a lot of hype, and buyers should beware of claims of magic and avoid the snake oil. There is still a massive challenge in taking large, complex data sets and transforming them into insightful, actionable information, and it's very difficult to self-serve or automate things that are so complex. It'll be interesting to see how this domain unfolds over the next few years; there's no doubt that software capability will be dramatically enhanced to make the most of the data.

And whilst you're waiting for self-service analytics on your shiny new iPad Pro, there's a raft of other things you can use it for – just don't impede clarity of vision for others by taking selfies next time you're 'experiencing' a music festival or gig.

Sunday, 6 September 2015

The impact of TV watching on GCSE grades: causation or correlation?

This week a 'new' research study, coinciding with the end of the UK school summer holiday, alerted parents to the alarming perils of TV for their children's exam potential. The study received widespread coverage across national and local press, with some attention-grabbing headlines, including:
  • Watching TV seriously harms GCSE results, says Cambridge University - The Telegraph
  • Each hour schoolchildren spend watching television sees GCSE results fall by equivalent of two grades, says new research - The Independent
  • An extra hour of TV a day costs two grades at GCSE - The Times
  • Teenagers who watch screens in free time 'do worse in GCSEs' - The Guardian
  • Extra screen time 'hits GCSE grades' - BBC
The majority of articles took the research findings at face value – watching TV results in lower exam grades – and assumed a cause and effect: the common mistake of treating a correlation as a causation. Just because there is a relationship between two things (correlation) does not mean that one causes the other (causation).

A good explanation of the dangers of mistaking correlation for causation, and a related example, can be found in the excellent Freakonomics book by Stephen J. Dubner and Steven D. Levitt. It discusses a study which found that children got better exam results if their homes had more books. Whilst there is a connection, it's not likely to be causal: the presence of many books is likely to be an indicator of the parents' interests, and therefore of their parenting style and approach. Upbringing and parenting are much more likely to have a causal effect. One district took the flawed approach of responding to the original study by sending two books to every home with children, assuming this would 'fix' the problem.
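It's easy to demonstrate numerically how this trap arises. A minimal sketch, with entirely invented numbers: two measures that have no causal link, but both drift upward over time (here driven by a shared trend), will show a strong Pearson correlation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


random.seed(42)
years = list(range(2000, 2016))

# Two unrelated, invented measures that both happen to trend upward:
# neither causes the other, yet they correlate strongly.
books_per_home = [50 + 3.0 * t + random.gauss(0, 5) for t in range(len(years))]
exam_scores = [60 + 1.5 * t + random.gauss(0, 2) for t in range(len(years))]

r = pearson(books_per_home, exam_scores)
print(f"correlation: {r:.2f}")  # strong, despite no causal relationship
```

The correlation is high purely because time is a confounder; the same mechanism lies behind many of the headlines above.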

The same problem applies to the current news coverage (though the research itself was carried out in 2005–2007, so its age may reduce its relevance – much has changed in the last decade), as the headlines assume that the amount of TV watched is the cause of exam pass-rate variation. There is very limited coverage of the extent to which the two are connected, even though the researchers had some awareness of the risk:

The BBC quoted lead author Dr Kirsten Corder: "We followed these students over time so we can be relatively confident of our results and we can cautiously infer that TV viewing may lead to lower GCSE results but we certainly can't be certain. Further research is needed to confirm this effect conclusively, but parents who are concerned about their child's GCSE grade might consider limiting his or her screen time." Dr Corder suggested there could be various reasons for the link, including "substitution of television for other healthier behaviours or behaviours better for academic performance, or perhaps some cognitive mechanisms in the brain".

This is further backed up by detail in the research paper:

Our analyses are prospective therefore allowing cautious inferences about direction of association; however it would be impossible to tell whether reductions in screen time caused an increase in academic performance without a randomised controlled trial.

There is clearly a wide range of potential factors in upbringing that could have influenced the results. For example, the research adjusted for 'deprivation' using a postcode-based scoring indicator. But, as the authors indicate, more work needs to be done to provide a greater depth of analysis.

'Screen time', whether TV, Internet or games, is clearly a factor to be balanced in children's free time, and few would argue against it having limits. Whilst the news headlines seem to over-emphasise the relationship, it would be interesting to know how many readers shift their behaviour in the coming weeks as their children head back to school, or whether their underlying parenting style, modified from their own upbringing, has the most ongoing impact.

And if you're still not convinced by the risk of mistaking correlation for causation, then take a look at this site of 'Spurious Correlations', my favourite being the year-on-year correlation between the 'Number of Japanese cars sold in the US' and the level of 'Suicides by crashing of motor vehicle'.

Friday, 28 August 2015

Don't lose sight of what's important

This blog entry is short, by necessity: I'm typing (very slowly) with one hand, having broken my arm as a result of a rather silly cycling manoeuvre. However, spending time at the hospital neatly provided me with an observation for today. Healthcare is widely seen as a key area of potential enhancement through the use of data and analytics, and many technology vendors like to showcase examples from healthcare: demonstrating how data can enhance health predictions, how IoT monitors can detect early-stage issues, and how detailed analytics can improve operational efficiency and the effective use of limited resources. There are many really good use cases for healthcare data.

So I was intrigued to spot a 'dashboard' report in my hospital's A&E department [emergency room], though the contents proved somewhat disappointing. The report had four core elements:
  • a summary of responses to a questionnaire that asked "how likely would you be to recommend our department to friends & family": a bar graph depicting the monthly volume of responses, together with another bar chart and a pie chart showing the split of responses for the last month, plus a table of data. So three depictions of the same data, to highlight that 80% of people would be highly likely to recommend the department. Incidentally, 'highly recommend' is the first option in the SMS questionnaire; I'm sure they test the questionnaire by reversing the sequence of responses to ensure there's no bias in how the question is asked…
  • a list of comments received with the questionnaire responses – there were a handful, and they included "." and "comment" [yes, seriously]
  • a highlight that the department had received 5 letters praising the service; listing quotes from each letter
  • a highlight that 6 complaints had been received; with a single word bullet for each
I'm not going to name the hospital, as this is just illustrative of some generic weaknesses in using data. Amongst all the discussion of big data, advanced analytics and machine learning, there is often a fundamental failure to focus on core objectives when summarising and communicating data:

What's the key objective?
Why are we producing the dashboard? Hospital A&E departments get a lot of attention, due primarily to the cost of the service and the fact that, by nature, patients need urgent, timely attention. But equally there's been concern that the service is abused by non-urgent cases, and that this impacts the care available for serious cases. Understanding the motives for any analytics or summary is key; without a clear objective, the focus will be wrong. For example, this unit is now called the 'Emergency Department' – the 'accident' element has been dropped – and from this it seems clear the hospital is pursuing an approach of ensuring the resources are used only for urgent cases. This is backed up by some other clear graphics that depict what the department should be used for.

Who's the audience?
Understanding the objective leads on to understanding the audience. This was a public dashboard – I expect they have a more detailed internal version – but the public version should focus on the public objectives. Providing the public with a summary of how recommendable the service is seems questionable in purpose and smacks of self-congratulation without any clear objective.

What do they need to know?
I can immediately identify some key things I'm interested in as a patient: what's the typical wait time (for the kind of issue I have)? How does wait time vary by day of the week or time of day? (My broken arm needs attention, but I could come earlier or later in the day if it speeds my progress.) And how do these wait times compare to other hospitals? That would be useful and insightful. Furthermore, highlighting the (hopefully declining) proportion of inappropriate cases – those that would be better dealt with elsewhere – with examples and care paths for the most common of these.
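The wait-time summary I'm describing is trivial to produce from attendance records. A minimal sketch, using hypothetical visit data (hour of arrival, wait in minutes) invented for illustration rather than taken from any real hospital:

```python
from collections import defaultdict
from statistics import median

# Hypothetical attendance records: (hour of arrival, wait in minutes).
# Invented for illustration -- quiet overnight, busy in the evening.
visits = [
    (9, 35), (9, 50), (10, 70), (10, 65), (18, 120),
    (18, 110), (19, 95), (2, 20), (2, 25), (3, 15),
]

# Group waits by hour of arrival, then summarise with the median
waits_by_hour = defaultdict(list)
for hour, wait in visits:
    waits_by_hour[hour].append(wait)

for hour in sorted(waits_by_hour):
    print(f"{hour:02d}:00  median wait {median(waits_by_hour[hour]):.0f} min")
```

Even a crude summary like this would let a non-urgent patient pick a quieter hour; the median is used rather than the mean so a few extreme waits don't distort the picture.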

What will they do with the insight?
Insight is only useful with action. Since my injury isn't critical, I could be flexible about what time of day I attend, so knowing the peaks and troughs might help me plan future attendance. Similarly, being aware of alternative options for other non-critical cases would keep me out of A&E in future. If the aim is to reduce non-urgent cases, then more help is required to flag examples and explain alternative routes for medical treatment.

As I mentioned, I don't want to single out this hospital; this is just one report, and it may itself be an outlier from an excellent analytics team. Instead it highlights how any piece of output needs to consider the fundamentals – and if it doesn't address them, it shouldn't be produced. Use that scarce analytical resource on something that will make more difference to strategic objectives.

In the meantime I have a few weeks to improve my one-handed typing speed and accuracy.