Friday, 28 August 2015

Don't lose sight of what's important

This blog entry is short by necessity: I'm typing (very slowly) with one hand, having broken my arm as a result of a rather silly cycle manoeuvre. However, spending time at the hospital neatly provided me with an observation for today. Healthcare is widely seen as a key area of potential enhancement through the use of data and analytics, and many technology vendors like to showcase examples from healthcare to demonstrate how data can enhance health predictions, how IoT monitors can detect early-stage issues, and how detailed analytics can improve operational efficiency and the effective use of limited resources. There are many really good use cases for healthcare data.

So I was intrigued to spot a 'dashboard' report in my hospital's A&E department [emergency room], though the contents proved to be somewhat disappointing. The report had four core elements:
  • a summary of responses to a questionnaire that asked "how likely would you be to recommend our department to friends & family", with a bar graph to depict the monthly volume of responses, together with another bar chart and a pie chart to show the split of responses for the last month, and also a table of data. So three depictions of the same data, to highlight that 80% of people would be highly likely to recommend the department. Incidentally, 'highly recommend' is the first option in the SMS questionnaire. I'm sure they test the questionnaire by reversing the sequence of responses to ensure there's no bias in how the question is asked...
  • a list of comments received with the questionnaire responses - there were a handful and included "." and "comment" - [yes seriously]
  • a highlight that the department had received 5 letters praising the service; listing quotes from each letter
  • a highlight that 6 complaints had been received; with a single word bullet for each
I'm not going to name the hospital, as this is just illustrative of some generic weaknesses in using data, and amongst all the discussion of big data, advanced analytics, machine learning there is often a fundamental failure to focus on core objectives when summarising and communicating data: 

What's the key objective?
Why are we producing the dashboard? Hospital A&Es get a lot of attention, primarily due to the cost of the service and the fact that by nature patients need urgent, timely attention. But equally there's been concern that the service is abused by non-urgent cases, and that this takes attention and care away from serious cases. Understanding the motives for any analytics or summary is key. Without a clear objective, the focus will be wrong. For example, this unit is now called the 'Emergency department'; the 'accident' element has been dropped. From this it seems clear the hospital is trying to ensure that resources are used only for urgent cases. This is backed up by some other clear graphics that depict what the department should be used for.

Who's the audience?
Understanding the objective leads on to understanding the audience. This was a public dashboard - I expect they have a more detailed internal version - but the public version should focus on the public objectives. Providing the public with a summary of feedback on how recommendable the service is seems questionable in purpose, and smacks of self-congratulation without any clear objective.

What do they need to know?
I can immediately identify some key things I'm interested in as a patient: what's the typical wait time (for the kind of issue I have)? How does wait time vary by day, week or time of day? (My broken arm needs attention, but I could come earlier or later in the day if it speeds my progress.) And how do these wait times compare to other hospitals? That would be useful and insightful. It would also help to highlight the proportion (hopefully declining) of inappropriate cases that would better be dealt with elsewhere, with examples and care paths for the most common of these.
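
A minimal sketch, in Python, of the kind of wait-time summary I have in mind. The visit records are invented purely for illustration (they are not from any hospital), and the function name median_wait_by_hour is hypothetical:

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample data: (arrival hour, minutes waited).
# These numbers are made up for illustration only.
visits = [(9, 45), (9, 60), (13, 120), (13, 150), (13, 90),
          (20, 200), (20, 180), (2, 30)]

def median_wait_by_hour(records):
    """Group waits by arrival hour and return the median wait for each hour."""
    by_hour = defaultdict(list)
    for hour, wait in records:
        by_hour[hour].append(wait)
    # Sort by hour so the summary reads like a timeline of the day.
    return {hour: median(waits) for hour, waits in sorted(by_hour.items())}

summary = median_wait_by_hour(visits)
for hour, wait in summary.items():
    print(f"{hour:02d}:00  median wait {wait} min")
```

A median is used rather than a mean so that one exceptionally long wait doesn't distort the picture a patient sees; a real dashboard would also want to split by type of case.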

What will they do with the insight?
Insight is only useful with action. Since my injury isn't critical I could be flexible about what time of day I attend, so knowing what the peaks and troughs are might help me think about future attendance. Similarly, being aware of alternative options for other non-critical cases would keep me out of A&E in future. If the aim is to reduce non-urgent cases, then more help is required to flag examples and explain alternative routes for medical treatment.

As I mentioned, I don't want to single out this hospital, as this is just one report - it may itself be an outlier from an excellent analytics team. Instead this highlights how any piece of output needs to consider the fundamentals; and if it doesn't address these then it shouldn't be produced. Use that scarce analytical resource on something that will make more difference to strategic objectives.

In the meantime I have a few weeks to increase my one-handed typing speed and accuracy.

Thursday, 20 August 2015

Start exploring Open Data (it's more than just maps)

There’s been an increasing interest in Open Data in recent years; Google Trends shows a steady increase in searches over the last 5 years, with a heavy concentration in the UK. In part this interest mirrors the acceleration in the datasets available. The Global Open Data Index keeps track of the status of Government Open Data initiatives globally – it identifies 97 places (countries) with Open Data, and monitors the scope of data available across topics such as Government spending and budgets, election results, national statistics, legislation, company registers, maps, postcodes etc. The UK is ranked with highest availability.

Getting a feel for how extensively such data sets are used is harder; the evidence is pretty patchy. Sources such as OpenData500 provide some summary information (though surprisingly this doesn’t include the UK) and some good graphical representations of which industry sectors use which government departments’ data sets. For the US, the Data/technology sector is just ahead of the Financial Services sector in usage. Both sectors use the Dept of Commerce heavily, but the FS sector’s leading usage is (not surprisingly) data from the Securities & Exchange Commission.

This kind of high-level view is broadly interesting, but doesn’t actually help organisations get started with exploring what they can (and should) be doing. Equally, much of the media coverage positions the concepts of Open Data via examples that stimulate more thought than action. This excellent article covers ‘5 ways that Open Data is changing lives’; these programmes are fairly wide-ranging, from global initiatives to local ones (e.g. Edinburgh’s City Scrapbook). Similarly, this article in the Guardian (probably the most vocal in the UK media regarding the possibilities and opportunities for big data) provides some fascinating examples of the breadth in scope of Open Data usage – with a focus on “How Open Data can help save lives” and topics as diverse as where to locate defibrillators and understanding cycle safety hotspots.

Some commentators claim that the provision of Open Data has become a tick-box exercise, with government bodies just being happy to say they’ve ‘done it’ rather than considering how useful and accessible the data is. Other angles attempt to identify the top lists of Open Data (or here); again interesting, but less of a practical help if you have a specific problem you are trying to solve.

What’s very encouraging is the lead being taken in the UK on Open Data across a range of dimensions: collection, release and re-use – driven by an on-going government commitment to Open Data and by initiatives that include increased training in data and encouragement of data consumption, assisted by the National Information Infrastructure.

If you haven’t already investigated UK Open Data, then start exploring: take a look at http://data.gov.uk/ and browse or search some of the 27k data sets available, and understand more about the strategy and direction of Open Data at the Open Data Institute: http://opendatainstitute.org/  The ODI kicked off a survey in November last year that looks at how commercial organisations are using Open Data. The research findings, published in June this year, provide some clear focus points that could help many organisations contemplating wider analytical sources:

  • The most popular datasets for companies are geospatial/mapping data (57%), transport data (43%) and environment data (42%).
  • 39% of companies innovating with Open Data are over 10 years old, with some more than 25 years old, proving Open Data isn’t just for new digital startups.
  • ‘Micro-enterprises’ (businesses with fewer than 10 employees) represented 70% of survey respondents, demonstrating a thriving Open Data start-up scene. These businesses are using it to create services, products and platforms. 8% of respondents were drawn from large companies of 251 or more employees.
  • 70% of companies surveyed use government Open Data, while almost half (49%) of the surveyed companies use Open Data from non-government sources, such as businesses, non-profits and community projects. 39% use a combination of government and non-governmental Open Data.


Thursday, 13 August 2015

Do you have bigger concerns than big data?

If you read much of the fluctuating hype around Big Data there’s a common theme in the negative camp that questions how much real, tangible value there is in Big Data analytics. Sceptics question the challenges in putting an ROI on something which by nature is exploratory. A key premise of Big Data’s distinction from traditional analytics is that whilst the latter is about providing answers to known questions, Big Data is about finding the questions you hadn’t thought of. With such ethereal qualities it’s not entirely surprising that it’s easy for the media and analysts to whip up some copy that creates fear amongst those embarking on a big data mission. And just like traditional analytical projects, tales abound of huge investments into projects which were culled after sinking significant amounts of cash, time and people with no significant return. Not surprising then that it’s easy to find nebulous articles, and research into business perception (that's been fuelled by such articles) that make vague claims about the disappointment of big data, without much tangible evidence.

Not surprising then that any board contemplating a big data project will be caught between the rock of ‘we have to do this Big Data thing, because everyone else is’ and the hard place that demands that every investment satisfies the corporate ROI evaluation hurdles. The “Build it and they will come” philosophy never worked for Data Warehousing, and the “trust me, there’s value in the data lake” approach won’t work for big data.

Many organisations have rushed too quickly into Big Data technology evaluation and got their fingers burnt, satisfying the geeks’ desire for hot technologies to feature on their CVs, but not providing business value.

The most viable approach is one of focus: starting with a clear use case, and thinking about specific priority business challenges that would show real value ‘if’ data could help to resolve them, or at least provide greater clarity or even marginal improvement. Then, based on a short-list of these ideas, develop a feasible set (small in number) that can be trialled. You may call this a pilot, a proof-of-concept or a proof-of-value, but it means a fairly rapid, low-cost approach to exploring the data and identifying whether there is some potential value.

As the examples below highlight (from this article), there are plenty of tangible examples of where big data provides real business value, and there are few ideas in this list that don’t fulfil these criteria:

  • a focus on a known, significant business problem, that has tangible financial implication
  • an exploration of the data (and that’s all possible data) to see which elements add value, and to discard from the solution, those that don’t

There’s a fair chance that if you don’t find any value in the potential data sets to help resolve one of your most critical business issues, then you’re not going to have a business to worry about for much longer. Your worries are bigger than Big Data.

AIG (Insurance): using a wider data set (including ‘unstructured’ data like handwritten claims notes) to enhance fraud identification
AMEX (credit cards): enhanced predictive models – identifying ¼ of accounts that will close within four months
Delta (Airlines): tackling the frustration of lost baggage – and expense to the airline – by providing customers with access to baggage tracking data via a mobile app
FT.com (media): analyse content preferences to improve personalisation
Huffington Post (Media): use real-time analysis of social media trends, recommendation and moderation to enhance personalisation
Kroger (Retail): understand customer behaviour – to drive loyalty and profitability
Southwest (Airline): using speech analytics to understand (and improve) interactions between customers and personnel. Understanding online behaviours and actions, to improve offers and increase loyalty, revenue and profit.
Red Roof Inn (Hotel): using a range of data to pinpoint travel hot-spots (bad weather, cancelled flights etc) and enhance targeting of 'stranded travelers' with hotel offers
Sprint (Telco): analysing network traffic to improve quality and customer experience
Tesla (Automotive): collecting sensor data, increasingly in near-real-time, to identify performance issues, recommend maintenance schedules and enhance R&D; all to improve customer satisfaction.
UPS (logistics): using data (telematics, routes, idle time) to enhance fleet optimisation – fewer miles, less fuel, lower costs
Walgreens (Healthcare): ensuring patients collect their prescriptions – to help them stay on their medication – and prevent future illnesses

Thursday, 6 August 2015

How safe is your data?

One of the most frequently covered media topics on analytics is data security, or rather data insecurity. It feels like every week there is a new report of data breaches in the papers. A quick review of the UK national press over the last quarter in fact identifies just short of 120 articles that feature a 'data breach' reference.

I'm covering this as a topic because my suspicion is that the fear of breach is more widespread than actual breach. Yes, there have been many data breaches, but without the real facts the cynic in me suggests media bias is creating more attention than the nature of the problem warrants. The cynic in me notes with a wry smile that 60 of the 117 articles appeared in The Daily Mail. Detail from the Newsdesk service from LexisNexis.

So I dug a bit further into the facts of recent breaches, and sought out some recent studies.
First stop was the Breach Level Index, which tracks publicly disclosed breaches globally and produces an annual summary report. [I should flag that this is the work of a "leading global provider of digital security solutions".] The 2014 report contains some useful reference points:
  • the report is based on 1,450 breaches in 2014, an increase of 46% from 2013 (though how much of this reflects more breaches, and how much a higher level of reporting?). This included 117 in the UK for 2014.
  • the report highlights the source of these breaches: around 60% are external, but the significant remainder are either malicious insiders or accidental losses
  • most surprising was that of all these 1,450 global breaches less than 4% involved data that was encrypted in part or full.
Next I studied the latest 'Information Security Breaches Survey' from the UK Government Department for Business, Innovation and Skills. This survey was carried out by PwC and took responses from over 1,000 individuals, with some bias to SMEs.
  • 86% of 'large' organisations had a security breach last year, and 60% of 'small' businesses. Interestingly, both figures are lower than in 2013.
  • Almost half of the breaches (47%) were caused by staff; next highest was virus impact at 27% of incidents. External attacks accounted for 16% of incidents.
The Information Commissioner's Office also reports a range of statistics on breaches that have been reported to it. For 2014/15, 1,807 incidents were reported:
  • over half were basic failures; losing paperwork (18%), data sent to the wrong person, by post or email (30%), insecure disposal of paperwork or computer records (5%)
  • but a significant 22% related to a lack of "appropriate technical and organisational measures"
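As a quick sanity check on the 'over half' figure, here is a small Python sketch that adds up the three percentages quoted above. It assumes those three categories are the only 'basic failures' being counted; the category labels in the code are my own shorthand, not the ICO's:

```python
# Percentages as quoted above for 2014/15; labels are shorthand, not official names.
basic_failures = {
    "lost paperwork": 18,
    "sent to wrong person (post or email)": 30,
    "insecure disposal": 5,
}
total_incidents = 1807  # incidents reported to the ICO for 2014/15

# Combined share of incidents that were basic, preventable failures.
share = sum(basic_failures.values())

# Approximate number of incidents that share represents (rounded, since
# the quoted percentages are themselves rounded).
approx_count = round(total_incidents * share / 100)

print(f"{share}% of incidents, roughly {approx_count} of {total_incidents}")
```
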
Clearly these are three quite distinct snapshots, without direct comparability; but even so highlight some common themes:
  • most breaches are failures at a basic, preventable level: organisations should do more to address the basics
  • where breaches are more complex and especially external, whether direct attacks or due to virus or malware, then improved encryption would be of significant benefit
  • centralisation of data creates a single point of security concern, but most specialists agree that this is also easier to secure; decentralisation creates more potential failure points and greater challenges to manage effectively
Data security will continue to be an issue and breaches will continue to happen, but organisations can always take some basic steps to reduce their risk and exposure.