Sunday 22 October 2017

WOULD SOMEONE STEAL YOUR SOFA?

FINE GRAINED DATA PROTECTION FOR YOUR MOST VALUABLE INFORMATION

“Yeah, our data’s secure, we’ve got it encrypted.”
Great, no worries about your data security then?
Well, actually, no. Encryption is generally applied at a broad level: an entire system, database, or physical drive is encrypted. This is not unusual, and it is not bad practice, but it’s like storing everything valuable in one safe or vault and relying on a single secure lock. It’s only as secure as that one lock, so if the key gets lost or stolen, suddenly all your cash and valuables are gone.
Encryption keys are very strong, but their weakness is often human – breaches all too often involve insiders, or bad guys on the outside getting hold of the IDs and passwords of privileged users and key holders. These are obtained via trickery, manipulation, or exploiting carelessness; the causes are many and varied. The reality is that bad guys will keep attacking, keep trying, and searching out these weaknesses and vulnerabilities.
Most organisations know they are under constant cyberattack, so they’ve been improving locks and adding security layers like cameras and stronger outer doors that need ID cards and fingerprint recognition – and the digital world has done the same, with extra security like two-factor authentication.
All these layers are good until someone gets through; then they’ll still get the money – or, in business, your data. And if you lose your data, the financial and reputational impact can make consumers wary of doing business with your brand.
Organisations have lots of data: big data and analytics programmes mean they increasingly value, collect, store and process growing volumes of it. And the bad guys want data of value too; in some cases this is commercial IP, but in most cases it’s private data about individuals, whether customers, employees or contacts at suppliers or partners.
Bad guys want stuff that can easily be sold for cash. If you’ve ever been unfortunate enough to experience a burglary, it’s the high-value, easily saleable stuff they want – cash, jewellery, small electronic goods – and specialists may take your passport, credit cards or car keys. But they are unlikely to steal your sofa or your fridge; big, and not that valuable at resale. Exactly the same applies to data, even in a big-data world. Most data is like your sofa: you’d be lost without it, but it’s unlikely to be of enough value for someone to steal it.
Instead, thieves want data like email addresses, names and ID details such as National Insurance and Social Security numbers. This high-value, detailed data is what needs the most careful protection, so Protegrity enables organisations to deliver fine-grained protection for each item, ensuring a name, an address, or a bank account number is individually protected – lots of locks to protect the data.
We can use locks like encryption, so the output is meaningless code, or we can tokenise, swapping real information for a similar but fake value. The thief thinks it’s a credit card number, because it’s a 16-digit number with a month/year expiry and a security code – but it’s all fake, cleverly substituted in your database. This means that whatever the nature of a security compromise, the risk to sensitive data is minimised. And Protegrity’s solution is highly performant: when real, authorised users need real data, the tokenised or encrypted values are individually converted and seamlessly returned for analytics or decision making. Your business can make full use of its data, confident that your customers and your brand are protected.
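For the technically curious, here is a minimal Python sketch of the general idea of vault-based, format-preserving tokenisation. It is a toy illustration of the technique only – not Protegrity’s actual implementation – and the class and field names are invented for the example:

    import secrets

    class TokenVault:
        """Toy token vault: swap a real card number (PAN) for a random,
        format-preserving token, and reverse the swap for authorised users."""

        def __init__(self):
            self._real_to_token = {}
            self._token_to_real = {}

        def tokenise(self, pan: str) -> str:
            if pan in self._real_to_token:             # re-use an existing token
                return self._real_to_token[pan]
            # Keep the first 6 and last 4 digits so the token still looks like a
            # 16-digit card number; randomise the middle so it reveals nothing.
            while True:
                middle = "".join(str(secrets.randbelow(10)) for _ in range(6))
                token = pan[:6] + middle + pan[-4:]
                if token != pan and token not in self._token_to_real:
                    break
            self._real_to_token[pan] = token
            self._token_to_real[token] = pan
            return token

        def detokenise(self, token: str, authorised: bool) -> str:
            if not authorised:
                raise PermissionError("not authorised to see real data")
            return self._token_to_real[token]

    vault = TokenVault()
    token = vault.tokenise("4929123456781234")   # the fake value stored in your database
    real = vault.detokenise(token, authorised=True)

A production solution deals with key management, vaultless token generation, many data types beyond card numbers, and performance at scale, but the principle is the same: the database holds a plausible fake, and only authorised users can recover the real value.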
Find out more about taking a data-centric approach to protecting valuable, private information, in this guide to help data security leaders achieve success in a complex data landscape.

Monday 5 September 2016

496,000 Go Ahead train passengers face delayed or cancelled trains every week - a more significant headline

There are few London commuters who would seek to defend Southern Rail, reputedly one of the worst performing rail operators, but as a data specialist with a finance background I read the recent news headlines with some despair. "Southern Rail owners reveal £100m profits after months of cancellations" from The Times was typical of the coverage last week when Go Ahead announced their annual figures.

So, being curious to understand the reality, I dug into the numbers a bit and wanted to share my findings. Go Ahead is a group, and includes not just rail franchises but also bus operations across the South East, including 24% of London routes. Bus revenue is 26% of Go Ahead's total revenue, but generates 64% of operating profit. By comparison, rail revenue is 74% of the Group revenue, but only contributes 36% of profit. So immediately the £100m headline-grabbing figure is off the mark in its relevance to the 'months of cancellations' for rail passengers.

The rail element of Go Ahead's business is a 65% share of Govia, which runs three franchises: Govia Thameslink Railway [GTR], Southeastern and London Midland. GTR itself is an accumulation of Southern, Thameslink, Gatwick Express and Great Northern. Clearly Go Ahead's rail interests are extensive, which made me consider whether the profit from Southern Rail was indeed excessive in comparison to other franchises.

A useful reference for this is a report published by the Office of Rail and Road [ORR] in March this year: Passenger Rail: trends and comparisons (link below). This provides aggregated revenue and cost data for the franchisees across the UK passenger rail network. The report is based on data to 2014 – so it is not fully current – but it provides history back to 2001, so trends are pretty clear and the data is more consistent and complete across operators.

The first highlight is the overall level of profit: collectively the franchised operators have revenue of £9.4bn and costs of £9.2bn – a rather thin margin of £200m, or 2%.

The report provides a useful chart that shows total revenue and total costs by franchise. The report doesn't provide the underlying data, so an element of judgement was required to convert this into a chart that shows the actual profit per franchise, as a £m figure and as a % of revenue (a rough sketch of the calculation follows). I ranked this by the size of profit in £m, so it shows those generating most profit on the left, descending to those that were loss-making on the right. All data is 2014 figures.
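The conversion itself is simple arithmetic. Here's a rough Python sketch; the revenue and cost pairs are illustrative estimates read off the chart (only Southern's revenue and profit match the figures quoted later in this post), not the ORR's underlying data:

    import pandas as pd

    # Illustrative estimates only, read off the ORR chart; not the underlying data.
    franchises = pd.DataFrame({
        "franchise": ["SOU", "NOR", "MER"],      # Southern, Northern, Merseyrail
        "revenue_m": [746.0, 900.0, 150.0],
        "cost_m":    [716.0, 867.0, 129.0],
    })

    franchises["profit_m"] = franchises["revenue_m"] - franchises["cost_m"]
    franchises["margin_pct"] = 100 * franchises["profit_m"] / franchises["revenue_m"]

    # Rank with the most profitable franchises on the left, loss-makers on the right
    ranked = franchises.sort_values("profit_m", ascending=False)
    print(ranked.round(1))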



This shows that absolute profit ranged from ~£33m (Northern: NOR) down to a loss of £3m (Greater Anglia: GRA). The percentage profit fluctuates, but is broadly in line with absolute profit [the % trend slopes down from left to right], with a few notable exceptions higher than the average: Merseyrail: MER (14%), C2C (9%), Transpennine Express: TPE (8%), London Overground: LRL (7%). And Southern Rail? Well, its profit (remember, for 2014) is in 2nd place at £30m [SOU], but at a margin of 4%, based on a revenue of ~£746m.

The report goes into considerable detail on the factors behind the components that comprise each operator's revenue and costs, and it succinctly highlights the complexity of running the UK railways, as well as the difficulty of making useful comparisons between operators.

However, this doesn't help the frustrated commuter at Waterloo, so I did one additional piece of analysis. What frustrates most passengers is not profit, but the combination of prices and punctuality. Most commuters feel fare rises have been relentless without measurable service improvement. So I had a look at the current performance data provided by the ORR. Taking the latest data (year ended 31 March 2016), I've calculated, for each operator, the average passengers per train and the average number of trains cancelled or severely delayed each week. From this I derived the number of passengers impacted, and for reference expressed this as a percentage of those travelling (a sketch of the calculation is below).
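Roughly, in Python – the column names are my own labels for the ORR figures, not the ORR's field names:

    import pandas as pd

    def passengers_impacted(orr: pd.DataFrame) -> pd.DataFrame:
        """One row per operator: annual journeys, annual trains run, and the
        average number of trains cancelled or severely delayed per week."""
        out = orr.copy()
        out["passengers_per_train"] = out["journeys_per_year"] / out["trains_per_year"]
        out["impacted_per_week"] = (out["passengers_per_train"]
                                    * out["cancelled_or_severely_delayed_per_week"])
        out["pct_of_travellers"] = (100 * out["impacted_per_week"]
                                    / (out["journeys_per_year"] / 52))
        return out.sort_values("impacted_per_week", ascending=False)

As a sanity check, ~332,000 impacted passengers at 5.3% (the GTR figures below) implies roughly 6.3 million GTR journeys a week.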


This data represents the operators in a slightly different way (and reflects the changing landscape of franchise operators) but, irrespective of this, the chart is not good for Go Ahead. Govia Thameslink ranks highest, with approximately 332,000 passengers impacted by cancelled or severely delayed trains every week – highest in number and third highest as a percentage (5.3%). Add in the other Go Ahead franchises, Southeastern and London Midland, and the total impact is 496,000 passengers every week.

I'd been tempted to leave it there, but curiosity got the better of me. What was the trend for GTR on the weekly passenger numbers impacted by cancellations and severe delays? The result is below, and it doesn't require much of an explanation, or headline.


Sources:
Go-Ahead Group 2016 Results: http://www.go-ahead.com/en/investors.html
Office of Rail and Road: Passenger trends report: http://orr.gov.uk/publications/reports/passenger-rail-trends-and-comparisons-for-franchised-operators

Thursday 21 January 2016

If you're sitting comfortably, it's time for storytelling with data.

Like many contemporary disciplines, the world of analytics doesn’t stand still. Organisations are on a constant quest to do more with their data: to get more value, greater insight and understanding, and to do all this at the lowest cost while incorporating innovative technology.

One concept that gained much traction during 2015, and looks set to really peak in 2016, is storytelling with data. So, if you’re sitting comfortably, let me explain.

One of the most common challenges for organisations is the difficulty of converting the insight derived from data analytics into something actionable. This means clearly identifying and explaining the ‘so what?’ of a piece of data analysis. However brilliant and clever the analytical techniques may be, it is essential to communicate the outcomes clearly to business leaders, so they understand why the findings are important, can validate the recommended action, and can ensure the analysis leads to a definitive business decision: typically one that touches individual customers, suppliers, employees and so on.

Data storytelling is a technique that is most beneficial when applied to convey what are often complex findings, derived from a multi-step piece of analytics. With a multi-step approach we can take business people on a journey, simplifying complexity, in a way that aligns with their emotional and intellectual awareness and that explains, educates and convinces.

Data storytelling is somewhat different from visualisation, and in particular from infographics – though those two are themselves quite distinct, as this article highlights.

Infographics have become a hugely popular way of summarising statistics and can be found everywhere, not just in business but also in news and media. There are some good explanations around of why infographics are so popular, and so useful at conveying information. This article does the job particularly well.

A key element of data storytelling is often visual, but it’s more about providing a guided path through findings to show how an analyst has taken some logical steps to arrive at a final result or set of options or outcomes.

It’s not surprising that many software vendors are seizing the momentum around data storytelling. Tableau have added a feature called ‘storypoints’ and Qlik allow a guided story via ‘pathways’.
There is also plenty of quality educational material to encourage good, if not best, practice. Tom Davenport’s article in the Harvard Business Review, for example, is an excellent summary of the 10 kinds of stories to tell with data, and there’s a good article in Computerworld that identifies the trends in storytelling for 2016.

What I haven’t done here is delve into detailed illustrations of storytelling in practice; again, there are lots of examples out there. Here are four that highlight a range of approaches:
The FT.com: What’s at stake at the Paris Climate Change Conference.
How far can you travel when your petrol / gas warning light comes on.
Gun Deaths in America: making sense of the numbers.
How sunspots impact global weather.

These provide an interesting range of approaches that should give a clearer guide to the art of data storytelling, but if you want to know more, there is a compilation of the best resources, including links to some excellent blogs.

And if you’re not convinced by the power of storytelling; do you need reminding what time Cinderella had to leave the ball? Or what Jack swapped for the magic beans on his way to the market? Or what animal made Dick Whittington's fortune? Just make sure your data storytelling enlightens and enchants and doesn't make your audience fall asleep!

Friday 23 October 2015

Simplify so my grandmother can understand

It’s always interesting to watch as technologies and business strategies gain momentum and then ebb and flow in popularity. Big Data has been a prime example, with journals, blogs and social media displaying an ongoing variation between hype and stagnation. The result, especially in the intensive, opinion-rich online world, is a mix of views, from “10 reasons why big data is failing” (e.g. Forbes) to examples portraying “the ten most valuable big data stories” (e.g. Information Age). Extremes always grab attention.

When digging a bit deeper, these conflicting views actually highlight some interesting similarities. Firstly, many of the reasons for failure are not big data or analytics-specific issues, but a reflection of poor corporate or enterprise strategy and focus, and could relate to any project. I’d include in this factors such as a lack of clear business objectives, working in silos rather than to an enterprise strategy, and a lack of communication and misalignment of business and IT objectives. None of this is a big data problem.

An area that gets much attention is the skills needed for big data, with a specific focus on Data Science. This is much more of an analytical issue, but again digging into some depth reveals that the underlying issues are often communication and terminology. With a clear definition of what an organisation is trying to achieve, it becomes easier to understand what skills are needed. From this, like any project, it’s easy to develop a skills matrix, determine which skills are light in the organisation (or missing), and decide what can be addressed by training and what by hiring. Organisations that assume the second step in defining a big data project is to hire a data scientist will be on a futile unicorn quest.

A common theme in the success criteria is focus and clarity: a focus on objectives and a clarity of purpose, but without pre-supposing outcomes. This is a tricky balance and requires an open-minded approach – and this is where the real skill requirement emerges:
  • How do you keep an open mind and not set out with a preconception of what a piece of analytics will discover?
  • How do you let the data guide you – reading, analysing and interpreting with the flexibility to move in new directions as the story unfolds?
  • How do you try out new ideas and new techniques, and incorporate new data: taking unexpected detours on the journey?
  • How do you read the data while avoiding red herrings, making sensible, reasoned observations and steering clear of traps (correlation vs causation being a prime example)?

The most telling sign that this approach has been successfully achieved is in how the final conclusions are stated. Ideally they will simplify and summarise, explaining not just what, but why (and also why not, to show what was tried and found not to be useful). “Simplify, so my grandmother can understand” is how one CEO put it to me.


As a direct consequence the emerging focus is on clearer communication of findings, and on topics like visualisation and storytelling. Many organisations are achieving much more with analytics and big data; their quest now is to expand the horizon through better communication, to ensure that projects become enterprise initiatives.

Monday 28 September 2015

Summary of Hadoop Use Cases

A summary paper is now available from the recent research project into published use cases of Hadoop adoption. The seven-page paper summarises the use cases by identifying the key uses to which Hadoop is being put, the major industries that are being used as case studies by the Hadoop distributors, and the key benefits identified by organisations that have implemented Hadoop. The paper is available here and requires no registration.

Friday 18 September 2015

Hadoop use cases: research underway to identify common themes

Within the next week or so I should conclude a piece of research that summarises the published customer case studies for Hadoop adoption. It's been a fascinating project to work on. With all the hype around Big Data and technologies like Hadoop, it's often difficult to get a clear, objective view of real usage. The research has examined close to 200 customer case studies to identify common themes in adoption, usage and benefits, the aim being to provide a reference for those looking to adopt Hadoop. The initial findings have shown some interesting insights:

  • whilst much of the focus on Hadoop has been around the perception that it provides a low-cost analytical platform, driven by a combination of its open-source foundation and use of commodity hardware, this is not the most referenced benefit
  • instead it is scalability that is most commonly quoted, with almost two-thirds [65%] of documented customer stories highlighting this as a key factor in adopting Hadoop; a common driving factor is that organisations identify a need to retain more history than they could previously handle, or to explore new data sources with inherently high volumes of data
  • the next most reported benefit, and directly related to scalability, was speed of analytics (the time taken to run queries), with 57% of the case studies highlighting this advantage
  • the 'cost driver' comes in third place, with 39% of customers specifically highlighting the savings in adopting Hadoop
What's always interesting when looking at factors like scalability of analytics and speed of queries is to understand 'in comparison to what'. Many of the use cases undertook comparative benchmarks against other technologies, but many have simply migrated from other platforms, most commonly MySQL or SQL Server (by number of customers). In these latter cases, the 'faster' argument is always a bit thin; new commodity hardware is always going to be more performant than older solutions. Another reason why it's so difficult to get objective information on user adoption of technologies like Hadoop.

I'll add another blog post here once the research summary is available, with a link for download. Alternatively drop me an email kevin [at] datafit.co.uk and I'll email it when ready (your email address won't get added to a mailing list or distributed further).

Friday 11 September 2015

The return of Arthur Andersen. What’s that about?

An interesting mix of news this week, with an announcement that Arthur Andersen will, according to The Times, ‘rise from the ashes’ in the form of a new French entity providing a "unique international business services network integrating an authentic inter-professional dimension". Wow! After the dramatic demise of Arthur Andersen through the Enron scandal, that's a pretty bold move. What's even more interesting is that there is now a dispute between this entity and another in the US over the use of the Andersen brand.

Getting attention is an increasing challenge in a busy, complex, dynamic world, and doing something counter-intuitive is one way to get it. Given its high profile, Apple takes a fairly mainstream approach to getting attention. This week, with typical drama, they put on a slick presentation announcing a range of product updates. These included the launch of the iPad Pro, offering 'desktop PC performance' in a bigger-format iPad (with a 12" screen) that is 22x faster than the original iPad and weighs pretty much the same. Further compression of compute power. This has the potential for another step-change in the way people work, creating further enhancements to personal productivity; it was interesting to see Microsoft present on stage at the iPad Pro launch.

Adding more and more power, capability and portability to tools like the iPad further enables consumers and corporations to get wider access to information, and the simplicity of such tools further enhances the spread of information access. Most people have watched in awe at how easily a toddler picks up the iPad interface – and at the puzzlement when 'swiping' a flat-screen TV doesn't have the same effect.

However, adding performance, capability and easier access to more information can also create complexity, and like many information or analytics problems, getting to a detailed, helpful piece of information is an increasing challenge. This is probably why Gartner positions 'self-service delivery of analytics' at the top of its hype cycle for Emerging Technologies [what Gartner calls the 'peak of inflated expectations']. An increasing number of vendors are entering the crowded analytics market claiming to offer the best thing since sliced bread: letting users intuitively gain simplified access to information so they can make real decisions. There are a few good technologies out there, but there is also a lot of hype; buyers should beware of claims of magic and avoid the snake oil. There is still a massive challenge in taking large, complex data sets and transforming them into insightful, actionable information. It's very difficult to self-serve or automate things that are so complex. It'll be interesting to see how this domain unfolds over the next few years; there's no doubt that software capability will be dramatically enhanced to make the most of the data.

And whilst you're waiting for self-service analytics on your shiny new iPad Pro, there's a raft of other things you can use it for – just don't impede others' view by taking selfies next time you're 'experiencing' a music festival or gig.