Data is overrated.
"Data is the new oil", the catchphrase goes. But even more important than collecting and cleansing data is finding ways of putting them to use. Without the invention of the combustion engine, synthetic fibres, plastics, ... oil would hardly be as valuable as it is to us. Yet, talking to many managers over the last fifteen years or so, the major concern seemed and still seems to be the availability and quality of data, less so making use of them. "If we only had the data Google, Amazon, Facebook, you-name-it have", "We're not even able to identify our customers uniquely", "We should add geo-location data, social-milieu data, you-name-it in order to improve the quality" ... is what you hear all over the shop.
At best, this is self-delusion. Worse, it is an excuse for inaction. More often than not -- we'll come to that -- we have all the data we need or could easily find them. But as we don't have well-developed ideas of what we would do with them, we can neither go out and find them nor create value. Instead we talk about fancy refinements before even reaping basic benefits. Greetings from Pareto!
Before even asking about availability and quality, we should have an idea of how to make something out of data. What problem do we want to solve? What value could we create? ... if only we had that data. What, for instance, would you do if you had your sales figures in real time instead of on a monthly basis? By the way, just showing off in your next meeting doesn't count as value creation. What would you do if you had some Amazon-like recommendation engine? Or is there a whole new idea building on data?
When thinking about this post on yesterday's morning run, the following example came to my mind. A silly one, but I am a train commuter after all. -- What if I wanted real-time information to help me invest in Deutsche Bahn stock? Figures about punctuality, for example. Or about utilisation?
Punctuality is an easy one. Just spider the respective online information and there you go. One or two days of work for a savvy programmer, the biggest issue being to find all the relevant portals, or at least most of them. But what about utilisation? Well, that information is probably not publicly available. Here comes the idea: measure the acceleration out of stations. Newton's second law says: the higher the mass, the lower the acceleration (given a certain force). Thus, the higher the utilisation, the lower the acceleration. The law won't apply cleanly, as there are a lot of other factors, but the general correlation should be quite robust, I guess. You also won't get exact numbers this way, rather a development over time. But that should serve your purpose exactly.
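To make the Newton's-law idea concrete, here is a minimal sketch in Python. The masses and the tractive force are entirely made-up illustrative numbers, and assuming a roughly constant starting force is itself a simplification -- the point is only the mechanics: from F = m·a you get m = F/a, so a lower departure acceleration implies a heavier, i.e. fuller, train.

```python
# Toy sketch: infer a relative utilisation index from measured
# station-departure accelerations. Assumes a roughly constant
# tractive force F at departure, so mass m = F / a (Newton's 2nd law).
# All constants below are made-up illustrative values.

EMPTY_TRAIN_MASS_KG = 200_000   # assumed tare mass of the train
TRACTIVE_FORCE_N = 300_000      # assumed constant starting force

def estimated_mass_kg(acceleration_ms2: float) -> float:
    """Newton's second law: F = m * a, hence m = F / a."""
    return TRACTIVE_FORCE_N / acceleration_ms2

def utilisation_index(acceleration_ms2: float) -> float:
    """Estimated payload mass relative to the empty train (0.0 = empty)."""
    payload = estimated_mass_kg(acceleration_ms2) - EMPTY_TRAIN_MASS_KG
    return max(payload, 0.0) / EMPTY_TRAIN_MASS_KG
```

With these numbers, an acceleration of 1.5 m/s² corresponds to the empty train, while 1.0 m/s² implies a payload of about half the tare mass. The absolute values mean little; tracked over weeks, the index gives exactly the development over time mentioned above.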
That leaves only one problem: where to get the acceleration data from (you need positions and times)? Frankly, I don't know. I could imagine tapping into some train-spotting data, or some GPS information, or using webcams ... And if none of this works? -- Well, you could measure it on your own. A few light barriers along the most important train routes should do.
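The light-barrier variant is simple enough to sketch: three barriers spaced a known distance apart give three timestamps, from which two average speeds and hence an acceleration follow by finite differences. A hypothetical sketch, assuming equal spacing and ignoring measurement noise:

```python
def acceleration_from_barriers(d: float, t1: float, t2: float, t3: float) -> float:
    """Estimate a train's acceleration from three light barriers spaced
    d metres apart, triggered at times t1 < t2 < t3 (in seconds).

    Average speed over each gap, then a finite difference between the
    two speeds, evaluated at the midpoints of the two gaps.
    """
    v1 = d / (t2 - t1)      # mean speed over the first gap
    v2 = d / (t3 - t2)      # mean speed over the second gap
    dt = (t3 - t1) / 2      # time between the two mean-speed midpoints
    return (v2 - v1) / dt
```

For constant acceleration this estimate is even exact; in reality, noise and varying traction make it a trend indicator rather than a precise figure -- which, as argued below, is all the stock-market use case needs.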
Well, the idea might not be perfect (still, if you make a fortune out of it, save a tip for me), but the point is a different one: as soon as you know what you want to do, getting the data will be possible (we'll be talking about margins of error in a minute).
More wanted: turning ideas into action!
Even more neglected than the question of how to turn data into valuable information in the first place, it seems to me, is the question of how to operationalise that information!
Fifteen years ago, I was part of a BI initiative. We built a sensational 1TB data warehouse (that's about the size of my private cloud storage space today) and calculated propensities to buy and to churn. The quality was great (despite having little data, and AI being sci-fi back then). We could make very good forecasts.
Especially the propensities to churn would be a true treasure trove: keeping a customer is way more effective than winning a new one. But the effort was a total failure! We couldn't get the salesforce to act upon that information. First, they felt too busy with other stuff. Second, they were incentivised for new business, not for retaining customers on the brink of churning. And third, we had these mythical stories: "My mother-in-law was on the churn list. She would never churn!"
By the way, what was the management reflex? -- We should improve the quality of our analysis, for example by including external data. -- From 85% to what? -- Well, the initiative is still in hibernation.
If you had those real-time sales figures mentioned above, would you just use them for another, more frequent report? Or would (and could) you act upon them? -- For instance by adjusting prices, products, incentives? If you had that magic recommendation engine, what would you do with it? -- And not only online but in your bricks-and-mortar businesses, too.
Fuzziness is not the problem
There are questions where your data should be exact -- financial reporting, for instance. But talking about Big Data, Business Intelligence etc., the question is often very different. Forecasting train utilisation based on acceleration data is anything but exact science, but neither is the stock market. If you get the trend before anybody else, you can make money. With a recommendation engine, hardly anything can go wrong: if you recommend something "wrongly", so what? Better than not recommending anything. If you wrongly send your salesperson to someone supposed to be a potential churner, what goes wrong? It improves the relationship anyway ... even if it is (besides being an outlier) your mother-in-law.
In our own -- human -- decisions we hardly complain about fuzziness. As soon as computers are involved, we demand 100% or nothing.
The nature of probabilistic calculations is their probabilistic nature.
Action is more often than not more important than data perfection!