Josef Stalin is popularly supposed to have said “Quantity has a quality all its own,” in reference to the idea that what the Soviet Union lacked in military technology it more than made up for in sheer quantity of raw manpower. We can make the same comment about the huge quantity of crowd-sourced data available today. Individually, each datum is not particularly significant. Pool them together, however, and the right tools can extract valuable information from them.
In “Technology: A Binary Goldmine”, Richard Waters of the Financial Times looks at the growing use of vast quantities of public data to inform business decisions. There are clearly many ethical concerns that will have to be worked out: who wants their Saturday night tweets to influence the price they pay for car insurance, or a Facebook photo posted by a friend to affect their credit rating?
Companies can look for clusters of behaviour that provide an indication of how people with particular publicly-visible attributes might behave.
Analysing the detailed financial behaviour of very large groups of customers over a protracted period, for instance, could give banks a clue as to which are most likely to default next, says Mr Olson at Cloudera.
The article quotes Tim O’Reilly, the popular technology commentator and publisher, as saying “the big skill in future will be to ask the right question”.
He’s right. We software engineers don’t always know what question we need to answer (though we might have strong views about questions that we shouldn’t answer), but what we do have are the computational and statistical tools to crunch these vast pools of data and generate powerful predictive analytics.
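To make the “pool them together” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records are made up, the group labels are hypothetical, and a simple per-group default rate stands in for whatever models banks or insurers might really use. No single record says much, but the pooled rates begin to separate the groups.

```python
from collections import defaultdict

# Hypothetical, made-up records: (publicly visible attribute, defaulted?)
records = [
    ("group_a", False), ("group_a", False), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", True), ("group_b", False), ("group_b", True),
]


def default_rate_by_group(rows):
    """Return the observed fraction of defaults for each group."""
    totals = defaultdict(int)    # number of records seen per group
    defaults = defaultdict(int)  # number of defaults seen per group
    for group, defaulted in rows:
        totals[group] += 1
        if defaulted:
            defaults[group] += 1
    return {group: defaults[group] / totals[group] for group in totals}


rates = default_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of customers defaulted")
```

With only eight records the rates are little more than noise, which is rather the point: the signal only becomes trustworthy at the scale the article describes, and even then someone still has to decide whether it is a question worth asking.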