Posts Tagged ‘Statistical Analysis’

The Multi-Dimensional Beast

I want to say “multi-dimensional engineering”, but since you had a problem with “sensors” I won’t go there.

- The Doctor, attempting to explain the TARDIS to a 17th-century pirate

In a previous article I looked briefly at some of the mathematical techniques used in counterterrorism and the analysis of social networks. Cluster detection techniques, for example, can expose groups of people within these networks. However, many of these methods struggle to produce good-quality results from high-dimensional data.

When faced with the challenge of multi-dimensional data, it’s generally considered unacceptable (for software engineers, at least!) to say that we “won’t go there”. So what can we do to make such data easier to handle?
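One commonly cited reason that distance-based methods such as clustering struggle here is that, as the number of dimensions grows, the nearest and farthest points from any given point end up at almost the same distance. The following is only a rough sketch of that effect, assuming uniformly random points in a unit hypercube and plain Euclidean distance; the function name `distance_contrast` and its parameters are made up for this illustration and are not taken from the earlier article.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Relative gap between the farthest and nearest neighbour of a query point.

    Hypothetical helper for illustration: draws one query point and
    n_points other points uniformly from the unit hypercube in `dim`
    dimensions, then returns (max_dist - min_dist) / min_dist.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast should shrink as the dimension grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(distance_contrast(dim), 3))
```

When the contrast is small, "near" and "far" stop meaning very much, which is one way of seeing why a cluster detector that relies on distances has less to work with in high dimensions.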


Of Soviets and Statisticians

Josef Stalin is popularly supposed to have said “Quantity has a quality all its own,” in reference to the idea that what the Soviet Union lacked in military technology it more than made up for in sheer quantity of raw manpower. We can make the same comment about the huge quantity of crowd-sourced data available today. Individually, each datum is not particularly significant. Pool them together, however, and the right tools can extract valuable information from them.

In “Technology: A Binary Goldmine”, Richard Waters of the Financial Times looks at the growing use of vast quantities of public data to inform business decisions. There are clearly many ethical concerns that will have to be worked out: who wants their Saturday night tweets to influence the price they pay for car insurance, or a Facebook photo posted by a friend to affect their credit rating?

Companies can look for clusters of behaviour that provide an indication of how people with particular publicly-visible attributes might behave.

Analysing the detailed financial behaviour of very large groups of customers over a protracted period, for instance, could give banks a clue as to which are most likely to default next, says Mr Olson at Cloudera.

The article quotes Tim O’Reilly, the popular technology commentator and publisher, as saying “the big skill in future will be to ask the right question”.

He’s right. We software engineers don’t always know what question we need to answer (though we might have strong views about questions that we shouldn’t answer), but what we do have are the computational and statistical tools to crunch these vast pools of data and generate powerful predictive analytics.


Cutting With the Grain

There’s not much obvious similarity between software engineering, statistics and carpentry, but the one thing they have in common is that it’s much nicer to work with tools that cut with the grain rather than against it.

Crunching a lot of data to extract statistical information can be very hard without the right tools. You could, in principle, use any programming language to perform statistical analysis, but if you choose the wrong one it’ll feel like using a screwdriver to hammer in a nail; sure, it’ll work, and if it’s all you have to hand then you have to make do. But it’s much better (and more comfortable!) to use a hammer.

R has been described as “a programming language written by statisticians for statisticians.” It is an open-source product that can be used as an interactive, command-line-driven tool for manipulating data and viewing the results live, which allows useful results to be produced quickly and concepts to be iterated on when developing a new model. R can also be used to create stored programs that run regularly or continually on huge quantities of data.
