Friday, 5 March 2010

'Big data': big potential, big challenges

This week’s Economist has an excellent special report on managing information entitled ‘Data, data everywhere’. It looks at the changes, opportunities and challenges posed by our new-found ability to create and manipulate vast quantities of data – big data. There are lots of impressive/daunting (depending on your point of view) statistics about just how much data we are now talking about (40 billion photos on Facebook, for example) during this “industrial revolution of data”. It also explores the concept of ‘data exhaust’: the trail of clicks which users leave behind them and which Google and others have been able to put to such incredible use, from search to speech recognition and from spell checking to language translation. All made possible not by attempting to teach computers the rules which determine how these concepts work, but instead by tracking the activities of billions of user transactions, which do the work of refining, correcting and adding relative value to words. Those who have heard my ‘Meet the future of Records Management: Amazon.com’ conference paper will know that I have long suspected that we could and should be making use of this exact same ‘exhaust’ to help us manage information, as well as profit from it – what I describe as ‘Automated Records Management’ (see also Records Management Journal Vol 19 No. 2, 2009 for a paper I wrote on this entitled ‘Forget Electronic Records Management, it’s Automated Records Management that we desperately need’).
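To make the ‘exhaust’ idea concrete, here is a minimal sketch, in Python, of how a log of raw user queries can drive spell correction without any programmed rules of spelling: the most frequently observed spelling across all transactions simply wins. The toy query log, function names and single-edit approach are all my own illustration, not a description of how Google actually does it.

```python
from collections import Counter

# A toy stand-in for real 'data exhaust': a log of raw user queries,
# typos included. In reality this would be billions of transactions.
query_log = [
    "records management", "records managment", "records management",
    "data retention", "data retension", "data retention",
    "records management", "data retention",
]

# Count how often each spelling actually occurs across the whole log.
word_counts = Counter(word for query in query_log for word in query.split())

def edits1(word):
    """Every string one edit (delete, swap, replace, insert) away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    """No spelling rules anywhere: the log's most common spelling wins."""
    candidates = {w for w in edits1(word) if w in word_counts} or {word}
    return max(candidates, key=lambda w: word_counts[w])

print(correct("managment"))  # -> 'management'
print(correct("retension"))  # -> 'retention'
```

The same pattern – letting accumulated user behaviour do the work, rather than hand-crafted rules – is exactly what I have in mind when I talk about putting this exhaust to use for managing records, not just profiting from them.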

There’s also interesting stuff in the Economist supplement on the problems of how to make sense of all this data, including new ways of visualising it and the prediction that being a statistician will soon be one of the coolest jobs around (!). It also makes some interesting points about the need for management to be trained in how to make sense of all this data. This chimes with a conversation I had with a Chief Exec a few weeks ago, who also made the case for ensuring that senior management are aware of good old-fashioned archival concepts such as provenance and context, to give them a better appreciation of what the data they are looking at is actually telling them and how far it can be relied upon (rather than what they wish it was telling them and how much faith they may wish to place in it).
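The report doesn’t say what equipping managers with provenance and context might look like in practice, but as a rough sketch (every field name here is my own invention) the principle is simply that a figure should never be presented without its pedigree:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProvenancedRecord:
    """A figure that never travels without its pedigree."""
    content: dict                  # the data itself
    source: str                    # who produced it (provenance)
    created: date                  # when it was captured
    collection_method: str         # how it was gathered (context)
    known_limitations: list = field(default_factory=list)

report_figure = ProvenancedRecord(
    content={"customer_satisfaction": 0.87},
    source="Q4 web survey, marketing team",
    created=date(2010, 1, 15),
    collection_method="self-selected online poll",
    known_limitations=["sample skewed towards existing site visitors"],
)

# Surface the caveats alongside the number, not the number alone.
print(report_figure.content, "caveats:", report_figure.known_limitations)
```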

To give the Economist its due, it does also look beyond the potential and address some of the challenges (and not just in relation to security – see previous post). Admittedly it does appear a little confused about the subject of data retention, stating that ‘current rules on digital records state that data should never be stored for longer than necessary because they might be misused or inadvertently released’. It then goes on to state that ‘in future it is more likely that companies will be required to retain all digital files, and ensure their accuracy, rather than to delete them’ – a vision of the future likely to strike fear into every records manager’s heart. There are some obvious flaws in this logic, in the EU at least, where current data protection laws prevent this in relation to personal data. Elsewhere the Economist itself draws attention to the problems that storing such massive amounts of data is causing for the existing technical and resource infrastructure that Google et al. rely on, which would seem to favour a more selective approach to data retention on pragmatic grounds if nothing else. But whether such concerns are considered enough to stop the ‘let’s keep and exploit everything’ bandwagon which lies behind much of this report is at best debatable and at worst, I suspect, distinctly unlikely.
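For what it’s worth, the more selective approach that the infrastructure problem points towards is not complicated to express. Here is a minimal sketch of a retention rule, with entirely made-up categories and periods – an illustration of the principle, not a statement of what any actual law requires:

```python
from datetime import date, timedelta

# Entirely made-up categories and periods, for illustration only.
RETENTION_PERIODS = {
    "personal": timedelta(days=365 * 2),   # personal data: dispose after ~2 years
    "business": timedelta(days=365 * 7),   # business records: keep ~7 years
}

def must_delete(record_type, created, today=None):
    """True once a record has outlived its retention period.

    The point is simply that 'keep everything forever' is not a
    policy: personal data in particular has a sell-by date."""
    today = today or date.today()
    return today - created > RETENTION_PERIODS[record_type]

print(must_delete("personal", date(2007, 3, 5), today=date(2010, 3, 5)))  # True
print(must_delete("business", date(2007, 3, 5), today=date(2010, 3, 5)))  # False
```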