Wednesday, 25 August 2010

Is the Cloud aware that it has 'the future of digital archiving in its hands'?

As anyone in the audience at the ECA conference in Geneva earlier this year will be aware, one of the issues which I’ve been mulling over in recent months relates to roles and responsibilities in ‘the cloud’. The question I was asked to address in Switzerland was ‘in whose hands does the future of digital preservation lie?’ and my succinct response was: ‘Google's’. This was (for reasons evident in the paper I gave) meant both literally, given their increasing dominance of the cloud space, and metaphorically, as shorthand for all cloud service providers.

And certainly when my colleague, Doug Belshaw, pointed me in the direction of this post regarding Facebook’s archiving policy, it became clear that I’m not the only one thinking about where this might lead us and its (unintended?) consequences for all parties.

It's tempting to see things only from our side of the fence (by which I mean the archival community): to lament the inevitable decline in our future professional role that handing content over to commercial external service providers for long-term preservation will entail, and to worry about what this may mean for the archives (and their users) of the future.

But maybe we should also pause to reflect on what it may mean for these service providers themselves, and on whether they are as concerned about the implications of this newfound responsibility on their side as we are on ours.

For as I concluded my paper in Geneva:

"Perhaps we should actually stop to ask Google and their peers whether they are indeed aware that the future of digital preservation lies in their hands, and of the responsibilities which come with it, and whether this is a role they are happy to fulfil. For perhaps, just as we are in danger of sleepwalking into a situation where we have let this responsibility slip through our fingers, they might be equally guilty of unwittingly finding that it has landed in theirs.

If so, might this provide the opportunity for dialogue between the archival professions and cloud-based service providers and, in doing so, the opportunity for us to influence (and perhaps even still directly manage) the preservation of digital archives long into the future?"

To again quote from the conclusion of my paper:

"Maybe the interconnection of content creation and use with its long-term preservation need not be as indivisible within the cloud as it might first appear. Yes, Google’s appetite for content might appear insatiable, but that does not necessarily mean that they wish to hold it all themselves. After all, their core business of search does not require them to hold every web page they index themselves, merely to have the means to crawl it and to return the results to the user. Might we be able to persuade them that the same logic should also apply to the contents of Google Apps, Blogger, YouTube and the like? If so, might the door be open for us, the archival community, funded from the public purse, to create and maintain our own meta-repository into which online content can be transferred, or just copied, for controlled, managed long-term storage, whilst continuing to provide access to it for the services and companies from which it originated?

That way they continue to accrue the benefit of allowing their users to access and manipulate digital content in ways which benefit their bottom line, the user continues to enjoy the services they have grown accustomed to, and the archival community can sleep soundly, safe in the knowledge that whilst service providers are free to do what they want with live content, its long-term preservation and safety continue to lie in our own experienced and trusted hands".

I wonder if such dialogue is already occurring between Google, Facebook et al. and the likes of NARA, NAA and TNA. Let's hope so…


Maureen said...

Hi Steve,

I guess it all depends on what we want to achieve from using cloud-based services, though I'm not sure we know yet for the longer term! I posted something about this on my blog a while back -
Sorry I missed your presentation in Geneva! Did you get many questions afterwards?


Steve Bailey said...

Hi Maureen,

You make some good points. I think you are right about not yet knowing what we want from the Cloud, but I also suspect that we will see (are already seeing?) usage creep which will, over time, extend the range and importance of the material that we entrust to the cloud, more than likely without our pausing to think about some of these 'hidden' issues.

Chris Prom said...


I really enjoyed your talk in Geneva and think you are on track with a major issue here. But I think that, even if NARA etc. are addressing this issue with Google, Facebook etc. (which I doubt), the vast majority of really interesting content will still not be preserved by any kind of trusted third-party broker, simply because large national archives have no overriding interest in dealing with personal records, or the records of associations, private businesses, universities, etc. For that reason, I think most of the action needs to take place in helping 'smaller' repositories get their act together. If you are interested, I posted some preliminary thoughts about that on my blog, riffing on your general theme.

Chris Prom

Marieke Guy said...

Hi Steve,

Excellent talk and post. I've posted a summary of your ideas and a few thoughts of my own on my Beginner's Guide to Digital Preservation blog.

I conclude with:

"So it seems that the future of digital archiving continues to lie in the hands of those who care about it – the records managers, the archivists, the librarians, the JISC project managers – it is just that they now need to either include others in the dialogue about how to preserve digital objects or (and a part of me thinks this is the more realistic approach) think in a more lateral way about how you continue to preserve when you’ve lost control of your digital objects."


Steve Bailey said...

Hi Marieke,

Good comments and post!

The differences between mass storage and an archive which both you and the JISCmail commentator refer to raise some interesting (and potentially far-reaching) questions.

As an archivist by training I instinctively agree with comments such as "The cloud may be a mass storage device but it is not yet an archive" and "archives as opposed to mass storage has to work by what it refuses as much as what it includes", but I also increasingly wonder whether the assumptions on which these statements are based still stand.

After all, I wonder how many of these seemingly inviolable truths by which we describe the function and qualities of an 'archive' were originally just born of practical necessity: necessities which technology may now have rendered largely obsolete.

In the past archives had to be selective, simply because a limited capacity to process, store and provide access to their contents required it. But ask any researcher whether they would like more information about their chosen subject or less, and the chances are they will plump for the former. And how many new historical discoveries owe their origin to the serendipitous discovery of a comparatively trivial record which owes its survival to chance? Those chances must increase dramatically if we retain, and provide easy access to, a greater percentage of the written record.

Of course questions of trust, context, provenance and original order are still relevant and should still help to draw a division between mass storage and archival storage, but I'm now less sure about selectivity as a criterion...

I guess I should also, for the avoidance of any doubt, end by repeating that whilst I observe these trends happening, this does not mean that I endorse them. I'm sure that most service providers in the cloud do not see themselves as offering archive facilities and would not claim to be 'trusted repositories' as judged by our established notion of what an archive is. My point is that we are nonetheless walking into a position where they are being asked to fulfil this role regardless, simply because the archival community is not.