Very interesting to see that the folks in charge of the UK Web Archive at the British Library are planning to adopt a crowd-sourcing approach to informing their selection of websites to archive.
When I wrote Managing the Crowd back in 2008 I put forward the view that agreeing appraisal decisions largely on the basis of user opinion of the worth of a record was likely to be the only way to go. Here we not only see an example of this being attempted in practice, but an interesting explanation for the decision:
“we recognise that this manual selection process can sometimes be time consuming for frequent selectors. It’s also inevitably subjective, reflecting the interests of a relatively small number of selectors.”
This reflects two of the major tensions informing my own thinking on this back then (and now). The first is that the kind of manual approach to selection and appraisal traditionally adopted, i.e. manual processes undertaken by a relatively small number of trained professionals, simply isn’t sustainable in the face of the ever-growing onslaught of information being created. This will come as no surprise to any information professional, even if solutions to it (this initiative aside) still seem rather thin on the ground…
The second, the admission of subjectivity within existing approaches to appraisal and the ‘selection bias’ it implies, is less widely discussed, perhaps because until now we’ve had no alternative. It stands to reason that, despite our best efforts, any selective appraisal process must inevitably be biased in some way, whether the appraiser is conscious of it or not. But having appraisal decisions based at least in part on user behaviour promises to go a long way to resolving some of these issues. What would of course be fascinating is a comparative study setting the websites which would have been selected for capture by existing manual methods against those captured by the Web Archive according to ‘the crowd’, to see how closely (or not) the two are aligned.
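As a rough illustration of what such a comparison might measure, here is a minimal Python sketch. It assumes each method simply yields a list of site addresses; the function name `compare_selections` and the example addresses are hypothetical and not drawn from the British Library's work. It reports which sites both methods chose, which only one chose, and the Jaccard index (shared sites as a fraction of all sites chosen by either).

```python
def compare_selections(curator_sites, crowd_sites):
    """Compare two collections of site addresses and report their overlap."""
    curator = set(curator_sites)
    crowd = set(crowd_sites)
    both = curator & crowd
    either = curator | crowd
    return {
        "both": sorted(both),
        "curator_only": sorted(curator - crowd),
        "crowd_only": sorted(crowd - curator),
        # Jaccard index: 1.0 means identical selections, 0.0 means no overlap.
        "jaccard": len(both) / len(either) if either else 0.0,
    }


# Hypothetical example data, for illustration only.
curator_picks = ["news.example.org", "gov.example.org", "museum.example.org"]
crowd_picks = [
    "news.example.org",
    "blog.example.org",
    "gov.example.org",
    "forum.example.org",
]

result = compare_selections(curator_picks, crowd_picks)
print(result["jaccard"])       # 0.4
print(result["curator_only"])  # ['museum.example.org']
print(result["crowd_only"])    # ['blog.example.org', 'forum.example.org']
```

A single overlap figure like this would of course only be a starting point; the sites that appear in one list but not the other are probably where the interesting questions lie.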
A counter-argument to all this may run along the lines of ‘but what if the sites the public are viewing most are not the most important ones?’ In short: ‘do the public really know best?’ Perhaps wisely, the British Library are also incorporating “curatorial input to this approach, so we’ll be asking curators from the Library to assess the quality and relevance of resulting selections”. But it does pose an interesting question: should we be seeking to capture as accurately as possible the sites which the public believed to be of interest or use on a particular topic, or those that we as information professionals believe they should have been interested in? The former may lead to the capture of some surprising, perhaps even ‘unsuitable’ sites, whilst the latter would perhaps provide a more informed, maybe a more ‘official’ version of events. But which would be the more accurate?
It’s also interesting to note that this approach to crowd-sourcing isn’t just relying on user opinion but on the results of actual user behaviour. They aren’t just asking people to collectively vote for sites they wish to see included in the archive, but are analysing data from Twitter regarding which sites were linked to at the time. Using user behaviour to inform appraisal wasn’t something I considered back in 2008, but I have done quite a bit of thinking about it since, notably in a paper in the Records Management Journal in 2009 (Vol 19 No.2) titled ‘Forget electronic records management, it’s automated records management that we desperately need’. The idea is that we use the data about user behaviour generated by business systems (which records users have opened, whether they edited or just read them, what they looked at next, etc.) to help inform our records management policies, basing them not on what we believe to be organisational need but on actual patterns of behaviour. This is something we are all familiar with through sites such as Amazon and their ‘users who looked at this item also looked at these items…’ functionality. Use of such ‘behavioural analytics’ is also gathering momentum within academia, with institutions, for example, using library usage patterns to identify at an early stage which students may be disengaging from their studies. To my mind the ability to closely monitor and analyse user behaviour in this way has the potential not only to increase the scalability of much of records management but also to increase the level of sophistication with which it can operate.
Maybe there is hope yet.