Guest blog: Aggregation & the Culture Grid

 

This is a guest post by Collections Trust CEO Nick Poole following a recent email to the Museums Computer Group list. For further information about the Collections Trust, see http://www.collectionstrust.org.uk

Last week, I sent a message out via the Museums Computer Group email list announcing some changes to the Culture Grid, the aggregation platform run by the Collections Trust. Broadly the changes are that:

  • The Culture Grid closed to ‘new accessions’ (i.e. new collections of metadata) on 30th April
  • The existing index and API will continue to operate in order to ensure legacy support
  • Museums, galleries, libraries and archives wishing to contribute material to Europeana can still do so via the ‘dark aggregator’, which the Collections Trust will continue to fund
  • Interested parties are invited to investigate using the Europeana Connection Kit to automate the batch-submission of records into Europeana

Background to the Culture Grid

The Culture Grid (www.culturegrid.org.uk) has its origins in a much earlier project called the People’s Network. In addition to putting internet-connected terminals in public libraries, the People’s Network wanted to ensure that users could discover and use the collections from UK museums, archives and libraries.

Using some of the original technology and content from the People’s Network Discover Service, we created the Culture Grid with the aim of opening up digital collections for discovery and use by aggregating them and sharing the resulting data with anyone who wanted to make use of it.

Over the ensuing 7 years, the Culture Grid has grown to contain some 3m records from around 200 museums. It is regularly harvested by Europeana, and as a result is a major source of English-language content.

Challenges of aggregating museum data

Throughout its history, the Culture Grid has been tough going. Looking back over the past 7 years, I think there are 3 primary and connected reasons for this:

  • The value proposition for aggregation doesn’t stack up in terms that appeal to museums, libraries and archives. The investment of time and effort required to participate in platforms like the Culture Grid isn’t matched by an equal return on that investment in terms of profile, audience, visits or political benefit. Why would you spend 4 days tidying up your collections information so that you can give it to someone else to put on their website? Where’s the kudos, increased visitor numbers or financial return?
  • Museum data (and to a lesser extent library and archive data) is non-standard, largely unstructured and dependent on complex relations. In the 7 years of running the Culture Grid, we have yet to find a single museum whose data conforms to its own published standard, with the result that every single data source has required a minimum of 3-5 days and frequently much longer to prepare for aggregation. This has been particularly salutary in that it comes after 17 years of the SPECTRUM standard providing, in theory at least, a rich common data standard for museums;
  • Metadata is incidental. After many years of pump-priming applications which seek to make use of museum metadata it is increasingly clear that metadata is the salt and pepper on the table, not the main meal. It serves a variety of use cases, but none of them is ‘proper’ as a cultural experience in its own right. The most ‘real’ value proposition for metadata is in powering additional services like related search & context-rich browsing.
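
To make the second of these points a little more concrete, here is a minimal sketch of the kind of conformance check that preparing a data source involves. The required field names and the sample records are hypothetical; they are not taken from SPECTRUM or from any Culture Grid contributor, and a real check would work from the museum's own published standard.

```python
# Hypothetical required fields; a real profile would come from the
# museum's own published standard.
REQUIRED_FIELDS = ("object_number", "object_name", "brief_description")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a record."""
    return [
        field for field in required
        if not str(record.get(field) or "").strip()
    ]


def conformance_report(records, required=REQUIRED_FIELDS):
    """Map each non-conforming record's index to its missing fields."""
    report = {}
    for index, record in enumerate(records):
        gaps = missing_fields(record, required)
        if gaps:
            report[index] = gaps
    return report


# Invented sample records, for illustration only.
sample_records = [
    {"object_number": "1990.12", "object_name": "teapot",
     "brief_description": "Glazed earthenware teapot"},
    {"object_number": "1990.13", "object_name": "",
     "brief_description": "Print on paper"},
    {"object_name": "chair"},
]

report = conformance_report(sample_records)
```

Even a check this simple only finds missing values. Inconsistent vocabularies, free-text dates and undocumented local conventions are where the days of preparation tend to go.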

The first two of these issues represent a fundamental challenge for anyone aiming to promote aggregation. Countering them requires a huge upfront investment in user support and promotion, quality control, training and standards development.

The 3rd is the killer, though: countering these investment challenges would be possible if doing so were to lead directly to rich end-user experiences. But it doesn’t. Instead, you have to spend a huge amount of time, effort and money to deliver something which the vast majority of users essentially regard as background texture.

Fig 1. Infographic showing the core elements of the Culture Grid

People, strategy & money before technology

The discussion on the MCG list since I sent my message has focused on technology, but the problem of aggregation is not primarily technological. People have quite rightly pointed out that the most sustainable technical option is to support museums in publishing better-structured data at source – on their web pages or Collections Management Systems.

Richer data at source remains a cherished dream – it is the basis of the COPE (Create Once, Publish Everywhere) strategy we have worked on for the past 3 years, it is the dream of the CIDOC Conceptual Reference Model and a potential use case for schema.org. It is, I believe, at the heart of Richard Light’s vision of truly linked, truly open data from museums.
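
As a rough illustration of what better-structured data at source could look like on a web page, the sketch below builds a schema.org description of a single object as JSON-LD. The record, its field names and the URL are hypothetical, and the mapping to schema.org properties is one plausible choice, not an agreed profile.

```python
import json


def to_schema_org(record):
    """Map a (hypothetical) collections record to a schema.org JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "identifier": record["object_number"],
        "name": record["title"],
        "description": record["description"],
        "url": record["url"],
    }


# Invented record, for illustration only.
record = {
    "object_number": "1990.12",
    "title": "Teapot",
    "description": "Glazed earthenware teapot",
    "url": "https://museum.example.org/objects/1990.12",
}

json_ld = json.dumps(to_schema_org(record), indent=2)
# Embedding this in the object's own page lets crawlers read it in place.
script_tag = '<script type="application/ld+json">\n' + json_ld + "\n</script>"
print(script_tag)
```

The markup itself is the easy part. The record it is generated from still has to be clean and consistently structured, which brings us back to the non-technical barriers.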

But until we as a community are able to think of ways of overcoming the strategic, financial, operational and human barriers to aggregation, none of these technical possibilities is likely to become a reality.

Why would anyone do this?

It is certainly true that any service or platform which positions itself between the museum and the end-user needs to be crystal clear about how and where it is adding value.

The Collections Trust is the professional association for Collections Management. The 3 pillars of our work are ‘Standards’, ‘Workforce Development’ and ‘Advocacy’. Our primary goal in taking on the Culture Grid has been to use aggregation as a strategic incentive to improve the adoption of SPECTRUM as a data standard in museums.

In practice, while museum data is undoubtedly better-structured than it was 10 years ago, platforms like the Culture Grid and Europeana (in the UK at least – this is not true in other countries) have not yet proved a sufficient incentive to drive improvement in data quality and the use of standards by museums. The most significant driver of improvement in standards adoption for the current generation has been the ongoing investment of the Collections Management System vendors in improving their software.

Sharing the workload

In practice, then, the underlying reason for our change in strategy is not so much financial, but more because aggregation is not delivering sufficient value for end-users, for museums, for the Collections Trust or for our funders. Put simply, if aggregation were delivering value, it would have been much easier to find money for it.

If we are really going to unlock the true potential of cultural heritage metadata aggregation, no single organisation (with the possible exception of Google) can afford to shoulder the entire burden of paying for promotion, hosting, quality-control, user support, syndication, partnership development etc etc.

From our perspective, the most viable long-term strategy involves a combination of:

  • Ongoing long-term support and training for museum professionals in improving the creation, management, structuring, licensing and use of data about their collections as a core competence of having a collection and managing it properly;
  • Peer-to-peer support through the museum community so that people without the technical skills to assess and refine their data model, select appropriate licenses and prepare their metadata for harvesting and aggregation are supported by people who can;
  • Ongoing support from software vendors alongside the long-term process of improving the capabilities of their products;
  • A programme of opportunistic projects and developments aimed at bulking up the value proposition (for example through partnerships with initiatives such as the BBC Digital Public Space, Public Catalogue Foundation or the Wikimedia Foundation).

Starting the conversation

Ultimately, though, the momentum of aggregating and sharing metadata from and about collections will only be continued if there is sufficient energy and will to do it. This announcement is essentially the Collections Trust’s way of saying that we cannot keep driving for aggregation on our own, with no financial support but more importantly with little enthusiasm from our industry.

We have been able to move the agenda of aggregation forward a little, but if it is genuinely going to become a useful and sustainable part of the digital landscape for museums in the UK, we think the baton needs to be taken up and championed by the community.

I mentioned in my blog post that we would like to start a conversation with the Museums Computer Group. At the heart of this conversation is a question: “is aggregation useful to you, and if it is, are you willing to share the effort of making it work?” I look forward to having a robust discussion around this in the coming weeks!

Thanks and references

I would like to take this opportunity to thank all of the people who have worked so hard in support of the Culture Grid and opening up collections metadata, including our colleagues at Knowledge Integration, the Collections Trust team, past and present, our funders including Arts Council England and all of the professional colleagues we work with in museums.

For additional technical information on the Europeana Connection Kit, talk to your Collections Management System vendor, or see the specs at http://www.europeana-inside.eu/documents/eck/index.html.

A guide to publishing your data through the ECK is available from http://www.europeana-inside.eu/news/publication_eck_guide__sharing_collections_with_the_easy_connection_kit.html

 

5 comments on “Guest blog: Aggregation & the Culture Grid”
  1. mia says:

    Thanks, Nick. I look forward to seeing what others have to say – I’m particularly hoping we hear from collections staff as much as (if not more than) technologists.

    I wonder if one factor is that so few museums re-use data from their collections management systems for other purposes, whether public-facing websites or information exchange with other institutions. If museums don’t already find their data valuable enough to re-use, perhaps they don’t see why anyone else would value it?

    Cheers, Mia

  2. mia says:

    Also, I’ve just noticed a quirk of some recent work on a responsive theme – the comment submit button is hiding in the top right-hand corner of the browser window. #awkward

  3. Hi Nick,

    I feel like there is so much one could say in response to your email, but to keep it (reasonably) short:

    “The value proposition for aggregation doesn’t stack up in terms that appeal to museums, libraries and archives”

    Whilst I think there is always more we can do to prove the value of an aggregator, I think for the Archives Hub, as an aggregator of archival descriptions, our community are happy and willing to contribute descriptions and we continue to attract new contributors. I believe many archivists are convinced by the value of the Archives Hub, and the need for researchers to be able to intellectually bring together source materials for their research.

    “In the 7 years of running the Culture Grid, we have yet to find a single museum whose data conforms to its own published standard, with the result that every single data source has required a minimum of 3-5 days and frequently much longer to prepare for aggregation.”

    That sounds familiar! It is the same for the Hub, and probably our biggest challenge. I believe many people think that bringing together archival descriptions is fairly straightforward because they are all doing the same thing and based on recognised standards; in practice it is not easy because being ‘based on’ standards allows a great deal of flexibility and standards are often open to interpretation. However, we are finding ways to facilitate data processing, partly through providing our own data creation service (which is very popular) and partly through working on data processing ‘pipelines’ for different contributors. One thing that has struck me is that whilst the inconsistencies in cataloguing are frustrating, archivists often seem very willing to listen and respond to the advice that we provide around potential changes they can make that will help with consistency and online discoverability.

    “Metadata is incidental”

    I’m not sure I go along with your thinking here. In our experience researchers don’t see the descriptive information as “background texture”. Sure, it isn’t the main meal; but if you can’t find the plate in the first place, you are going to go hungry. From our user surveys, a reasonable proportion of researchers do visit an archive as a result of finding material on the Archives Hub. I agree that it is a challenge to convince some people (funders) that aggregated data is an attractive prospect because it isn’t all shiny and dynamic and so forth. But we consistently find researchers telling us that the Archives Hub, as an aggregator, saves them time in finding sources for their research. For some of our contributors, the Hub is their only online presence, or it is the only one that exposes their descriptions through search engines.

    I believe aggregation must be delivering value if use of the website remains high (and growing); if use of collections increases; if we can do valuable things with the aggregated data that could not be done otherwise; and if we save researchers time in finding sources. Actually finding ways to measure the value of an online service is not easy though – page views and visitor numbers are a poor indication, although they must play a part.

    I feel that there *is* the energy and will in much of the archival community, and we are also working with archival management systems to facilitate interoperability, as well as providing the Hub data to others (e.g. APE, potentially Europeana, SNAC and hopefully Cendari). Maybe the museum community has a somewhat different take on things, due to the different nature of museum collections, and sharing data is less of a priority.

    cheers,
    Jane.

  4. Matt L says:

    It’s enlightening, and worrying, to hear that museums might only see their metadata as an opportunity to tweak discovery within their own collections, or (even worse) that they don’t really see value in its public re-use and re-combination at all.
    I’m a PhD student in art history whose advisor also happens to be a museum curator — an unusual, though not-unheard-of combination. (And all this is prefaced by the caveat that my own priorities lie in art museums, and I’m quite aware that they don’t represent every institution out there!) The conversation in academic art history circles around the use of computational research methods with historical datasets has blossomed in the past few years (http://blogs.getty.edu/iris/beyond-digitization-new-possibilities-in-digital-art-history/). As with the big tent of the so-called "Digital Humanities", a great deal of this conversation has to do with academic art historians getting more comfortable with integrating digitally-aided research, teaching, and publishing into their workflows. A smaller contingent—and this is where I locate myself—are interested in quantitative approaches to art history, and this is where interoperable collections data become indispensable.
    My own research mines the repositories of Dutch and Flemish prints in the British Museum and Rijksmuseum, among others, to power social network analyses of printmakers’ and publishers’ changing partnership patterns. At the risk of tooting my own horn, the findings are totally novel to our field, and they’re possible only because these institutions took a chance that outside parties might find better uses for their metadata than they could think up on their own.
    As rich as these two collections are in the period I’m trying to study, it’s a death sentence to rely on just one institution’s data when trying to make broader historical claims, hence my efforts to replicate my analyses across both collections. Though both the BM and the RKM have released their rich collections metadata, they use entirely different schemas, vocabularies, and authorities, laying the onus of data integration on me.
    Early on, I looked to Europeana’s aggregated LOD, which promised access to a much wider range of European collections and, with that, better support for my historical claims. It was useless. All the detailed information in the BM and RKM databases regarding how actors were linked to an object (designer, engraver, publisher, depicted in, etc.) was elided by the broad and simplistic aggregation framework offered by Europeana. I don’t want to discount the utility offered by that model; after all, I’m approaching museum data from a rather rarefied viewpoint. But data-driven researchers will be demanding increased quality of collection metadata. I’d rather see efforts focused on gradually bringing more institutions into the fold of CIDOC-CRM, with a thorough usage of extant resources like the Getty vocabularies, than catering to the lowest common denominator in an effort to get the widest possible assortment of institutions within one dataset.

  5. Joris Pekel says:

    Hi Nick,

    Thanks for your thoughtful blog. Really helpful to understand more about the situation Culture Grid is currently in (although I still don’t fully understand, but Gordon will drop by to tell us more on Friday).

    I personally think the questions you raise are very relevant, not only for Culture Grid, but for Europeana as a whole and all the aggregators in our Network. 7 years ago we started working on aggregating data to a centralised repository and we are sort of still doing it exactly like that. In these 7 years the world has moved on and expects a whole range of new things when it comes to accessing online data. I think you, as the previous chair of the Europeana Network, know better than anyone that the expectations of users, institutions and funders have radically changed. First it was all about metadata, but we quickly found that we needed the content as well, and preferably in very high quality.

    I think the good part about aggregation is that it works. With Culture Grid and the dozens of other aggregators around Europe we managed to mobilise over 3,300 institutions to make their data available to the web. Now I am sure that a lot of them would have done so anyway, but making so many people think about what it means to publish their data to the web has been one of the greatest (although hidden) achievements of the work we have been doing. The fact that waiving your rights using the CC0 waiver is now quite an accepted thing would not have happened if we had not first got people involved.

    That said, yes, aggregating hundreds of datasets to a centralised repository has loads of issues. The process is opaque and chaotic, and higher-quality data gets lost through standardisation. It is also currently a lot of work for the cultural institutions and it is sometimes difficult to show the direct value of it (a different question is whether institutions should think in terms of ‘traffic’ and other market terms, but that is another discussion).

    For me personally, national aggregators and Europeana are useful to connect datasets with each other and to generate new contexts and stories. Is harvesting everything to a centralised place the best way to do that? I am not sure. In my (very non-technical) mind I could also see a situation where we do not have one repository, but thousands that are really well connected, kind of like the internet. If smaller institutions are not able to connect, they can still get their data aggregated in order to connect them as well. We are simply not there yet.

    At Europeana we have dedicated 2015 to thinking a lot more about this, and a first document has already been published, along with the aggregator survey, which is much in line with what you are saying. See: http://pro.europeana.eu/files/Europeana_Professional/Projects/Project_list/Europeana_Version3/Deliverables/EV3%20D1_1%20Aggregation%20Infrastructure.pdf

    You are definitely not the first person to raise these concerns, and I would love to have an honest discussion with both the aggregators and the institutions about what they see as the ‘value of aggregation’. I guess it starts with the question: why are you publishing data in the first place? For whom? If you want more people to see your greatest pieces, maybe uploading them to Flickr is a better idea. If you want to give researchers a vast corpus of data to work with, aggregation might be for you.

    I was not a member of the MCG list but have just signed up, so I am looking forward to seeing what appears there. I will also raise this more widely within Europeana (although I guess most of them have read your blog post already :) )

    Cheers, and even though you are not at CT anymore, hope to speak soon,

    Joris

