GLAM/OpenGLAM Benchmark Survey/Questionnaire


Note that the webpages related to the OpenGLAM Benchmark Survey are no longer maintained; an archived version is available at Zenodo.

Final version of the questionnaire


The final version of the questionnaire is available for download in various languages in the "Documentation" section of the "Data" page.

Consecutive versions of the questionnaire


The survey questionnaire was developed in three iterations between February and September 2014. A first draft version was submitted to various domain experts for comment. Based on the feedback received, a second draft version was produced, which served as the basis for a pre-test in the Netherlands and a focus group discussion in Denmark with members of the target group. Based on the insights gathered through the pre-test and the focus group discussion, the final version of the questionnaire was produced.

You are invited to add your comments to the final version of the questionnaire if you happen to spot any errors or anomalies.

Final version (Update: 24 April 2015)

  • H1: The item "Open data / linked data" was separated into two items "Open data" and "Linked data" in order to facilitate the comparative analysis of the various Internet-related activities.

Final version (Update: 12 March 2015)


Based on the analysis of the data gathered in Poland and Finland, the following new wordings were adopted:

  • C1: "And which percentage of your metadata are you expecting to be available as "open data" in 5 years from now?" [instead of: "And which percentage of your metadata are you expecting to make available as "open data" in the future (within the next 5 years)"]
  • C2: "And which percentage are you expecting to be available on the Internet in form of linked data in 5 years from now?" [instead of: "And which percentage are you expecting to make available on the Internet in form of linked data in the future? (within the next 5 years)"]
  • D1: "And which percentage are you expecting to be digitized in 5 years from now?" [instead of: "And which percentage are you expecting to be digitized in the future (in 5 years from now)?"]
  • D4: "And which percentage are you expecting to be available as “open content” in 5 years from now?" [instead of: "And which percentage are you expecting to make available as "open content" in the future? (within the next 5 years)"]

Reason: A number of respondents reported future values which are quite obviously *relative* values (e.g. “what percentage of your holdings are you expecting to digitize over the next 5 years?”) instead of *absolute* values (e.g. “what percentage of your holdings are you expecting to be digitized in 5 years?”). In order to reduce "noise" in the data, clearer wording has been chosen.
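The difference between the two readings can be made concrete with a small worked example. The following is only an illustrative sketch, not part of the survey's actual data processing: the function names and the sample figures are hypothetical, and the plausibility check assumes that the share already reached does not shrink over time.

```python
def absolute_future_share(current_pct: float, expected_increase_pct: float) -> float:
    """Turn a *relative* answer (additional share expected over the next
    5 years) into an *absolute* one (total share expected in 5 years)."""
    # A share of the holdings can never exceed 100%.
    return min(100.0, current_pct + expected_increase_pct)


def looks_relative(current_pct: float, reported_future_pct: float) -> bool:
    """Flag an answer whose reported future share is lower than the current
    share. Assumption (hypothetical rule): the share already reached does not
    decrease, so such an answer was probably meant as a relative value."""
    return reported_future_pct < current_pct


# Hypothetical institution: 20% of holdings digitized today,
# a further 15% expected to be digitized over the next 5 years.
current = 20.0
additional = 15.0

print(absolute_future_share(current, additional))  # absolute reading: 35.0
print(looks_relative(current, additional))         # "15" as a future value is suspicious: True
```

A respondent who enters "15" under the old wording and one who enters "35" under the new wording may thus be describing the same expectation, which is the kind of noise the clearer wording is meant to reduce.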

Final version (Update: December 2014)


Based on a preliminary analysis of the data gathered in Poland and Finland until 16 December 2014, the following changes were made to the questionnaire:

  • A8: Added an option "no answer possible".
  • A9: Completed the instruction by adding "Use a dot (".") as a decimal separator if needed."
  • A9: Added an option "No answer possible. - Enter "100" in this field to skip this question."
  • H2: Removed the validation of the e-mail address.

These changes will be in effect in all countries except for Finland and Poland.

Final version


Pretest version


Earlier versions


Coordination of questionnaire development


Questionnaire Coordination Meetings

  • Questionnaire Coordination Meeting (17 March 2014): Minutes
  • Questionnaire Coordination Meeting (28 April 2014): Minutes
  • Questionnaire Coordination Meeting (7 May 2014): Minutes
  • Questionnaire Coordination Meeting (22 September 2014): Minutes

Process / Milestones

  • Finalization of Draft 1, by 25 March 2014 (coordination: Beat Estermann)
  • Broad consultation (GLAM-Wiki, OpenGLAM mailing lists, researchers who have published in the field, etc.), from 25 March until 21 April 2014 (coordination: Beat Estermann)
  • Elaboration of Draft 2 based on feedback, by the beginning of May (coordination: Beat Estermann)
  • Implementation of Draft 2 in the survey tool, in the first half of May (Joris Pekel)
  • Internal review of the implementation in the survey tool, in May (coordination: Beat Estermann)
  • Pretest, starting in May (coordination: Joris Pekel)
  • Final version of the questionnaire in English, by the end of September (coordination: Beat Estermann)
  • Translation of the questionnaire, starting from October (coordination: Beat Estermann)

Aspects to be taken into account in the elaboration of the questionnaire (Proposal)


Definition of core concepts


Cultural heritage institutions


For the purpose of the ENUMERATE project, the cultural heritage domain has been defined to consist of the "memory institutions":

  • Museums
  • Libraries
  • Archives and records offices
  • Audio-visual and film archives
  • Organisations with curatorial care for monuments, sites and the historic environment
  • Hybrid types of organisations

The criterion is that "curatorial care for, at least part of, the collections of the institution are included in its mission. Institutions that do not hold heritage collections or that have collections of heritage materials (like for example of books, films, and music) to be lent by or sold to contemporary users without the explicit task of safeguarding the collections for future generations, will not be included in the survey. This essentially leaves out both school libraries [...] and public libraries without cultural heritage collections."[1]

Web 2.0 / the social web




Crowdsourcing

The term 'crowdsourcing' was coined by Jeff Howe in 2006 in Wired Magazine, combining the two terms 'crowd' and 'outsourcing': “Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call. This can take the form of peer-production (when the job is performed collaboratively), but is also often undertaken by sole individuals. The crucial prerequisite is the use of the open call format and the large network of potential laborers.”[2] The term has since been used with somewhat varying definitions; Estelles-Arolas and Gonzalez-Ladron-de-Guevara have compared forty original definitions of crowdsourcing in order to propose a comprehensive definition: “Crowdsourcing is a type of participative online activity in which an individual, an institution, a non-profit organization, or company proposes to a group of individuals of varying knowledge, heterogeneity, and number, via a flexible open call, the voluntary undertaking of a task.”[3]

Open data


Linked open data / semantic web


Insights from the Swiss pilot survey


At the kick-off meeting of the international coordination team it was decided to develop the survey questionnaire on the basis of the questionnaire of the Swiss pilot survey:

State-of-the-art in the core fields of investigation


Heritage institutions and digitization


What are the key insights from the ENUMERATE project?

Heritage institutions and the web 2.0


Smith-Yoshimura and Shein examined a sample of 24 websites from the cultural heritage domain which engage their communities and seek user contributions by providing social media features, such as tagging, comments, reviews, images, videos, ratings, recommendations, lists, or links to related articles. They found that within their sample of 24 websites, 16 used crowdsourcing for data enhancement in the form of improving description, 11 for collection and content building, and 10 for data enhancement in the form of improving subject access. Further areas of crowdsourcing were: ratings and reviews (i.e. for collecting subjective opinions), promoting activities outside of the site, sharing and facilitating research, as well as networking and community building. [4]

In addition to social media functionalities built into institutions’ websites, Smith-Yoshimura and Shein also investigated the use heritage institutions make of third party social media sites, such as LibraryThing, Flickr, YouTube, Facebook, Wikipedia, and blogs. Based on comparative case descriptions, they reached the following conclusion:

"LibraryThing is an excellent resource from which to harvest user-generated metadata on published works and disseminate information on one's own holdings of published materials, but impractical for unique or unpublished works. Flickr is an unparalleled vehicle for sharing still images and gathering user-generated description of the images. YouTube is the leading site for promoting and sharing moving images. Facebook provides an avenue through which LAMs [i.e. Libraries, Archives, and Museums] can communicate textually and imbed audio, video, and images. Twitter is an efficient way to push out short textual messages, such as announcements and alerts. Wikipedia offers the potential to reach a broad audience and direct web traffic to a LAM and its select resources. Blogs, especially those built in-house, are perhaps the most adaptable platform for communicating various formats of information through an interface that can be functionally and visually tailored to suit institutional needs. Establishing a presence on social networking sites, wikis and blogs enables LAMs to bring their resources to online environments where users are already active, exposing content to new audiences, encouraging user interaction, and fostering a sense of community." ([4], p. 64).

Regarding the numbers of heritage institutions using third party social media sites, they report that 1600 libraries worldwide used LibraryThing to harvest user-generated content and to enhance the descriptions of published works in their online public access catalogues. For the other types of social media services, they report findings from a survey carried out among special collections and archives in academic and research libraries in the United States and Canada. According to that study, 49% of the 169 responding institutions indicated that they were using institutional blogs, 39% had a social networking presence, 37% reported adding links to Wikipedia, 30% used Flickr, roughly one quarter used Twitter (25%), YouTube (24%) or Podcasting (24%), 17% had an institutional wiki, 15% collected user-contributed feedback (e.g. through social tagging), and 10% used mobile applications to reach out to their audiences. Responding institutions were also asked which of these services they were planning to implement within a year. Here, institutional blogs rated highest with 19%, followed by user-contributed feedback (16%).[5] Regarding the publication of heritage content on Wikipedia, the first core survey of the ENUMERATE project revealed that among the 774 responding European heritage institutions, on average 3% of their digital collections are accessible through Wikipedia [6].

Heritage institutions and crowdsourcing


There are plenty of examples of crowdsourcing approaches in the cultural heritage sector. Several authors have created inventories of crowdsourcing projects throughout the world.[7][8][9][4][10][11] Based on these inventories, typologies have been developed: Oomen and Aroyo propose a classification scheme based on the digital content life cycle model of the National Library of New Zealand, distinguishing the following types of crowdsourcing approaches: correction, classification, contextualisation, co-curation, complementing collections, and crowdfunding.[9] They also point to the fact that crowdsourcing initiatives in the cultural heritage domain may be executed without institutions being in the lead. They expect that more and more crossovers will take place between community- and organization-driven projects, as is the case with co-operations between heritage institutions and the Wikipedia community. This observation matches the insights gathered by Terras who investigated amateur online museums, archives, and collections and concluded that the best examples of these endeavors can teach best practice to traditional heritage institutions in how to make their collections useful and to engage a broader user community.[8] She not only recommends that heritage institutions increasingly use web 2.0 services such as Flickr, Twitter, and Facebook to build an online audience, but also encourages them to bridge the gap between pro-amateurs with their private collections of ephemera, and institutional collections.

As noted above, Smith-Yoshimura and Shein, investigating social metadata approaches, examined a sample of 24 websites from the cultural heritage domain. The typology of crowdsourcing approaches they developed and applied to this sample is slightly different from the one proposed by Oomen and Aroyo: it distinguishes data enhancement in the form of improving description, collection and content building, data enhancement in the form of improving subject access, ratings and reviews (i.e. collecting subjective opinions), promoting activities outside of the site, sharing and facilitating research, as well as networking and community building.[4]

There seems to be a conceptual overlap between the use of social media by heritage institutions and crowdsourcing. In this context, Holley insists on the difference between social engagement (e.g. social tagging) and crowdsourcing, arguing that crowdsourcing usually entails a greater level of effort, time and intellectual input from an individual.[7] According to her, crowdsourcing relies on sustained input from a group of people working towards a common goal, whereas social engagement may be transitory, sporadic or done just once. As a consequence, setting up a crowdsourcing project is about “using social engagement techniques to help a group of people achieve a shared, usually significant, and large goal by working collaboratively together as a group” [7]. She argues that libraries are already proficient in social engagement with individuals, as many forms of social engagement in libraries pre-date the advent of the Internet, but that they are not necessarily proficient yet at defining and working towards group goals. Oomen and Aroyo point to motivating users for participation and supporting quality contributions as major challenges of crowdsourcing.[9]

There is hardly any research into heritage institutions’ motivations for crowdsourcing. In an attempt to fill that gap, Alam and Campbell carried out a case study to investigate organizational motivations for crowdsourcing by the National Library of Australia. They found that the institution was motivated by a set of attributes that dynamically changed throughout the implementation of the crowdsourcing project, ranging from resource constraints to utilizing external expertise through to social engagement. The researchers noted that the project resulted in a high level of social engagement, active collaborations with and between stakeholders, and development of ‘bridging’ social capital that in turn instigated further motivations for the organization. They concluded that this dynamic change of organizational motivation may well be crucial for the long-term establishment of crowdsourcing practices.[12]

Heritage institutions and open data / open content


Baltussen et al. describe the approach several heritage organizations had been pursuing since 2011 in the Netherlands in order to create an open data ecosystem in the cultural heritage sector.[13] Based on two expert workshops with cultural institutions they identified the main benefits and risks of opening up cultural data. They found that the number one concern among cultural heritage professionals was that opening up collections would result in material being spread and reused without proper attribution to the institution. Related to this was a perceived loss of control over the collections. Concerning financial aspects, the workshop participants did not fear a direct loss of income by making data openly available, but were afraid that they may fail to generate extra income in the future as third parties develop new business models based on their datasets. Related to the perceived loss of attribution and control was also a perceived loss of brand value. Finally, concerns regarding privacy violations were an issue for organizations that hold data containing personal information. Overall, the workshop participants agreed that open data should be part of an institution’s public mission, especially if it received public funding. In their view, making collections widely accessible was at the heart of the majority of cultural heritage institutions. Furthermore, the cultural heritage professionals expected to be able to enrich data through aggregators like Europeana or other parties and to link their open data to that of other, related collections. Being able to increase the number of channels by which end users can be reached was also seen as an important benefit of open data. As a consequence, the workshop participants also expected benefits in terms of better discoverability, which drives users to the provider’s website. Further perceived benefits were increased relevance of institutions and the possibility of attracting and interacting with new customers.

These findings partly reflect earlier findings by Eschenfelder and Caswell who surveyed 234 “innovative” cultural heritage institutions in the United States in order to tackle the question in which cases cultural institutions ought to control reuse of digital cultural materials.[14] The main motives mentioned by archives, museums, and libraries for controlling the access to their collections were: (i) avoiding misuse or misrepresentation, (ii) ensuring proper object description and repository identification, (iii) avoiding legal risk, as well as (iv) donor or owner requirements. Among the top five reasons why they would limit the access to their collections, archives also mentioned generating income, libraries the impossibility to obtain the necessary rights, and museums their unwillingness to give up control over information about endangered or valuable objects, animals, or cultural events/items. The main motives against controlling the access, and thus in favor of opening up their collections, were (i) the belief that open collections have greater impact, (ii) concerns about legal complexity when access had to be regulated for various user groups, and (iii) the institutional mission, policies or statutory requirements.

Some of the legal concerns are likely to be absent in the case of public domain works. Kelly examined the policies on image rights at eleven art museums in the United States and the United Kingdom, when the underlying works are in the public domain.[15] Investigating how and why they had arrived at their approach and what key changes resulted from the policy, she found that providing open access was a mission-driven decision, but that different museums looked at open access in different ways. For some it was primarily a philosophical decision, while for others it was also a business decision. For most museums, developing and adopting an open access policy was an iterative and collaborative process, with many stakeholders working together to come up with an appropriate approach. Staff at many museums cited the following critical factors that favoured the adoption of an open access policy: diminishing revenues, difficulties when it came to drawing a line between scholarly and commercial uses of their images, senior management support for an open access policy, as well as technical innovations that enabled images to be made accessible with greater ease. In the process, they had to overcome a series of concerns, such as fears regarding the consequences of loss of control, challenges regarding metadata quality, technical challenges when it came to providing access to the collections through the museum’s website, as well as a possible loss of revenue. Most museums reported positive outcomes of opening up their collections. Staff mentioned the goodwill and recognition that came with open access, as well as a sense of satisfaction at helping to fulfil the institution’s mission. Virtually all museums experienced increased website traffic, and in some cases, curators received better and more interesting inquiries from scholars and the public. 
There were also positive side effects in that the policy change forced the institutions to think through their policies and their implications, as well as in the form of improved technology skills among the staff members. Some museums mentioned downsides of the open access approach: For museums without automated delivery systems, increasing numbers of image requests had led to an increase in workload. Thus, an increased demand may result in a need for investments in the technical infrastructure. Unsurprisingly, most museums in the survey reported stable or lower revenue from rights and reproductions. And finally, some museum staff mentioned that it had become more difficult to track the use of images or objects in their collections.

It has to be noted that many of the cases cited by Kelly relate to museums that did not comply with the Sunlight Foundation’s Open Data Principles, but pursued open access approaches that were limited to educational or scholarly use only, even for works that were in the public domain. In the case of US institutions claiming copyright over faithful reproductions of two-dimensional works, such approaches most likely amount to copyright overreaching [16]. Copyright overreaching occurs when claims of copyright protection are made that overreach the bounds of justifiable legal rights. Examining policies from U.S. museums, Crews found four varieties of copyright overreaching: assertions of false copyrights; claims to copyrights not held by the museum; assertion of control beyond rights of copyright; and claims of quasi-moral rights. He identified four motivations for copyright overreaching: protecting the integrity of art, generating additional revenue, getting credit for the museum’s collections and other good work, as well as adherence to donor requirements.[16]

Heritage institutions and linked open data


What is the state-of-the-art regarding heritage institutions and linked open data?

Assessment and benchmarking tools


Various assessment and benchmarking tools related to open data and community engagement have been developed, such as:

Provide a synthesis of the dimensions tested for with these tools; put them in relation with the state-of-the-art described above.

What other assessment or benchmarking tools could be relevant for the survey?

Reference surveys in the area of cultural heritage


Aligning the questions regarding the characteristics of institutions to those of important international reference surveys allows for easier comparison of results. Surveys to be taken into account:

What other surveys should be taken into account?

Reports in the area of digital cultural heritage


Terminology references / glossaries


The ENUMERATE project has identified a set of harmonization tools concerning terminology, digitisation costs, collection types, and web-statistics. They are available in the form of an online document; links to newer additions are available on Delicious.

Analytical frameworks


In the case of the Swiss pilot survey, the innovation diffusion model, popularized by Everett E. Rogers, was used to illustrate where institutions stand with regard to the various trends, such as digitization, open data, and crowdsourcing. However, this model was applied to the data ex-post, i.e. during questionnaire development it had not been clear that the diffusion model was going to be used to present the results. Therefore, the survey questions used were not necessarily fully adequate to accommodate the model. It is suggested that the new questionnaire be designed in a way that facilitates the presentation of results in terms of the innovation diffusion model.

Innovation diffusion model

  • What aspects of the diffusion model need to be taken into account?
  • Throughout previous research, various stage models of the innovation-decision process have been used. Which one is the most appropriate for our purposes?

Research questions


Based on an analysis of the pilot survey, it has been suggested that the international survey should make it possible to [17]:

  • Make comparisons between museums, archives, libraries: Where do practices converge between the different types of heritage institutions? Where do they diverge?
  • Investigate the factors that influence the adoption of open data policies and crowdsourcing practices; taking also into account practices in the area of web 2.0, as well as the latest insights derived from qualitative research (e.g. regarding the self-conception of heritage institutions and their role; driving and hindering factors; perceived risks; etc) and insights derived from research regarding digitization in the heritage sector.
  • Further investigate the links between open data and crowdsourcing practices.
  • Investigate the change of perceptions as the institutions implement open data policies or crowdsourcing approaches, e.g. by looking at institutions that are already further advanced in the adoption process.
  • Make international comparisons in order to reach a better understanding of differences across countries, for example in relation to the implementation of the EU Directive on the Re-Use of Public Sector Information in the cultural heritage sector, but also with regard to financial considerations or possible differences regarding the diffusion process.
  • Further corroborate findings implied by innovation diffusion theory in order to inform practice.

See: Draft Analysis Plan (based on the final version of the questionnaire)


  1. G.J. Nauta, S. Bakker, and M. de Niet, ENUMERATE Core Survey 1 Methodology, November 2011.
  2. J. Howe, Crowdsourcing: A Definition. Crowdsourcing Blog, June 2, 2006.
  3. E. Estelles-Arolas and F. Gonzalez-Ladron-de-Guevara. Towards an integrated crowdsourcing definition. Journal of Information Science, 38(2), pp. 189-200, 2012.
  4. K. Smith-Yoshimura and C. Shein, Social Metadata for Libraries, Archives and Museums Part 1: Site Reviews. Dublin, Ohio: OCLC Research, 2011.
  5. J. M. Dooley and K. Luce, Taking our pulse: The OCLC Research survey of special collections and archives. Dublin, Ohio: OCLC, 2010.
  6. N. Stroeker and R. Vogels, Survey Report on Digitisation in European Cultural Heritage Institutions 2012. ENUMERATE Thematic Network, May 2012.
  7. R. Holley, Crowdsourcing: how and why should libraries do it? D-Lib Magazine, 16(3/4), 2010.
  8. M. Terras, Digital curiosities: resource creation via amateur digitization. Literary and Linguistic Computing, 25(4), pp. 425-438, 2010.
  9. J. Oomen and L. Aroyo, Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges. C&T' 11, QUT, Brisbane, Australia, 20 June - 2 July 2011.
  10. A. Nicholls, M. Pereira, and M. Sani (eds.), Report 1 - The Virtual Museum. The Learning Museum Network Project, 2012.
  11. L. Carletti, G. Giannachi, D. Price, and D. McAuley, Digital Humanities and Crowdsourcing: An Exploration. Paper presented at Museums and the Web 2013, Portland OR, USA.
  12. S. Alam and J. Campbell, Dynamic Changes in Organizational Motivations to Crowdsourcing for GLAMs. Thirty Fourth International Conference on Information Systems, Milan 2013.
  13. L. B. Baltussen, M. Brinkerink, M. Zeinstra, J. Oomen, and N. Timmermans, Open Culture Data: Opening GLAM Data Bottom-up. Paper presented at Museums and the Web, 2013, Portland OR, USA.
  14. K. R. Eschenfelder and M. Caswell, Digital cultural collections in an age of reuse and remixes. Proceedings of the American Society for Information Science and Technology, 47(1), pp. 1-10, 2010.
  15. K. Kelly, Images of Works of Art in Museum Collections: The Experience of Open Access: A Study of 11 Museums. Washington DC: Council on Library and Information Resources, 2013.
  16. K. D. Crews, Museum Policies and Art Images: Conflicting Objectives and Copyright Overreaching. Fordham Intellectual Property, Media & Entertainment Law Journal, Vol. 22, pp. 795-834, 2012.
  17. B. Estermann, Diffusion of Open Data and Crowdsourcing among Heritage Institutions: Results of a Pilot Survey in Switzerland. Journal of Theoretical and Applied Electronic Commerce Research, Vol 9, Issue 3, September 2014, pp. 15-31.