GLAM/Newsletter/December 2021/Contents/New Zealand report
The New Zealand Wikidata thesis project
While I was a Wikipedian in Residence at Lincoln University in December 2019, University Librarian Deborah Fitchett drew my attention to the public DSpace database Research@Lincoln, the University's open-access institutional repository of postgraduate theses (dissertations). Deborah raised the possibility of uploading that entire database to Wikidata, to enable easier discovery and citation in Wikipedia using the CiteQ template.
In 2021 she invited me, Siobhan Leachman (User:Ambrosia10) and Tamsin Braisher (User:DrThneed) to discuss the project further. Deborah had the opportunity to present the proposal to her university library colleagues at an online conference, and we quickly realised that if we could convince other libraries to join us in a data upload, it would be a world first: a nationwide collaboration between universities and Wikimedians that would set the standard for thesis metadata in Wikidata and add tens of thousands of new items. The four of us prepared a slide deck outlining the value of the project, to persuade the other New Zealand university repositories to export a CSV of their thesis metadata for upload. Siobhan prepared a Cradle schema to map the repository fields to Wikidata properties, and Tamsin tested the workflow with a small dataset of theses.
Deborah prepared a follow-up one-page document for librarians to share with their managers, encouraging them to join the initiative. In summary, her arguments were:
If managers want to know what's in it for the library/uni: citations. Sources cited in Wikipedia get more views and more citations.
If managers are concerned about staff time: by drawing on our Wikidata experts, the time required from each institution is minimised.
If managers are concerned about data: this is all public data anyway, but if libraries are involved, you can make sure the data meets your own standards and that any data you don't want public is excluded.
After discussion, most of the university librarians decided to participate in the project, even those who had expressed reservations about the time and effort it might require. A hackathon with representatives of each participating library was scheduled for 6 December to decide which data fields and formats each library would export from its repository.
In preparation for this meeting, the three Wikipedians developed a set of recommendations based on the metadata needed for CiteQ references to display correctly in Wikipedia, cross-checked against four standard citation styles commonly used in academia. While many more thesis properties could be modelled in Wikidata, most were deemed optional as long as this core goal was met. The WikidataCon 2021 presentation Enhancing Discovery for Dissertations, by Jeannette Ho (Texas A&M University Libraries), proved helpful (although the video was not made available until long after WikidataCon, so only a brave New Zealand Wikimedian who stayed up until the small hours was able to see it). Drafting the recommendations revealed the kind of data cleanup that would be needed on export or in OpenRefine, and incidentally flagged some problems with CiteQ (it currently doesn't display the institution, the type of thesis, or the repository name). Tamsin and Siobhan presented the recommendations document at the hackathon, and the participants amended it while discussing how to handle the upload process.
The result was a detailed set of instructions, which Deborah sent to all the participating institutions to walk them through the data export process. The next step is to collate the CSV files; the Wikimedians will then run a second hackathon to clean up the data and upload it to Wikidata.
Many libraries use controlled vocabularies such as MeSH and ANZSRC to assign subject keywords to dissertations. Tamsin is coordinating a small group of editors to complete the mapping of the 2008 and 2020 ANZSRC vocabularies to Wikidata, so that they can more easily be used to add "main subject" (P921) statements to New Zealand and Australian dissertations in Wikidata. These statements will make the dissertations easier to find by topic.
A full description of the New Zealand Wikidata Thesis project, released under CC0, is available online.
The relationship between Cite Q and theses is discussed at en:Template talk:Cite Q#Theses and other "types". Cite Q is written in Lua, and contributions addressing the points raised there are welcome. Andy Mabbett (Pigsonthewing), 14:20, 12 January 2022 (UTC)