GLAM/Resources/Data and media partnerships workflow

GLAM

Galleries • Libraries • Archives • Museums

Get Started • Model Projects and Case Studies • Evaluating Projects • Contact Us
For the GLAM-Wiki Community: Connect • Discussion • Calendar • Newsletter • Resources • Volunteers • Other pages

Step in workflow	💡 Tips (some things to think about during this phase)	🛠 Tools (selection of software that can be used in this phase)
	Negotiations between a GLAM partner and Wikimedia community members Both sides can get to know each other by starting with smaller activities (e.g. an edit-a-thon or internal Wikimedia course). Agreements about the co-operation can be made explicit in a Memorandum of Understanding. (Guide on how to create a MoU)
	Data and media files are made available for Wikimedia Commons and/or Wikidata.	Website scraping/ingest tools (if the data is available online but the partner can't produce data exports from its database) Tabula - open source tool to extract tables from PDF files PAWS - Python programming notebook environment on Wikimedia Tools Lab that can transfer records from an institution's API
PRE UPLOAD [edit]
	Make sure that copyright of the data and media files is compatible with Wikimedia projects. Data for Wikidata must be CC0. (See Wikidata and copyright) Media files' copyright must be compatible with Commons policy. (See Commons:Licensing for comprehensive information, and this infographic for a brief overview of how it works)	If permissions and licenses for copyrighted media files aren't published in a public place: make sure the permissions are clarified via an e-mail to OTRS, the platform used by the Wikimedia projects to manage and archive e-mail conversations. (Licensing images: when do I contact OTRS?)
	Clean up the data to be consistent and compatible with Wikimedia Commons and/or Wikidata. Look at similar media or data items on Wikimedia Commons or Wikidata for inspiration how to model the data. Wikidata's WikiProjects – the 'groups' where volunteers work together on common interests – often have recommendations on data modelling for specific subjects.	Spreadsheet software - allows non-programmers to run checks against existing Wikimedia content Google Sheets - free spreadsheet software that can be collaborative Wikipedia and Wikidata tools for Google Spreadsheets is a free add-on for Google Sheets that adds functions for querying Wikipedia and Wikidata. OpenRefine (formerly Google Refine) - popular tool for advanced data cleaning, transformation and matching against Wikidata content. Its homepage includes video tutorials and a guide on how to use version 3.0 and higher for Wikidata manipulation and uploading. PAWS and Pywikibot - for those with some programming experience allows for large scale querying and advanced actions.
	Always check which data and media items are already present on Wikidata and Wikimedia Commons. Volunteers have often already autonomously uploaded quite a few images from GLAM collections. Wikidata will probably already contain quite a few data items about creative works, people and topics related to specific GLAM collections. On Wikimedia Commons, it is considered good practice to upload new (higher-quality) media files. Don't overwrite existing files. On Wikidata, duplicate items must be avoided and merged when they are discovered. It is OK (and even highly recommended) to add extra sources and statements to existing items though.	Search function of Commons and Wikidata Wikidata Query Engine PetScan, an advanced search and query tool for Wikimedia projects
	Reconciliation is the step where data items from a source dataset are matched with their corresponding Wikidata items. Be thorough during this step. Creating many duplicate Wikidata items must be avoided, as these cause a lot of cleanup work for the Wikidata community!	OpenRefine and its Wikidata reconciliation feature Mix'n'match, an online tool for crowdsourced reconciliation (Manual)
UPLOAD [edit]
	Upload the new data items and/or media files to Wikidata and/or Commons. Start with small test batches to check for structural errors. Upload in manageable batches. Don't make your batches too large (hundreds rather than thousands) – correcting mistakes in thousands of data items or files at once is not fun. Occasionally check uploads during the process, to prevent errors from propagating.	Wikimedia Commons: Upload Wizard for simple uploads of up to 50 files. Offers no options for refined metadata. Pattypan, a user-friendly batch upload tool that works with spreadsheets and that allows for refined details in metadata. GLAMwiki Toolset, an advanced upload tool for XML feeds of large file batches. Requires days of lead time and a request for permission to use the tool. Wikidata: QuickStatements, create or update Wikidata items using tab-delimited or CSV files OpenRefine (3.0+) tool that has powerful upload functionality for Wikidata For both: Bots / scripts operated by Wikimedia community members. (Wikidata bot requests) (Wikimedia Commons bot requests)
POST UPLOAD [edit]
	Fix mistakes and omissions that were made during the upload. Mistakes happen! Take responsibility for them, and make sure to correct and improve your own uploads.	Wikimedia Commons: Cat-a-lot, a gadget on Wikimedia Commons to help with categorizing images by pointing and clicking. Activate in your Commons user preferences. VisualFileChange.js, a gadget on Wikimedia Commons that allows you to do batch edits to groups of media files AutoWikiBrowser, a semi-automated editor Wikidata: QuickStatements, create or update Wikidata items using tab-delimited or CSV files OpenRefine (3.0+) tool that has powerful upload functionality for Wikidata EditGroups allows to 'undo' faulty batch edits that were performed with QuickStatements and with OpenRefine PetScan, the advanced search and query tool for Wikimedia projects, also has (limited) editing functionalities for Wikidata items.
	Work with Wikimedia communities to enhance and enrich the data and media. Improvements can include: More precise metadata (e.g. what are the places, objects, people depicted in a media file?) More references Translations of metadata	Template:Wikidata list + Listeria (for making structured lists of Wikidata items on Wikipedia or other Wikimedia projects)
	Encourage use of the media and data in Wikimedia projects and beyond. Campaigns can help a lot: Wikipedia article writing contests, photography events... Think beyond Wikipedia; perhaps the media or data can be re-used on other platforms too.
IMPACT [edit]
	Evaluate the impact of the media files and/or data by measuring improvements and (re-)use Measurable aspects may include (Number of) people who worked on the data and media Types of enrichment Inclusions in Wikimedia project Pageviews of pages where data/media is used	Wikimedia Commons: GLAMorous shows how often media files from a Commons category (or uploaded by Commons user) are used in other Wikimedia projects BaGLAMa shows Wikimedia page views over time, for specific categories of media files on Wikimedia Commons. Get in touch with its maintainer, Magnus Manske, who can add your own category/ies. GLAMorgan shows Wikimedia page views for a specific Wikimedia Commons category for a specific month. Fae's GLAM Dashboard, a set of templates that show interesting data about a Commons category, including the most edited files and the most active volunteers who have contributed to them. GLAMorousToHTML is a set of Python scripts that create datestamped HTML reports and corresponding Excel files listing all Wikipedia articles (in all languages) in which (one or more) images from a given category tree on Wikimedia Commons are used. It converts the output of the GLAMorous tool to HTML and Excel files. Example output. Wikidata: SPARQL Recent Changes, shows changes to items from a Wikidata query over a given period of time.

PRE UPLOAD

UPLOAD

POST UPLOAD

IMPACT