
This Month in GLAM – Volume XI, Issue X, October 2021

Structured Data on Wikimedia Commons report

A Wikimedia Commons Reconciliation Service, You Say?

Why Wikimedia Commons reconciliation? How does it work?

A Wikimedia Commons reconciliation service is the necessary groundwork for editing Wikimedia Commons files, and their structured data, in OpenRefine. How does this work?

The reconciliation service takes a list of file names on Wikimedia Commons that are entered in a column in OpenRefine. It then looks up the M-ids (identifiers) for these files. This process is called reconciliation.
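As a rough illustration of this lookup (not the reconciliation service's actual code, which is not described here), the sketch below assumes the public MediaWiki Action API on Commons and relies on the fact that a file's M-id is the letter M followed by its page ID. The function names and the sample response are hypothetical and only mirror the shape of an `action=query` reply.

```python
from typing import Dict, List

# Public MediaWiki Action API endpoint for Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_lookup_params(file_names: List[str]) -> Dict[str, str]:
    """Build query parameters that ask the API for the page IDs of the given files."""
    titles = "|".join(
        name if name.startswith("File:") else f"File:{name}" for name in file_names
    )
    return {"action": "query", "titles": titles, "format": "json"}


def extract_mids(response: dict) -> Dict[str, str]:
    """Map each existing file title in an API response to its M-id ("M" + page ID)."""
    mids = {}
    for page in response.get("query", {}).get("pages", {}).values():
        if "missing" in page:
            # The file does not exist on Commons, so there is nothing to reconcile.
            continue
        mids[page["title"]] = f"M{page['pageid']}"
    return mids


# Hypothetical response, shaped like the JSON that action=query returns.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "12345": {"pageid": 12345, "ns": 6, "title": "File:Example one.jpg"},
            "-1": {"ns": 6, "title": "File:Does not exist.jpg", "missing": ""},
        }
    }
}

if __name__ == "__main__":
    print(build_lookup_params(["Example one.jpg", "File:Does not exist.jpg"]))
    print(extract_mids(SAMPLE_RESPONSE))
```

In a real run, the parameters would be sent to `COMMONS_API` with an HTTP client and the JSON reply passed to `extract_mids`; the request itself is left out so the sketch works offline.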
The magic happens in the next step, though: after reconciliation, the user can proceed to retrieve wikitext and existing structured data statements from these Commons files. Depending on what the user requests, the wikitext and the structured data for each file will be listed in consecutive (new) columns in OpenRefine. This process is called data extension.
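To make the idea of data extension more concrete, here is a minimal sketch of how one reconciled row might gain new columns. It assumes the structured data arrives as a MediaInfo entity in the JSON shape returned by the Wikibase `wbgetentities` API module (statements grouped by property ID). The helper names, the sample entity, and the choice to join multiple values with `|` are all illustrative assumptions, not a description of how OpenRefine or the service does it.

```python
from typing import Dict, List


def statement_values(entity: dict, property_id: str) -> List[str]:
    """Collect the values of one property from a MediaInfo entity."""
    values = []
    for statement in entity.get("statements", {}).get(property_id, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            # Skip "somevalue" and "novalue" snaks, which carry no datavalue.
            continue
        value = snak["datavalue"]["value"]
        if isinstance(value, dict) and "id" in value:
            values.append(value["id"])  # item value, e.g. a Wikidata Q-id
        else:
            values.append(str(value))
    return values


def extend_row(row: Dict[str, str], wikitext: str, entity: dict,
               properties: List[str]) -> Dict[str, str]:
    """Return a copy of the row with a wikitext column and one column per property."""
    extended = dict(row)
    extended["wikitext"] = wikitext
    for property_id in properties:
        extended[property_id] = "|".join(statement_values(entity, property_id))
    return extended


# Hypothetical entity with a single creator (P170) statement.
SAMPLE_ENTITY = {
    "id": "M12345",
    "statements": {
        "P170": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P170",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q42"},
                    },
                }
            }
        ]
    },
}

if __name__ == "__main__":
    row = {"file": "File:Example one.jpg", "mid": "M12345"}
    print(extend_row(row, "== Summary ==", SAMPLE_ENTITY, ["P170", "P180"]))
```

Properties with no statements on a file simply yield an empty cell, which matches the intuition of an empty column value in a spreadsheet-like tool.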
As a result, the user will be able to take this wikitext and existing structured data, modify and clean it further in OpenRefine, and convert wikitext to structured data (for instance: convert strings of names of photographers to their corresponding Wikidata items, and add these as creators (P170) to the files' structured data). This last step is not yet possible; the OpenRefine team will work on it during the upcoming months.

The reconciliation service is not written for OpenRefine alone; it will also be usable in other tools that want to take existing information (wikitext and structured data) from Wikimedia Commons files and process it further.