
This Month in GLAM – Volume XIII, Issue VIII, August 2023

AvoinGLAM report

Wikidocumentaries to import images from the web to Structured Data on Commons – a Google Summer of Code project

Google Summer of Code

This summer, we embarked on a Google Summer of Code project with Zexi Gong, a master's student in computer science at Northeastern University in San Francisco. She was mentored by Tuukka Hastrup and Susanna Ånäs.

The project, Wikidocumentaries to import images from the web to Structured Data on Commons, established an image import pipeline in which the user can select an image displayed on Wikidocumentaries and upload it to Wikimedia Commons with SDC statements. The images originate from external repositories and are displayed dynamically in Wikidocumentaries via the repositories' open APIs. We focused on Finna, the Finnish national aggregator. Now that this workflow has been established, it can be adapted with some adjustments to import images from other media repositories.

Tuukka Hastrup, Zexi Gong and Susanna Ånäs celebrating the completion of the GSoC period

First step: Authenticating at Wikimedia

To give access to authentication and to display the user's login status, we added a new pulldown to the main toolbar. Authentication uses OAuth 2.0.

	Clicking on the user icon triggers a dropdown menu that provides options for logging in or logging out, along with displaying the username if the user is logged in.
	Upon clicking the login option, users are redirected to the Wikimedia Commons authentication page. After successful login, users are automatically redirected back to the previous page in Wikidocumentaries.
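The redirect in the login step can be illustrated with a minimal sketch of how an OAuth 2.0 authorization URL is typically built. This is not the project's code. The endpoint below is assumed to be the standard Wikimedia OAuth 2.0 endpoint on Meta-Wiki, and the client ID and callback URL are hypothetical placeholders.

```python
import secrets
from urllib.parse import urlencode

# Assumption: the standard Wikimedia OAuth 2.0 authorization endpoint on Meta-Wiki.
AUTHORIZE_ENDPOINT = "https://meta.wikimedia.org/w/rest.php/oauth2/authorize"


def new_state() -> str:
    """Generate an unguessable value used to verify the callback (CSRF protection)."""
    return secrets.token_urlsafe(16)


def build_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Return the URL the browser is sent to when the user chooses to log in."""
    params = {
        "response_type": "code",  # authorization code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,  # echoed back by the server on redirect
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_authorize_url(
        client_id="example-client-id",  # hypothetical
        redirect_uri="https://example.org/oauth/callback",  # hypothetical
        state=new_state(),
    )
    print(url)
```

One way to return the user to the page they came from is to store that page alongside the `state` value before redirecting. Whether Wikidocumentaries does it this way is not stated in this report.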

Second step: Uploading media files

The upload component of the project introduced a new actions menu displayed in the corner of each image and in the image viewer. The menu allows the user to choose to upload the image to Wikimedia Commons. The action opens an interactive popup that displays essential image information, including title, description, copyright license, category, creator, and date. The user can associate the name of the creator, originating from the external repository, with a Wikidata item.

Information is prepared for both the Information template and the SDC statements. The integration of this popup interface, along with the conversion of image metadata into Wikitext, was achieved through a combination of user interface design and backend logic.

In the popup, the user can view and test the metadata about the image, and in some cases edit it.

The file name is constructed by combining the title of the image, the creator and the year.
The caption is based on the original title.
The user can identify a matching Wikidata item for the creator using a pulldown, or go to Wikidata to create a new item.
The creation date uses a normalized year value provided by the Finna API.
The Commons category is read from the Wikidata statements of the page topic.
The information page of the image at Finna is displayed and clickable.
The topic of the page is added as a depicts (P180) statement.
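As an illustration of the file name rule in the list above, here is a small sketch. The separator, the default extension and the handling of missing values are assumptions made for the example, not details taken from the project.

```python
def build_file_name(title: str, creator: str, year: str, extension: str = "jpg") -> str:
    """Combine title, creator and year into a file name, skipping empty parts.

    The ", " separator and the default extension are illustrative assumptions.
    """
    parts = [part.strip() for part in (title, creator, year) if part and part.strip()]
    if not parts:
        raise ValueError("at least one of title, creator or year is required")
    return f"{', '.join(parts)}.{extension}"


if __name__ == "__main__":
    print(build_file_name("Helsinki Cathedral", "Signe Brander", "1907"))
    print(build_file_name("Market square", "", "1920"))
```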

We have left fine-tuning the options until later. We believe that once the groundwork is done, it will be easy to enhance the process incrementally.

To complete the upload, users can click the "Upload" button within the popup. Behind the scenes, the image information is parsed into wikitext for the Information template. The parsed information is then combined with the uploaded image and submitted to Wikimedia Commons through the API.
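The conversion of metadata into wikitext could look roughly like the sketch below. The `ImageMetadata` class and its field names are hypothetical. The parameters of the Information template and the section headings follow common Commons conventions, and the actual output of Wikidocumentaries may differ.

```python
from dataclasses import dataclass


@dataclass
class ImageMetadata:
    """Hypothetical container for the values shown in the upload popup."""
    description: str
    date: str
    source_url: str
    author: str
    license_template: str  # e.g. "Cc-by-4.0"
    category: str


def to_wikitext(meta: ImageMetadata) -> str:
    """Render a file description page using the Information template."""
    lines = [
        "=={{int:filedesc}}==",
        "{{Information",
        f"|description={meta.description}",
        f"|date={meta.date}",
        f"|source={meta.source_url}",
        f"|author={meta.author}",
        "}}",
        "",
        "=={{int:license-header}}==",
        "{{" + meta.license_template + "}}",
        "",
        f"[[Category:{meta.category}]]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    example = ImageMetadata(
        description="View of the market square",
        date="1920",
        source_url="https://example.org/record/1",  # hypothetical
        author="Unknown photographer",
        license_template="Cc-by-4.0",
        category="Market Square, Helsinki",
    )
    print(to_wikitext(example))
```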

If the license reported by the source of the image is not in the list of allowed licenses, the upload button is disabled and a message tells the user about the license constraints. We do allow the display of images that use any Creative Commons license. A major reason for this is that many public domain images are incorrectly licensed with restrictive Creative Commons licenses. Displaying them gives users the ability to report errors.
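The distinction between what may be displayed and what may be uploaded can be sketched as follows. The license strings (such as "CC BY-NC-ND 4.0" or "PDM") are assumed to resemble what a repository API might return, and the rule for uploads reflects the general Commons policy of not accepting NonCommercial or NoDerivatives licenses. The project's actual list of allowed licenses is not given in this report.

```python
from typing import List, Optional


def _tokens(license_code: str) -> List[str]:
    """Split a license string such as 'CC BY-NC-ND 4.0' into upper-case tokens."""
    return license_code.upper().replace("-", " ").split()


def can_display(license_code: str) -> bool:
    """Any Creative Commons license, CC0 or a public domain mark may be displayed."""
    tokens = _tokens(license_code)
    return bool(tokens) and tokens[0] in {"CC", "CC0", "PDM"}


def can_upload(license_code: str) -> bool:
    """Only licenses without NonCommercial or NoDerivatives terms may be uploaded."""
    tokens = _tokens(license_code)
    if not tokens:
        return False
    if tokens[0] in {"CC0", "PDM"}:
        return True
    return tokens[0] == "CC" and "BY" in tokens and not {"NC", "ND"} & set(tokens)


def upload_block_message(license_code: str) -> Optional[str]:
    """Return the message shown next to a disabled upload button, or None."""
    if can_upload(license_code):
        return None
    return f"This image cannot be uploaded: the license '{license_code}' is not allowed."


if __name__ == "__main__":
    for code in ("CC BY 4.0", "CC BY-NC-ND 4.0", "PDM"):
        print(code, can_display(code), can_upload(code))
```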

The upload component encountered challenges due to outdated and oversimplified examples in the upload API documentation. The discrepancy between the documentation and the actual process complicated the implementation, and we had to invest significant effort into structuring the API requests correctly. Additionally, the error messages received during the upload process lacked specificity, which made it hard to pinpoint and resolve issues. For example, the permission denial error appeared frequently without a clear indication of which permissions were lacking. Resolving this error required repeated testing or communication with the API developers.
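For readers who want to try a similar upload, the sketch below shows the text parameters of a MediaWiki `action=upload` request and one way to turn the JSON response into a readable message. It is a simplified illustration, not the project's request code. The file itself is sent separately as a multipart `file` field, and the default comment text is a made-up placeholder.

```python
def upload_params(filename: str, wikitext: str, csrf_token: str,
                  comment: str = "Uploaded from an external repository") -> dict:
    """Text fields of an action=upload request (the file goes in a multipart 'file' field)."""
    return {
        "action": "upload",
        "format": "json",
        "filename": filename,
        "text": wikitext,  # initial content of the file description page
        "comment": comment,
        "token": csrf_token,
    }


def describe_result(response: dict) -> str:
    """Summarize an upload API response, surfacing the error code and info if present."""
    if "error" in response:
        error = response["error"]
        code = error.get("code", "unknown")
        info = error.get("info", "no details")
        return f"Upload failed ({code}): {info}"
    upload = response.get("upload", {})
    if upload.get("result") == "Success":
        return "Upload succeeded"
    if upload.get("result") == "Warning":
        warnings = ", ".join(sorted(upload.get("warnings", {})))
        return f"Upload needs attention: {warnings}"
    return "Unexpected response"


if __name__ == "__main__":
    print(describe_result({"error": {"code": "permissiondenied", "info": "Permission denied"}}))
```

Logging both the error code and the info text makes it easier to tell, for example, a missing OAuth grant from an invalid token.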

Third step: Structured Data on Commons (SDC)

The structured data aspect of the project focused on generating meaningful structured data statements for media files uploaded to Wikimedia Commons. These statements enhance the contextual information available to users exploring the media and open up new possibilities for discovery and tools, including in Wikidocumentaries.

When the file is uploaded, it will be placed in the correct Commons category. In addition, a structured data depicts (P180) statement is added, representing the topic of the page where the image was found. At this stage we did not work with the keywords originating from Finna or allow adding more depicts statements in the upload process.
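A depicts statement can be expressed in the Wikibase JSON format and sent with the `wbeditentity` API module. The sketch below builds such a request for a file's MediaInfo entity, whose ID is the letter M followed by the page ID of the file. It assumes this module is used. The project may add the statement through a different call.

```python
import json


def depicts_claim(item_id: str) -> dict:
    """Build a depicts (P180) statement pointing at a Wikidata item such as 'Q1757'."""
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P180",
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(item_id[1:]),
                    "id": item_id,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def wbeditentity_params(page_id: int, item_id: str, csrf_token: str) -> dict:
    """Parameters for adding the depicts statement to the file's MediaInfo entity."""
    return {
        "action": "wbeditentity",
        "format": "json",
        "id": f"M{page_id}",  # MediaInfo ID derived from the file's page ID
        "data": json.dumps({"claims": [depicts_claim(item_id)]}),
        "token": csrf_token,
    }


if __name__ == "__main__":
    # The page ID and token are placeholders for illustration.
    print(wbeditentity_params(12345, "Q1757", "example-token"))
```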

The integration of this functionality involved interaction between the uploaded image's metadata, the selected category, and the Wikimedia Commons' structured data system. This feature enhances the overall quality and accessibility of media files on Wikimedia Commons, aligning with the platform's mission to provide accurate and informative content.

Links to code

https://github.com/Wikidocumentaries/wikidocumentaries-ui/pull/108

https://github.com/Wikidocumentaries/wikidocumentaries-api/pull/31

Future work

With the core aspects of the project now essentially completed, a few open questions remain to be addressed in the future. Addressing them could make the project more effective and improve the user experience.

OAuth Token Renewal: A potential future improvement would involve addressing the issue of OAuth tokens expiring. Currently, users need to manually refresh their tokens. Implementing an automated token renewal mechanism could streamline the user experience, ensuring uninterrupted authentication for prolonged user sessions.

Handling Special Characters: The project encountered challenges related to special characters in image filenames. To provide a seamless user experience, it's advisable to implement a mechanism that handles special characters and encodes them appropriately during the upload process. This would prevent potential errors and inconsistencies in image handling.
Popup Structure Enhancement: One observed issue is that the current popup structure sometimes causes the "upload finish" UI to become invisible, effectively trapping users within the upload process. To rectify this, the popup structure could be refined to ensure a clear and visible path for users to complete their uploads and navigate through different stages without confusion.
Error Handling and Feedback: Enhancing error messages and providing more comprehensive feedback in cases of server errors or unsuccessful actions could greatly improve user understanding and troubleshooting. Clearer error messages would aid users in identifying and addressing issues during their interactions with the application.
Continuous Documentation: Maintaining up-to-date and comprehensive documentation is crucial for the project's longevity. Continuously updating the documentation to reflect changes, improvements, and troubleshooting tips would be immensely beneficial for both contributors and users.
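As one possible starting point for the special character item above, the sketch below replaces characters that are commonly problematic in MediaWiki file titles and percent-encodes a title for use in a URL. The set of replaced characters is a conservative assumption and has not been checked against the exact rules that Commons applies.

```python
import re
import unicodedata
from urllib.parse import quote

# Assumption: a conservative set of characters to avoid in file titles.
_FORBIDDEN = re.compile(r"[#<>\[\]|{}:/\\]")


def sanitize_title(title: str, replacement: str = "-") -> str:
    """Normalize Unicode, replace problematic characters and collapse whitespace."""
    normalized = unicodedata.normalize("NFC", title)
    cleaned = _FORBIDDEN.sub(replacement, normalized)
    return re.sub(r"[\s_]+", " ", cleaned).strip()


def encode_for_url(title: str) -> str:
    """Percent-encode a title as UTF-8, using underscores for spaces."""
    return quote(title.replace(" ", "_"), safe="")


if __name__ == "__main__":
    name = sanitize_title("Kauppatori: näkymä [1920]") + ".jpg"
    print(name)
    print(encode_for_url(name))
```

Letters such as ä are kept as they are, since they are valid in Commons file names. Only the URL form is percent-encoded.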

Incorporating these future work suggestions would not only enhance the project's functionality but also contribute to an improved user experience and a more seamless workflow for contributors and users alike.