GLAM/Newsletter/August 2023/Contents/AvoinGLAM report
Wikidocumentaries to import images from the web to Structured Data on Commons – a Google Summer of Code project
Wikidocumentaries is a project by AvoinGLAM that navigates the content of the Wikimedia projects based on the data contained in Wikidata. It explores other open repositories for media and geodata and displays them alongside Wikimedia content. The explorer is envisioned as a way to effortlessly bring more materials to Wikimedia projects while maintaining the best possible source metadata and, eventually, to allow users to reuse the content, for example by creating wikidocumentaries.
This vision is being realized bit by bit. Until now, Wikidocumentaries has been read-only.
Google Summer of Code
This summer, we embarked on a Google Summer of Code project with Zexi Gong, a master's student in computer science at Northeastern University in San Francisco. She was mentored by Tuukka Hastrup and Susanna Ånäs.
The project, titled "Wikidocumentaries to import images from the web to Structured Data on Commons", established an image import pipeline in which the user can select an image displayed on Wikidocumentaries and upload it to Wikimedia Commons with SDC statements. The images originate from external repositories and are displayed dynamically in Wikidocumentaries via the repositories' open APIs. We focused on Finna, the Finnish national aggregator. Once this workflow has been established, it can be adapted to import images from other media repositories with some adjustments.
Tuukka Hastrup, Zexi Gong and Susanna Ånäs celebrating the completion of the GSoC period
First step: Authenticating at Wikimedia
To give access to authentication and to display information about the user's status, we added a new pull-down menu to the main toolbar. Authentication uses OAuth 2.0.
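As an illustration only, the first leg of an OAuth 2.0 authorization code flow could be sketched as below. The endpoint is the OAuth 2.0 authorization endpoint on Meta-Wiki; the function name, the client ID and the redirect URI are placeholders and are not taken from the Wikidocumentaries code.

```python
import secrets
from urllib.parse import urlencode

# OAuth 2.0 authorization endpoint on Meta-Wiki.
AUTHORIZE_URL = "https://meta.wikimedia.org/w/rest.php/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, state=None):
    """Return (url, state) for redirecting the user to the consent page.

    `state` is a random value that the application should verify when the
    user is redirected back, to protect against cross-site request forgery.
    """
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",   # authorization code flow
        "client_id": client_id,    # placeholder: issued at consumer registration
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state
```

After the user approves the request, the application would exchange the returned code for an access token and use that token when calling the Commons API on the user's behalf.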
Second step: Uploading media files
The upload component of the project introduced a new actions menu displayed in the corner of each image and in the image viewer. The menu allows the user to choose to upload the image to Wikimedia Commons. The action opens an interactive popup that displays essential image information, including title, description, copyright license, category, creator, and date. The user can associate the name of the creator, originating from the external repository, with a Wikidata item.
Information is prepared for both the Information template and the SDC statements. The integration of this popup interface, along with the conversion of image metadata into Wikitext, was achieved through a combination of user interface design and backend logic.
In the popup, the user may view and test the metadata about the image and, in some cases, edit it.
- File name is constructed by combining the title of the image, the creator and the year.
- The caption is based on the original title.
- The user can identify a matching Wikidata item for the creator using a pull-down menu, or go to Wikidata to create a new item.
- The creation date uses a normalized year value provided by the Finna API.
- The Commons category is read from the Wikidata statements of the page topic.
- The information page of the image at Finna is displayed and clickable.
- The topic of the page is added as a depicts (P180) statement.
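The file name rule in the list above could be sketched roughly as follows. This is a minimal sketch under assumptions: the function name, the " - " separator, the default extension and the sample values are illustrative choices, not the project's actual ones.

```python
def build_file_name(title, creator=None, year=None, extension="jpg"):
    """Combine title, creator and year into a candidate Commons file name.

    Missing parts are skipped, so an image with no known creator or year
    still gets a usable name. The separator is an assumption.
    """
    parts = [title.strip()]
    if creator:
        parts.append(creator.strip())
    if year:
        parts.append(str(year))
    return " - ".join(parts) + "." + extension


# Hypothetical example values.
print(build_file_name("Harbour view", "Example Photographer", 1908))
```

Including the creator and year makes collisions between images with generic titles less likely, which matters because file names on Commons must be unique.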
We have left fine-tuning the options until later and believe that, once the groundwork is done, it will be easy to enhance the process incrementally.
To complete the upload, users can click the "Upload" button within the popup. Behind the scenes, the image information is parsed into wikitext for the Information template. The parsed information is then combined with the uploaded image and submitted to Wikimedia Commons through the API.
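The conversion of metadata into wikitext might look something like the sketch below. It uses the standard parameters of the Commons Information template; the function name, its arguments and the page layout are assumptions made for illustration, not the project's implementation.

```python
def build_information_wikitext(description, date, source_url, author,
                               license_template, category=None):
    """Render a file description page using the Information template."""
    lines = [
        "=={{int:filedesc}}==",
        "{{Information",
        f"|description={description}",
        f"|date={date}",
        f"|source={source_url}",
        f"|author={author}",
        "}}",
        "",
        "=={{int:license-header}}==",
        "{{" + license_template + "}}",
    ]
    if category:
        # Place the file in the Commons category of the page topic.
        lines += ["", f"[[Category:{category}]]"]
    return "\n".join(lines)
```

The resulting text would be sent together with the image file in the upload request to the Commons API.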
If the license information from the image's source indicates a license that is not in the list of allowed licenses, the upload button is disabled and a message tells the user about the license constraints. Wikidocumentaries still displays images that use any Creative Commons license. A major reason for this is that many public domain images are incorrectly labeled with restrictive Creative Commons licenses. Displaying them gives users the ability to report the errors.
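The license gate described above can be sketched as a simple lookup. The set of allowed licenses, the license code format and the message wording below are hypothetical; the actual list used by Wikidocumentaries is not given in this report.

```python
# Hypothetical allow-list; the real list is maintained in the project.
ALLOWED_LICENSES = {"CC0", "PDM", "CC BY 4.0", "CC BY-SA 4.0"}


def upload_status(license_code):
    """Return (enabled, message) describing the state of the upload button."""
    normalized = (license_code or "").strip().upper()
    if normalized in ALLOWED_LICENSES:
        return True, ""
    return False, (
        f"Upload disabled: license '{license_code}' "
        "is not accepted on Wikimedia Commons."
    )
```

Note that the check only controls uploading. An image with a restrictive license would still be displayed, in line with the reasoning above.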
Third step: Structured Data on Commons (SDC)
The structured data aspect of the project focused on generating meaningful structured data statements for media files uploaded to Wikimedia Commons. These statements enhance the contextual information available to users exploring the media and open up new possibilities for discovery and tools, including in Wikidocumentaries.
When the file is uploaded, it will be placed in the correct Commons category. In addition, a structured data depicts (P180) statement is added, representing the topic of the page where the image was found. At this stage we did not work with the keywords originating from Finna or allow adding more depicts statements in the upload process.
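One way to add such a depicts statement is the Wikibase `wbcreateclaim` API action on Commons, sketched below. Whether the project uses this action or another one is an assumption, and the function name and argument names are illustrative. The sketch only builds the request parameters; sending them requires an authenticated POST request with a CSRF token.

```python
import json

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def depicts_claim_params(page_id, topic_qid, csrf_token):
    """Build POST parameters that add a depicts (P180) statement to a file.

    The structured data entity of a Commons file is identified by "M"
    followed by the page ID of the file page.
    """
    numeric_id = int(topic_qid.lstrip("Q"))
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": f"M{page_id}",
        "property": "P180",
        "snaktype": "value",
        # The value is passed as a JSON-encoded string.
        "value": json.dumps({"entity-type": "item", "numeric-id": numeric_id}),
        "token": csrf_token,
    }
```

Here `topic_qid` would be the Wikidata item of the Wikidocumentaries page on which the image was found.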
Integrating this functionality involved interaction between the uploaded image's metadata, the selected category, and the structured data system of Wikimedia Commons. The feature improves the quality and accessibility of media files on Wikimedia Commons, in line with the platform's mission to provide accurate and informative content.
Links to code
https://github.com/Wikidocumentaries/wikidocumentaries-ui/pull/108
https://github.com/Wikidocumentaries/wikidocumentaries-api/pull/31
Future work
With the core of the project now essentially complete, a few open questions remain to be addressed in the future. Working on them could improve both the project's effectiveness and the user experience.
- OAuth Token Renewal: A potential future improvement would involve addressing the issue of OAuth tokens expiring. Currently, users need to manually refresh their tokens. Implementing an automated token renewal mechanism could streamline the user experience, ensuring uninterrupted authentication for prolonged user sessions.
- Handling Special Characters: The project encountered challenges related to special characters in image filenames. To provide a seamless user experience, it's advisable to implement a mechanism that handles special characters and encodes them appropriately during the upload process. This would prevent potential errors and inconsistencies in image handling.
- Popup Structure Enhancement: One observed issue is that the current popup structure sometimes causes the "upload finish" UI to become invisible, effectively trapping users within the upload process. To rectify this, the popup structure could be refined to ensure a clear and visible path for users to complete their uploads and navigate through different stages without confusion.
- Error Handling and Feedback: Enhancing error messages and providing more comprehensive feedback in cases of server errors or unsuccessful actions could greatly improve user understanding and troubleshooting. Clearer error messages would aid users in identifying and addressing issues during their interactions with the application.
- Continuous Documentation: Maintaining up-to-date and comprehensive documentation is crucial for the project's longevity. Continuously updating the documentation to reflect changes, improvements, and troubleshooting tips would be immensely beneficial for both contributors and users.
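For the special-character item in the list above, one possible approach is to normalize and clean the file name before upload. The sketch below is an assumption about how this could be done, not the project's solution: it replaces characters that MediaWiki does not allow in page titles, as well as ":", "/" and "\", which have special meaning in file names.

```python
import re
import unicodedata

# Characters not allowed in MediaWiki titles, plus ":", "/" and "\".
_FORBIDDEN = re.compile(r"[#<>\[\]|{}:/\\]")


def sanitize_file_name(name, replacement="-"):
    """Return a file name that is safer to use as a Commons title."""
    # Compose combining sequences so that visually identical names match.
    name = unicodedata.normalize("NFC", name)
    name = _FORBIDDEN.sub(replacement, name)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", name).strip()
```

Legitimate non-ASCII letters, such as those common in Finnish titles, are kept as they are. When the name is sent in a request URL, it would additionally need percent-encoding, for example with `urllib.parse.quote`.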
Incorporating these future work suggestions would not only enhance the project's functionality but also contribute to an improved user experience and a more seamless workflow for contributors and users alike.
Sequence diagram of the upload process