GLAM Newsletter, January 2024: Portugal report
Unveiling Catrapilha: A ReacTive approach to GLAM content import
Embarking on the journey of file uploads to Commons some 15 years ago was nothing short of an adventure. Yet amid the excitement, I was soon confronted with a pressing issue: the glaring absence of adequate tools for seamless mass uploads. Whether sourcing content from platforms offering free licenses or delving into public domain treasures, and whether in GLAM projects I helped with or coordinated, the goal was twofold: maximum efficiency and enjoyment for everyone involved. Achieving this, however, meant overcoming technical hurdles and battling the mind-numbing monotony of organizing uploads: the endless cycle of copy and paste, the tedious reorganization of information from the source to the upload page, which tested anyone's patience and resolve.
So when the opportunity presented itself two years ago, in the midst of the pandemic, to pivot my professional career back to software development after nearly two decades on hold, I immediately thought of applying it to make those boring-to-tears, error-prone mass uploads to Commons easier and more enjoyable. The result is Catrapilha, an experimental upload tool built with React, a popular free and open source JavaScript library for building user interfaces.

The tool, still in development, has been tested with the Arquipelagos image database, a GLAM academic project hosted on a WordPress platform and managed by Rui Carita, Emeritus Professor at the University of Madeira. The database is used by researchers and by undergraduate and master's students as support for Carita's classes on the history of Madeira (an island outermost region of Europe off the African coast, an autonomous region of Portugal and part of Macaronesia) and of the oceanic and coastal spaces around the globe where the Portuguese mostly operated during the Age of Discoveries. Professor Carita, well aligned with the mission of the Wikimedia Movement, kindly released under a free license all the original materials he and a number of collaborators produced for the Arquipelagos image database, mainly photos, paintings and diagrams, together with the text accompanying these images. They add to the many well-sourced and well-described public domain files already there. At present, the steadily expanding database comprises approximately 100,000 files of educational and cultural significance, each with its own description. Many of these files are original content or in the public domain, which makes them suitable for import to Commons and reuse across Wikimedia projects, as several already have been.
The Catrapilha tool takes its name from the Madeiran word for bulldozers (after the American brand Caterpillar), the heavy machines that push and carry stuff from one place to another. At this point it is a straightforward web app. Its main view displays a list of images, up to a maximum number set by the user, populated with content from the source platform through REST APIs or another collection technique. Only files not yet uploaded to Commons or discarded (if copyrighted or out of scope) are shown; together they form the current working collection. Clicking an image thumbnail takes the user to the detail view for that file, from which they can navigate backward and forward through the collection.

In the detail view, a draft of the description page to be uploaded along with the image is presented in an editable text box. It includes the filedesc and license-header sections, pre-filled with the text the algorithm was able to build automatically, for the user to correct and polish before uploading. The user can then either upload the file to Commons or discard it. After either action, the detail view moves on to the next image in the collection until all files are processed, after which the user can return to the list view and collect more files from the source. The list of uploaded and discarded images is stored (and fetched) as JSON, either locally in a readable .txt file or online, for example on the user's Commons profile, whichever is more practical for the user.
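For readers who want to reproduce the list view logic, here is a minimal TypeScript sketch (plain functions, no React) of how the working collection could be built and advanced. It assumes a WordPress source exposing the standard `/wp-json/wp/v2/media` route and a processed list kept as a JSON array of `{ sourceId, status }` records; `ProcessedRecord`, `workingCollection`, `processCurrent` and the other names are hypothetical, not taken from the Catrapilha code.

```typescript
// Decision recorded for every file the user has handled (hypothetical shape).
type Decision = "uploaded" | "discarded";

interface ProcessedRecord {
  sourceId: number; // ID of the item in the source repository
  status: Decision;
}

// Subset of the fields returned by the WordPress REST API media endpoint.
interface WpMedia {
  id: number;
  source_url: string;
  title: { rendered: string };
}

// URL of one page of media items; WordPress caps per_page at 100.
function buildMediaUrl(siteUrl: string, page: number, perPage: number): string {
  const base = siteUrl.endsWith("/") ? siteUrl : `${siteUrl}/`;
  const url = new URL("wp-json/wp/v2/media", base);
  url.searchParams.set("per_page", String(Math.min(perPage, 100)));
  url.searchParams.set("page", String(page));
  return url.toString();
}

// Working collection: items not yet uploaded or discarded, up to the user's maximum.
function workingCollection(items: WpMedia[], processed: ProcessedRecord[], max: number): WpMedia[] {
  const done = new Set(processed.map((r) => r.sourceId));
  return items.filter((item) => !done.has(item.id)).slice(0, max);
}

// After the current item is uploaded or discarded, record the decision
// and move on to the next item.
function processCurrent(
  collection: WpMedia[],
  index: number,
  processed: ProcessedRecord[],
  status: Decision,
): { index: number; processed: ProcessedRecord[]; done: boolean } {
  const item = collection[index];
  if (!item) throw new Error(`no item at index ${index}`);
  const next = index + 1;
  return {
    index: next,
    processed: [...processed, { sourceId: item.id, status }],
    done: next >= collection.length, // true once the whole collection is processed
  };
}

// The processed list round-trips through JSON, whether kept in a local .txt
// file or on a wiki page.
function serializeProcessed(records: ProcessedRecord[]): string {
  return JSON.stringify(records, null, 2);
}

function parseProcessed(text: string): ProcessedRecord[] {
  const data: unknown = JSON.parse(text);
  if (!Array.isArray(data)) throw new Error("processed list must be a JSON array");
  return data as ProcessedRecord[];
}
```

The functions return new arrays instead of mutating their inputs, which fits the way React state is usually updated.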
More than 1,700 files under a variety of licenses have already been uploaded from Arquipelagos to Commons with Catrapilha, and the process runs smoothly: the tool handles most of the tedious work of copying the information gathered through the WordPress REST API and reorganizing it in a form Commons recognizes, leaving the uploader little more than a quick review before hitting the upload button. A set of categories can be defined in the tool, so that most of the curation work is already done once the file is uploaded. The tool has also been tested successfully running more than one project at the same time, with repositories built on WordPress, Archeevo and Flickr, and I'm convinced it can deal with virtually any source.
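As an illustration of the copying and reorganizing the tool takes over, the following sketch turns a WordPress media item into a draft Commons description page with the filedesc and license-header sections, the Commons `{{Information}}` template and a batch-wide set of categories. It assumes the standard WordPress REST API fields `link`, `title.rendered` and `caption.rendered`; `buildDraft`, `stripHtml` and `DraftOptions` are hypothetical names, and the HTML cleanup is deliberately crude.

```typescript
// Subset of the fields returned by the WordPress REST API media endpoint.
interface WpMediaItem {
  link: string; // public page of the item on the source site
  title: { rendered: string };
  caption: { rendered: string };
}

interface DraftOptions {
  lang: string;            // language template for the description, e.g. "pt"
  author: string;          // author credited in {{Information}}
  licenseTemplate: string; // e.g. "{{cc-by-sa-4.0}}", chosen per file
  categories: string[];    // categories defined in the tool for this batch
}

// Crude HTML-to-text conversion for the "rendered" WordPress fields.
function stripHtml(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ") // drop tags
    .replace(/&#(\d+);/g, (_, code: string) => String.fromCodePoint(Number(code))) // numeric entities
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ")
    .trim();
}

// Draft description page: filedesc and license-header sections plus categories.
function buildDraft(item: WpMediaItem, opts: DraftOptions): string {
  // Prefer the caption; fall back to the title when the caption is empty.
  const description = stripHtml(item.caption.rendered) || stripHtml(item.title.rendered);
  return [
    "=={{int:filedesc}}==",
    "{{Information",
    `|description={{${opts.lang}|1=${description}}}`,
    "|date=", // left for the reviewer: the WordPress date records posting, not creation
    `|source=${item.link}`,
    `|author=${opts.author}`,
    "}}",
    "",
    "=={{int:license-header}}==",
    opts.licenseTemplate,
    "",
    ...opts.categories.map((c) => `[[Category:${c}]]`),
  ].join("\n");
}
```

The empty `date` field is a deliberate choice in this sketch: it is one of the things the reviewer fills in before hitting the upload button.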
Please note that this is not a ready-to-use, out-of-the-box tool, but rather an approach to solving these challenges with React. It is mainly aimed at React developers, or at people willing to learn and experiment with React on top of the Wikimedia REST APIs, something I personally find extremely fun and thrilling. The tool can be extended to include as many projects as you want, each with its own configuration and settings, by adapting the provided project-specific list and detail components and adding new ones following the same conventions. It can also be expanded to include Wikidata editing, either on its own or as part of a GLAM mass upload campaign, so that Commons uploads and the corresponding Wikidata edits happen together when needed. At this point it is not fully automated, though fully automating it, if needed, would be a very simple step.
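One possible way to keep several sources (such as WordPress, Archeevo and Flickr) side by side is a small registry of project adapters that the shared views consult. The sketch below is hypothetical: in Catrapilha itself this role is played by project-specific React list and detail components, and `ProjectAdapter`, `registerProject` and the demo project are invented here for illustration.

```typescript
// Minimal item shape the shared list and detail views work with.
interface SourceItem {
  id: string;
  thumbnailUrl: string;
  fileUrl: string;
}

// What each project provides; the shared views depend only on this interface.
interface ProjectAdapter {
  key: string;                       // e.g. "arquipelagos"
  categories: string[];              // default categories for the project
  parse(raw: unknown): SourceItem[]; // map a source API response to items
  draft(item: SourceItem): string;   // initial wikitext for the detail view
}

const registry = new Map<string, ProjectAdapter>();

function registerProject(adapter: ProjectAdapter): void {
  if (registry.has(adapter.key)) throw new Error(`duplicate project: ${adapter.key}`);
  registry.set(adapter.key, adapter);
}

function getProject(key: string): ProjectAdapter {
  const adapter = registry.get(key);
  if (!adapter) throw new Error(`unknown project: ${key}`);
  return adapter;
}

// A hypothetical project whose API returns rows like { id, thumb, url }.
const demoCategories = ["Example uploads"];
registerProject({
  key: "demo",
  categories: demoCategories,
  parse: (raw) =>
    (raw as { id: number; thumb: string; url: string }[]).map((r) => ({
      id: String(r.id),
      thumbnailUrl: r.thumb,
      fileUrl: r.url,
    })),
  draft: (item) =>
    [
      "=={{int:filedesc}}==",
      `{{Information|source=${item.fileUrl}}}`,
      ...demoCategories.map((c) => `[[Category:${c}]]`),
    ].join("\n"),
});
```

In this sketch, supporting a new repository means registering one more adapter, leaving the shared list and detail logic untouched.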
The Catrapilha code is hosted and version-controlled on GitHub, where you can easily fork it and start your own version of the tool, or simply reuse parts of it in your own tool. The Arquipelagos project serves as a showcase of how things may work. Please bear in mind that this is entirely volunteer work done in my spare time, so some of the code is not as polished and clean as I would like, and there are still a number of non-critical bugs I'm working to fix. To use it, you'll need to create a private Config.json containing at least your Access token, Client application key and Client application secret, which allow Catrapilha to access the Wikimedia projects with your credentials. Be careful never to share this information publicly.
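A hedged sketch of how those credentials could be used from Node.js 18+ (which provides global `fetch`, `FormData` and `Blob`) to upload one file through the MediaWiki Action API. It assumes an owner-only OAuth 2.0 consumer with upload rights, whose access token is sent as a Bearer token, and a Config.json with fields named `accessToken`, `clientKey` and `clientSecret`; these field names, like `parseConfig`, `apiHeaders` and `uploadFile`, are assumptions rather than what Catrapilha actually expects. Only the access token is used by these particular calls.

```typescript
// Assumed Config.json fields; keep this file out of version control.
interface Config {
  accessToken: string;
  clientKey: string;
  clientSecret: string;
}

// Parse and validate the private Config.json contents.
function parseConfig(json: string): Config {
  const data = JSON.parse(json) as Partial<Config>;
  for (const field of ["accessToken", "clientKey", "clientSecret"] as const) {
    const value = data[field];
    if (typeof value !== "string" || value === "") {
      throw new Error(`Config.json is missing "${field}"`);
    }
  }
  return data as Config;
}

const API = "https://commons.wikimedia.org/w/api.php";

// Owner-only OAuth 2.0 consumers send the access token as a Bearer token;
// Wikimedia asks API clients to identify themselves with a User-Agent.
function apiHeaders(config: Config): Record<string, string> {
  return {
    Authorization: `Bearer ${config.accessToken}`,
    "User-Agent": "catrapilha-sketch/0.1 (replace with your contact details)",
  };
}

// action=upload requires a CSRF token obtained with the same credentials.
async function getCsrfToken(config: Config): Promise<string> {
  const res = await fetch(`${API}?action=query&meta=tokens&type=csrf&format=json`, {
    headers: apiHeaders(config),
  });
  const data = (await res.json()) as { query: { tokens: { csrftoken: string } } };
  return data.query.tokens.csrftoken;
}

// Upload one file together with its reviewed description page wikitext.
async function uploadFile(
  config: Config,
  filename: string,
  file: Blob,
  text: string,
  comment: string,
): Promise<unknown> {
  const form = new FormData();
  form.append("action", "upload");
  form.append("format", "json");
  form.append("filename", filename);
  form.append("text", text);
  form.append("comment", comment);
  form.append("token", await getCsrfToken(config));
  form.append("file", file, filename);
  const res = await fetch(API, { method: "POST", headers: apiHeaders(config), body: form });
  return res.json();
}
```

A call would then look like `await uploadFile(config, "Example.jpg", blob, draftText, "Upload from a GLAM source")`, where `blob` holds the downloaded image bytes and `draftText` is the reviewed description page.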
As the Catrapilha tool continues to evolve, paving the way for smoother and more efficient uploads, the journey that began with Arquipelagos now extends to other destinations and repositories, hopefully also providing a number of useful React code examples and implementations for GLAM-related needs. Thank you very much if you were brave enough to read this far, and please leave a message on my Commons talk page if you want further information or help with this tool.
Useful links:
- Code on GitHub
- Files uploaded with Catrapilha to Commons (all repositories)
- Files uploaded from Arquipelagos (all types of upload)
- Arquipelagos image database