Wikimedia Hackathon report: Upgrading GLAM tech tools and PAWS
As part of the recent Wikimedia Hackathon, a number of Wikimedians including User:Chicocvenancio, User:Fuzheado, Tony Hirst, User:Susannaanas, and User:Yuvipanda worked to better document and upgrade the PAWS computing environment on Wikimedia servers for the GLAM Wiki and greater Wikimedia community.
PAWS (https://paws.wmcloud.org) is an often-overlooked coding and development system geared towards those starting out with programming or automated wiki tasks. But it's not just for beginners: it is a full-fledged computing environment being used by dozens of folks for large-scale GLAM tasks. Anyone with a Wikimedia login can log in to PAWS to try running some basic bot code or just to familiarize themselves with coding tools.
PAWS itself is simply a Wikimedia-specific instance of Jupyter Notebooks, a popular "literate" or "interactive" programming environment useful for experimenting with code. The benefit of running on Wikimedia infrastructure is fast access to servers and data while being automatically authenticated for database actions after logging in.
PAWS new notebook options: OpenRefine and SPARQL
PAWS is also a general one-click computing container, so it can run a variety of different open source packages. The Hackathon produced several important innovations for GLAM Wiki users:
- OpenRefine - OpenRefine is used for reconciliation and data cleaning, primarily to see how well a data set matches Wikidata's items. Of main interest to GLAM folks: you can now run OpenRefine on the Wikimedia cloud servers, via PAWS, instead of downloading and installing it on a local computer. This is useful for folks who cannot install software on locked-down institutional computers, or those who want to run OpenRefine training and have been hampered by installation woes for individual users. Having everyone log in to PAWS and run OpenRefine in the same turnkey environment and version is a huge plus. Another benefit of having this in the cloud: after running a reconciliation session in OpenRefine on PAWS, you can share your working data with others via a public link. If you are logged in to PAWS you can choose OpenRefine from the "New" menu, or you can access it with this link:
- SPARQL kernel - If you've ever run a series of Wikidata SPARQL queries but wanted to save them or show a progression of several queries, PAWS can now save SPARQL queries in a Jupyter notebook format. You can make a new SPARQL notebook from the New menu. A sample can be seen here:
JupyterLab interface for PAWS
- JupyterLab - PAWS has a more advanced JupyterLab mode that is more sophisticated in handling multiple files and extensions. By default, PAWS starts up in the classic "notebook" view, but thanks to the work at the Hackathon it can now also run JupyterLab, which is useful if you're working with multiple files. You can visit this URL to run PAWS in JupyterLab mode:
- PAWS notebooks as apps - A PAWS notebook can interactively show how code can make Wikipedia/Wikidata edits. But you can also develop a usable app with a point-and-click user interface in PAWS. Thanks to the work of User:Chicocvenancio and User:Yuvipanda, the extension Voilà is now available in PAWS, which allows you to run a fully functioning standalone app on Wikimedia servers under your account.
PAWS with the Voilà option to run as a standalone app.
- For an example of a project published with this system, see the USA report writeup for this month, where User:Fuzheado describes his Wikidata Graph Browser project. You can also click on this link to launch the tool using Voilà to see how it works.
- R and RStudio - PAWS was previously limited to notebooks using the Python programming language, but support for R was added during the Hackathon. R is a programming language focused on statistical computing and graphics, which can be a powerful tool when you need to do data analysis and visualization. In addition to using R in classic notebooks, PAWS now also comes with RStudio, an "integrated development environment" that lets you work with multiple files and offers tools to debug your code, built-in tutorials, and much more. To get started with R in RStudio, use the link below:
Why I wanted to learn to use Jupyter
Those working with GLAM know the gap between bots and the tools that have been developed for non-coding data wranglers to upload images to Wikimedia Commons or data to Wikidata. Sometimes these tools become outdated. As for bots, not many people can create them, and this skill requirement becomes a barrier for non-coders to accomplish reasonable results. Jupyter notebooks are coding done in the wiki way. With them, you can achieve almost anything that the bots can do, but you can also build upon notebooks that others have made, copy and steal code from others, and see the results in real time. They are called notebooks because you can scribble text between the chunks of code, and use the notebooks for explaining what the code does.
My idea going into the hackathon was to reconcile tens of thousands of geographic items with data already in Wikidata. I wanted to find items that are located in the same place and only after that start to compare their names in different languages and other properties. There are many tools that could be tweaked to do that, but the data in Wikidata for my items was so heterogeneous that I needed to be able to switch between different approaches. So I started to create a notebook for reconciling based on geographic data. It's just a start. The notebook is not working yet, but I have a sense that it will work! I am eager to learn by myself and together with others, and to create a library of recipes we can exchange with one another to do the things we need. The story continues... – Susanna Ånäs
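The two-step approach described above (match items by location first, then compare their names across languages) can be sketched in Python, the language PAWS notebooks use by default. This is only a minimal illustration, not the notebook mentioned in the text: the function names, the 100 m radius, the 0.8 similarity threshold, and the sample records with placeholder identifiers are all assumptions made for the example.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def reconcile(local_items, wikidata_items, radius_m=100.0, min_similarity=0.8):
    """Return (local_id, wikidata_id, distance_m, similarity) for likely matches.

    Each item is a dict with "id", "lat", "lon" and "labels"
    (a mapping of language code to name).
    """
    matches = []
    for local in local_items:
        for candidate in wikidata_items:
            # Step 1: discard candidates that are not in the same place.
            distance = haversine_m(local["lat"], local["lon"],
                                   candidate["lat"], candidate["lon"])
            if distance > radius_m:
                continue
            # Step 2: compare every local name with every candidate name.
            best = max(
                (name_similarity(x, y)
                 for x in local["labels"].values()
                 for y in candidate["labels"].values()),
                default=0.0,
            )
            if best >= min_similarity:
                matches.append((local["id"], candidate["id"],
                                round(distance), round(best, 2)))
    return matches


# Hypothetical sample data; the identifiers are placeholders, not real items.
local_items = [
    {"id": "local-1", "lat": 60.1705, "lon": 24.9521,
     "labels": {"fi": "Helsingin Tuomiokirkko"}},
]
wikidata_items = [
    {"id": "Q_EXAMPLE_A", "lat": 60.1704, "lon": 24.9522,
     "labels": {"fi": "Helsingin tuomiokirkko", "sv": "Helsingfors domkyrka"}},
    {"id": "Q_EXAMPLE_B", "lat": 60.4524, "lon": 22.2783,
     "labels": {"fi": "Turun tuomiokirkko"}},
]

matches = reconcile(local_items, wikidata_items)
print(matches)
```

Comparing every local item with every candidate is fine for a small sample but would be slow for tens of thousands of items. In that case the location step could instead be delegated to the Wikidata Query Service, whose `wikibase:around` service returns only the items within a given radius of a point.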