Talk:GLAM/Toolset project/Request for Comments/Technical Architecture

From Outreach Wiki

Repository & Code Review

Questions

  1. Shall we use https://github.com/?
    1. If so, under which username?
    2. Can we create a mediawiki.org email address for that username and have access to its mailbox?
  2. Or is there a way to add a git repository to https://gerrit.wikimedia.org for this project?

Answers

  1. We may be able to create a project repository in mediawiki.org's Gerrit repository as /glam/gwtoolset. I have sent out an initial request to the project members and will forward it to Sumanah once we have agreement on the repository name and owners.

Need details; separate JSON file transformed into importImages.php commands

Disclaimer: I have never imported into or exported from Commons, so what do I know?

You should give examples of the actual data you want to import and export. E.g. commons:File:Boschverloren.png uses Template:Institution.

It seems you can't use Special:Import to import the media file as well as its File:Name.png wiki text, so there's no benefit to reverse-engineering MediaWiki's XML format. I think you can use mw:Manual:importImages.php with its --comment option supplying a big chunk of wiki text, including things like Template:Institution; http://biowikifarm.net/meta/Batch_importing_media_files_into_MediaWiki seems like a good sample script.
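As a rough sketch of that route (in Python rather than a shell script), the function below assembles an importImages.php command line with the wiki text passed through --comment, quoting it so braces, pipes and newlines survive the shell. The script path, the staging directory and the one-directory-per-description layout are assumptions for illustration: --comment is a single value for the whole run, so files that each need their own description would each need their own run. Check mw:Manual:importImages.php for the options your MediaWiki version actually supports.

```python
import shlex


def import_command(upload_dir: str, description_wikitext: str,
                   script: str = "maintenance/importImages.php") -> str:
    """Return a shell-safe importImages.php command line.

    Hypothetical layout: upload_dir holds only the file(s) that should share
    this description, because --comment applies to the whole run.
    """
    args = ["php", script, "--comment=" + description_wikitext, upload_dir]
    # shlex.quote wraps any argument containing braces, pipes, spaces or
    # newlines in single quotes so the shell passes it through unchanged.
    return " ".join(shlex.quote(arg) for arg in args)


wikitext = "{{Information\n|description=Example painting\n}}"
print(import_command("/tmp/batch/0001", wikitext))
```

Quoting with shlex.quote rather than hand-escaping keeps the generated commands safe to paste into a shell or write into a batch script.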

If MediaWiki's XML format is out, then I would use a JSON format to represent the URL of the media file and all its metadata, and then write transforms of this JSON to generate the infobox contents and the import commands. These decoupled steps (derive the information, transform it into JSON, and output MediaWiki-specific markup and import commands) wouldn't have to run on MediaWiki or PHP; you could write them in JavaScript to run in a browser or Node.js. (FWIW I did that to export hundreds of images from an online picture gallery, and JSON & Node.js worked well.) There's also the tool commons:Commons:Tools/Commonist, written in Java.
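A minimal sketch of the transform step, written in Python here although the same shape carries over to JavaScript: a JSON record (the field names are hypothetical, not any agreed format) is rendered as a Commons {{Information}} block. It assumes the description, date, source and author parameters of that template, and escapes literal pipes as {{!}} so a value cannot split into extra template parameters.

```python
import json

# Hypothetical record: these field names are illustrative, not an agreed format.
RECORD_JSON = """
{
  "url": "https://example.org/images/0001.tif",
  "description": "Oil on panel.",
  "date": "1650",
  "author": "Unknown",
  "institution": "Example Museum"
}
"""


def escape(value: str) -> str:
    # A bare "|" would start a new template parameter; {{!}} renders a pipe.
    return value.replace("|", "{{!}}")


def to_information(record: dict) -> str:
    """Render one record as a Commons {{Information}} template call."""
    source = escape(record["institution"]) + ", " + escape(record["url"])
    return "\n".join([
        "{{Information",
        "|description=" + escape(record["description"]),
        "|date=" + escape(record["date"]),
        "|source=" + source,
        "|author=" + escape(record["author"]),
        "}}",
    ])


record = json.loads(RECORD_JSON)
print(to_information(record))
```

Keeping the renderer a pure function of the record is what makes the steps decoupled: the same JSON can feed a different template later without re-harvesting the metadata.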

A good starting point for the JSON format might be the JSON returned by API imageinfo queries.
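To see what that starting point looks like, the sketch below builds an imageinfo query URL against the Commons API and flattens the query.pages[...].imageinfo entries of a response into simple records. The iiprop selection is one reasonable choice among several, and SAMPLE_RESPONSE is a made-up response in the imageinfo shape, not real data for any file; no network request is made.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def imageinfo_query(title: str) -> str:
    """Build an imageinfo query URL for a single File: page."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "url|size|mime",  # one possible selection of properties
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def flatten(response: dict) -> list:
    """Turn query.pages[*].imageinfo[0] into flat records."""
    records = []
    for page in response["query"]["pages"].values():
        # Missing files have no "imageinfo" key; fall back to empty values.
        info = page.get("imageinfo", [{}])[0]
        records.append({
            "title": page["title"],
            "url": info.get("url"),
            "mime": info.get("mime"),
            "width": info.get("width"),
            "height": info.get("height"),
        })
    return records


# Made-up response in the imageinfo shape; not real data for any file.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "ns": 6,
                "title": "File:Example.png",
                "imageinfo": [{
                    "url": "https://upload.example.org/Example.png",
                    "mime": "image/png",
                    "width": 640,
                    "height": 480,
                    "size": 12345,
                }],
            }
        }
    }
}

print(imageinfo_query("File:Boschverloren.png"))
print(flatten(SAMPLE_RESPONSE))
```

Reusing the API's own field names (url, mime, width, height) for the import records means export and import can share one vocabulary.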

None of the other extensions in mw:Category:Import extensions seems relevant to transforming external data into infobox template data. The code probably exists, but not as an extension. Going the other way (extracting structured data from infoboxes) is the raison d'être of DBpedia; maybe their code is reusable.
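The mapping from external data to template data can start as a simple crosswalk table. This sketch assumes, hypothetically, that an institution exports Dublin Core-style field names; it renames the known fields and sets aside anything unmapped so it can be reviewed rather than silently dropped.

```python
# Hypothetical crosswalk from an institution's export fields (assumed here to be
# Dublin Core-style names) to the field names used in the JSON records.
CROSSWALK = {
    "dc:title": "title",
    "dc:description": "description",
    "dc:date": "date",
    "dc:creator": "author",
}


def apply_crosswalk(source: dict, crosswalk: dict) -> tuple:
    """Rename mapped fields; return unmapped ones separately for review."""
    mapped, unmapped = {}, {}
    for key, value in source.items():
        if key in crosswalk:
            mapped[crosswalk[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


mapped, unmapped = apply_crosswalk(
    {"dc:title": "Example painting", "dc:creator": "Unknown", "local:id": "42"},
    CROSSWALK,
)
print(mapped)
print(unmapped)
```

Keeping the crosswalk as data rather than code lets each institution supply its own table without touching the transform.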

Interesting problem, good luck! Skierpage (talk) 20:00, 20 September 2012 (UTC)

Why Zend Framework?

The page mentions using Zend Framework 2, but it may be better to import the relevant classes from MediaWiki, both for MediaWiki developers' familiarity and for reusing MediaWiki components (such as the UploadWizard extension). 15:37, 3 November 2012 (UTC)