Talk:Education/Dashboard

There are a couple of other metrics that Wikimetrics doesn't cover, but that my team considers pretty important. Those are:

  1. Articles edited by student editors
  2. Page views of articles edited by student editors

You can get this data for courses using the EP extension (as well as longer-term page view data for any single article, and the articles edited by any single user) from my new coursestats tool on Tool Labs:

http://tools.wmflabs.org/coursestats/


For page view queries of larger sets of articles, and for technical details, see the "Technical details" section below.

Technical details

For people comfortable using Tool Labs and running Python scripts, it's possible to pull those numbers.--Sage (Wiki Ed) (talk) 17:41, 3 July 2014 (UTC)

Articles edited

If you have a list of usernames, you can find how many (and which) articles they edited by running a SQL query on WMF Tool Labs. This query lists the mainspace, non-redirect pages edited by the given set of users in the specified timeframe:

-- List distinct mainspace, non-redirect pages edited by the given users
-- between the two rev_timestamp bounds (timestamps are YYYYMMDDHHMMSS
-- strings, so shorter bounds compare as prefixes).
SELECT page_title
FROM page
WHERE page_id IN
    (
    SELECT DISTINCT rev_page
    FROM revision_userindex
    WHERE rev_user_text IN
        (
        "Ragesoss", "Ragesock"
        )
    AND rev_timestamp BETWEEN "201008" AND "201407"
    )
AND page_namespace = 0
AND NOT page_is_redirect;

You can save the query as a .sql file with the desired cohort of usernames and date range, and then run the query and save the results to a CSV file like this:

sql enwiki < myquery.sql > queryresults.csv
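
If you're doing this for many cohorts, it may be easier to generate the .sql file from a plain list of usernames rather than editing it by hand. Here's a minimal sketch of that idea; the file names, date bounds, and the build_query helper are just illustrative, and it assumes usernames contain no quote characters:

#!/usr/bin/python3
# Write the article-list query for a cohort of usernames (one per line
# in the input file). File names, dates, and build_query are placeholders.
import sys

def build_query(usernames, start, end):
    # Assumes usernames contain no double-quote characters.
    quoted = ",".join('"{}"'.format(u) for u in usernames)
    return (
        'SELECT page_title\n'
        'FROM page\n'
        'WHERE page_id IN\n'
        '    (\n'
        '    SELECT DISTINCT rev_page\n'
        '    FROM revision_userindex\n'
        '    WHERE rev_user_text IN ({})\n'
        '    AND rev_timestamp BETWEEN "{}" AND "{}"\n'
        '    )\n'
        'AND page_namespace = 0\n'
        'AND NOT page_is_redirect;\n'
    ).format(quoted, start, end)

with open(sys.argv[1]) as f:
    users = [line.strip() for line in f if line.strip()]

with open('myquery.sql', 'w') as out:
    out.write(build_query(users, '201008', '201407'))

Run it as (for example) python3 buildquery.py usernames.txt, and then feed the resulting myquery.sql to the sql command shown above.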

Page views

Right now, the only tool we have for page views is stats.grok.se, which is fairly slow and only returns data for one article at a time. But if you have a list of articles (such as the one generated by the SQL query above), you can use a script like this to query each article in turn and build one big CSV of cumulative page views.

#!/usr/bin/python3

import urllib.error
import urllib.parse
import urllib.request
import json
import csv
import sys

articles = sys.argv[1]     # input file: one article title per line
outputfile = sys.argv[2]   # output CSV: article title, summed views
# stats.grok.se JSON endpoint; 'en' is the language and 'latest90' the
# time window -- adjust both as needed.
baseurl = 'http://stats.grok.se/json/en/latest90/'

# Get page views for a single article and append them to the output file.
def articleviews(article):
    articleurl = baseurl + urllib.parse.quote(article)

    # Try to fetch the data, retrying up to 10 times on HTTP errors.
    response = None
    for attempt in range(10):
        try:
            response = urllib.request.urlopen(articleurl)
            break
        except urllib.error.HTTPError as e:
            print("HTTP Error:", e.code, articleurl)
    # Stop the program if all 10 attempts fail.
    if response is None:
        raise RuntimeError('Too many tries on ' + articleurl)

    data = json.loads(response.read().decode('utf-8'))

    # Sum the daily view counts over the whole period.
    view_sum = sum(data['daily_views'].values())

    with open(outputfile, 'a', newline='') as f:
        w = csv.writer(f, delimiter=',')
        w.writerow([article, view_sum])

with open(articles, 'r') as f:
    for line in f:
        line = line.rstrip()
        if line:
            articleviews(line)

Adjust the base URL to point to the language you want, save the script as 'pageviews.py', and run it with a file containing the list of articles you want to check page views for (e.g., 'articles.csv'), one article per line; it will create another file in CSV format with the view data:

python3 pageviews.py articles.csv pageviews.csv

It returns a few hundred results per hour, so if you've got a lot of articles to check, it may be running for a while.
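
Once pageviews.csv exists, the per-article numbers can be rolled up into the single "page views of articles edited" figure. Here is a minimal sketch of that step; sumviews.py is just an illustrative name, and it assumes the two-column output produced by the script above:

#!/usr/bin/python3
# Sum the view counts in the CSV written by pageviews.py (one row per
# article: title, views) to get the cohort's total page views.
import csv
import sys

total = 0
count = 0
with open(sys.argv[1], newline='') as f:
    for row in csv.reader(f):
        if len(row) == 2:       # skip blank or malformed rows
            count += 1
            total += int(row[1])

print(count, 'articles,', total, 'total views')

Run it against the output file:

python3 sumviews.py pageviews.csv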