For those interested: I've merged Nemo's patch, so anyone who wants to run queries for a whole category can now use the script without supplying a separate list of files.
https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py
-- Hay
On Wed, Mar 25, 2015 at 4:11 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Hay (Husky), 25/03/2015 11:03:
Answering my own question: until somebody puts up a stats.grok.se-like interface for the mediacounts, I've hacked together a Python script that can be used to 'query' the TSV files for a single file or a list of files:
https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py
And I sent a small silly patch to give a category name like https://commons.wikimedia.org/wiki/Category:Media_from_BEIC as input. Example output attached for the lazy. Some data I found particularly interesting:
1) the sum of columns 11–14 (big thumbs),
2) the ratio between (1) and column 3 (total transfers),
3) column 24 (no Wikimedia referrer).
Total transfers in this small sample seem even higher than pageviews. (1) counts thumbs above 400 pixels, which are usually not embedded by default, so (2) should tell how many users probably clicked or did something else. (3) may indicate which files "went viral".
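
For anyone who wants to recompute figures (1) to (3) directly from a mediacounts TSV, here is a minimal sketch. It is not part of mediacounts-stats.py; the function names are made up for illustration. It assumes the column numbers above are 1-based, that column 1 holds the file name, and that empty or "-" fields should count as 0.

```python
import csv

# Column numbers are 1-based, as in the message above.
BIG_THUMB_COLUMNS = range(11, 15)      # columns 11-14 (big thumbs)
TOTAL_TRANSFERS_COLUMN = 3             # total transfers
NO_WIKIMEDIA_REFERRER_COLUMN = 24      # no Wikimedia referrer


def column(row, number):
    """Return the integer in 1-based column `number`.

    Treating empty or '-' fields as 0 is an assumption, not documented behaviour.
    """
    value = row[number - 1].strip()
    return int(value) if value not in ("", "-") else 0


def interesting_figures(row):
    """Return (big thumbs, big thumbs / total transfers, no-referrer count)."""
    big_thumbs = sum(column(row, n) for n in BIG_THUMB_COLUMNS)
    total = column(row, TOTAL_TRANSFERS_COLUMN)
    ratio = big_thumbs / total if total else 0.0
    return big_thumbs, ratio, column(row, NO_WIKIMEDIA_REFERRER_COLUMN)


def figures_per_file(lines):
    """Yield (column 1, figures) for each tab-separated row with enough columns."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= NO_WIKIMEDIA_REFERRER_COLUMN:
            yield row[0], interesting_figures(row)
```

Passing an open file object (or any iterable of lines) to figures_per_file() gives one tuple per file; rows shorter than 24 columns are skipped rather than guessed at.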
Nemo