I happen to work on a tool (initially for Liam Wyatt) that might do some of what you want on Wikidata. Given a Wikidata Query (separate topic ;-) or a simple list of Wikidata items, it can record changes made to these items over time. It records the JSON for the Wikidata items, at most one revision per day.
A front-end (to be written) can then extract things like number of sitelinks (Wikipedia articles) for these items over time; Wikidata labels in different languages; number/type of statements added; etc. Ideally, this can be exported as a table, to make pretty stats in R (or the like).
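To make the kind of extraction described above concrete, here is a rough Python sketch (not the tool's actual code). It assumes each recorded revision is available as a dict in the standard Wikidata entity JSON layout ("id", "labels", "sitelinks", "claims"); the function names, the column names and the list of excluded non-Wikipedia site keys are my own assumptions.

```python
import csv
import io

# Assumption: Wikipedia sitelink keys end in "wiki", but so do a few
# non-Wikipedia projects; this exclusion list is illustrative, not complete.
NON_WIKIPEDIA = {"commonswiki", "specieswiki", "metawiki",
                 "wikidatawiki", "mediawikiwiki", "sourceswiki"}

FIELDS = ["date", "item", "wikipedia_sitelinks", "labels", "statements"]


def snapshot_row(date, entity):
    """Summarise one recorded revision of a Wikidata item as a flat row."""
    sitelinks = entity.get("sitelinks", {})
    wikipedia_links = [key for key in sitelinks
                       if key.endswith("wiki") and key not in NON_WIKIPEDIA]
    # "claims" maps each property ID to a list of statements.
    n_statements = sum(len(v) for v in entity.get("claims", {}).values())
    return {
        "date": date,
        "item": entity.get("id", ""),
        "wikipedia_sitelinks": len(wikipedia_links),
        "labels": len(entity.get("labels", {})),
        "statements": n_statements,
    }


def snapshots_to_csv(snapshots):
    """Turn (date_string, entity_dict) pairs into CSV text, one row each."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for date, entity in snapshots:
        writer.writerow(snapshot_row(date, entity))
    return out.getvalue()
```

The resulting CSV has one row per item per recorded day, which should load directly into R (or the like) for plotting trends.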
As I said, it's work in progress, but if you have an (initial) list of items, I can start "recording".
On Tue, Oct 6, 2015 at 4:54 PM Andrew Gray <andrew.gray@dunelm.org.uk> wrote:
On 6 October 2015 at 14:12, Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:
Thanks for this email.
This raises a wider question: What is the comfortable way to compare the coverage of a topic in different languages?
For example, I'd love to see a report that says:
Number of articles about UNESCO cultural heritage:
English Wikipedia: 1000
French Wikipedia: 1200
Hebrew Wikipedia: 742
etc.
And also to track this over time, so if somebody would work hard on creating articles about UNESCO cultural heritage in Hebrew, I'd see a trend graph.
There are two general approaches to this:
a) On Wikidata
b) On the individual wikis
Approach (a) would rely on having a defined set of things in Wikidata that we can identify. For example, "is a World Heritage Site" would be easy enough, since we have a property explicitly dealing with WHS identifiers (and we have 100% coverage in Wikidata). "Is of interest to UNESCO" is a trickier one - but if you can construct a suitable Wikidata query...
As Federico notes, for WHS records, we can generate a report like https://tools.wmflabs.org/mix-n-match/?mode=sitestats&catalog=93 (57.4% coverage on hewiki!). No graphs, but if you're interested you could probably set one up without much work.
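For illustration, a per-wiki coverage figure of this kind could be computed roughly as follows. This is only a sketch and not how mix-n-match does it: it assumes you already hold the Wikidata entity JSON for every item in the set as Python dicts, and the function name is made up.

```python
def coverage_by_wiki(entities, wikis):
    """Return {wiki: (article_count, percent_of_items_covered)}.

    entities: list of Wikidata entity dicts, each with a "sitelinks" mapping
    wikis: site keys to report on, e.g. ["enwiki", "frwiki", "hewiki"]
    """
    total = len(entities)
    report = {}
    for wiki in wikis:
        # An item counts as covered if it has a sitelink to this wiki.
        count = sum(1 for e in entities if wiki in e.get("sitelinks", {}))
        percent = 100.0 * count / total if total else 0.0
        report[wiki] = (count, percent)
    return report
```

Running this once a day and appending the counts to a file would give the data for a trend graph.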
Approach (b) is more useful for fuzzy groups like "of relevance to UNESCO", since this is more or less perfect for a category system. However, it would require examining the category tree for each WP you're interested in to figure out exactly which categories are relevant, and then running a script to count those daily.
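The counting step of such a script might look something like the sketch below. It works on category data already loaded into two plain dicts (fetching them from each wiki, for example via the MediaWiki API's categorymembers list, is left out), and the function name and the depth limit are assumptions of mine. It walks the tree breadth-first, guards against category loops, and counts each article only once even if it sits in several categories.

```python
from collections import deque


def count_articles(root, subcats, pages, max_depth=3):
    """Count distinct articles in `root` and its subcategories.

    subcats: dict mapping a category name to its direct subcategories
    pages:   dict mapping a category name to the articles directly in it
    max_depth: how many levels below `root` to follow (0 = root only)
    """
    seen_cats = set()
    articles = set()
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        if cat in seen_cats:
            continue  # category graphs can contain loops
        seen_cats.add(cat)
        articles.update(pages.get(cat, ()))
        if depth < max_depth:
            for child in subcats.get(cat, ()):
                queue.append((child, depth + 1))
    return len(articles)
```

The depth limit matters in practice, since deep category trees tend to drift off-topic; which categories and what depth are relevant still has to be decided per wiki.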
A.
- Andrew Gray andrew.gray@dunelm.org.uk
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l