Hi Adam, Pine, Robert,
Thanks for the suggestions! In particular, Adam's link to Ford et al., where I read:
-- We used Apache's Map Reduce framework on Amazon's Elastic Map Reduce (EMR) cloud computing infrastructure to efficiently extract the history of references to all articles.
That sounds like power tools! I've been using more of a clunky bucket-chain procedure, which only captures part of what has "stuck" in the river. (They downloaded a corpus with all its deletion history.) We do reach some of the same conclusions, but the data are very different... (Twitter & Facebook weren't quite as weighty back in 2012, for example). That said, I suspect it would be much wiser to work on a database dump as they did.
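For anyone curious, here's roughly what I imagine the core of the extraction looks like over a local dump -- a minimal plain-Python sketch of the same map/reduce idea, not their Hadoop pipeline; the namespace URI, the URL regex, and the top-25 cutoff are all my own assumptions:

    # A minimal sketch (mine, not Ford et al.'s pipeline): stream a local
    # pages-meta-history XML dump and tally the domains of external links
    # across all revisions.
    import re
    import sys
    import xml.etree.ElementTree as ET
    from collections import Counter
    from urllib.parse import urlparse

    NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # assumption: varies by dump version
    URL_RE = re.compile(r"https?://[^\s\]|<>\"}]+")     # rough external-link matcher

    def map_revision(wikitext):
        """Map step: emit one domain per external link found in a revision."""
        for url in URL_RE.findall(wikitext or ""):
            yield urlparse(url).netloc.lower()

    def count_domains(dump_path):
        """Reduce step folded into a Counter: domain -> reference count."""
        counts = Counter()
        for _, elem in ET.iterparse(dump_path):
            if elem.tag == NS + "text":
                counts.update(map_revision(elem.text))
            elif elem.tag == NS + "page":
                elem.clear()  # drop each page's subtree once counted
        return counts

    if __name__ == "__main__":
        for domain, n in count_domains(sys.argv[1]).most_common(25):
            print(f"{n:8d}  {domain}")

On a real multi-gigabyte dump you'd shard this across workers (which is presumably what EMR buys them), but the logic per shard would be the same.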
The classified version (linked below) is getting more interesting now. Left-leaning papers often do better than their circulation figures would suggest, though Brazil & Germany are notable exceptions. In any case, what's very clear is that on en-wp, *Pitchfork* does much better than the *Poetry Foundation*.
Not to worry, Robert, *Wikipediocracy* barely makes the list...
I'll have a look at the research mailing list once I've finished exploring Adam's suggestion, Pine. Thanks to the three of you for taking the time to respond!
sashi