Hi Adam, Pine, Robert,
Thanks for the suggestions! In particular, Adam's link to Ford et al.,
where I read:
-- We used Apache's Map Reduce framework on Amazon's Elastic Map Reduce
(EMR) cloud computing infrastructure to efficiently extract the history
of references to all articles.
That sounds like power tools! I've been using more of a clunky bucket
chain procedure which only captures part of what has "stuck" in the
river. (They downloaded a corpus with all its deletion history.) We do
reach some of the same conclusions, but the data are very different....
(Twitter & Facebook weren't quite as weighty back in 2012, for
example). That said, I suspect it would be much wiser to work on a
database dump as they have.
The classified version (linked below) is getting more interesting now.
Left papers often do better than their circulation figures would
suggest, though Brazil and Germany are notable exceptions. In any
case, what's very clear is that on en-wp, *Pitchfork* does much better
than the *Poetry Foundation*.
>> http://www.creoliste.fr/docs/WikiInSources_cat.pdf <<
Not to worry, Robert, *Wikipediocracy* barely makes the list...
I'll have a look at the research mailing list once I've finished
exploring Adam's suggestion, Pine. Thanks to the three of you for
taking the time to respond!