On Fri, Oct 24, 2008 at 5:59 PM, Brion Vibber <brion@wikimedia.org> wrote:
Johannes Beigel wrote:
Is there a way (or a plan to implement one) to retrieve the list of unique contributors for a given article (from a given revision down to the first one)? Ideally this would accept parameters for the mentioned filtering. I guess inside of MediaWiki code this can be handled very efficiently (using appropriate database queries) and would eliminate the need to transfer lots of redundant data over the socket.
Given that this could require filtering through hundreds of thousands of unique revisions for a single request, I don't think we currently have a good plan for that. :)
I just ran a DISTINCT MySQL query for all non-IP editors of [[en:George W. Bush]] on the toolserver, and that took 3 seconds. There are 41,790 revisions.
Considering that this would be a worst-case article, and that it ran on the overtaxed toolserver, it does seem possible. Maybe if we had one MySQL slave / Apache dedicated to this task?
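For reference, here is roughly the kind of DISTINCT query I mean, as a minimal Python sketch against an in-memory SQLite table rather than the real MySQL database. The table is a cut-down stand-in for MediaWiki's revision table (only rev_id, rev_page, rev_user, rev_user_text), and I'm assuming rev_user = 0 marks an IP edit. The function names, the demo data, and the use of rev_id as the "from a given revision down to the first one" cutoff are all made up for illustration; rev_id order is only an approximation of chronological order.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the revision table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        " rev_id INTEGER PRIMARY KEY,"
        " rev_page INTEGER NOT NULL,"
        " rev_user INTEGER NOT NULL,"   # assumption: 0 means an IP (anonymous) edit
        " rev_user_text TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?, ?)",
        [
            (1, 10, 1, "Alice"),
            (2, 10, 0, "192.0.2.7"),    # IP edit, should be filtered out
            (3, 10, 2, "Bob"),
            (4, 10, 1, "Alice"),        # repeat editor, should appear once
            (5, 11, 3, "Carol"),        # different page
            (6, 10, 3, "Carol"),
        ],
    )
    conn.commit()
    return conn


def unique_contributors(conn, page_id, up_to_rev_id=None):
    """Return the distinct registered editors of a page, sorted by name.

    If up_to_rev_id is given, only revisions with rev_id <= up_to_rev_id
    are considered (a simplification of "from a given revision down").
    """
    sql = (
        "SELECT DISTINCT rev_user_text FROM revision "
        "WHERE rev_page = ? AND rev_user != 0"
    )
    params = [page_id]
    if up_to_rev_id is not None:
        sql += " AND rev_id <= ?"
        params.append(up_to_rev_id)
    sql += " ORDER BY rev_user_text"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    db = make_demo_db()
    print(unique_contributors(db, 10))
    print(unique_contributors(db, 10, up_to_rev_id=4))
```

The point is that the filtering and de-duplication happen in the database, so only the final author list has to cross the socket, not every revision.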
Made-up URL: http://authors.wikimedia.org/en.wikipedia/George_W._Bush
Magnus