Manish Goregaokar wrote:
- Select 200 random articles.
- Get the top contributors for each of them.
- Get the edit counts for those contributors.
I think he already has the list(s) of 200 articles and does not want random ones. Plus, he doesn't want the edit counts; he wants the contributors' top edited articles, with the edit count per article.
My personal opinion is that this HAS to be done via PHP (though I can't comment on server load). Use php-mysql to determine the list of top contributors for each given article, then loop over each contributor and fetch *his* top edited articles... Shouldn't be hard, though you might want to clarify what you mean by "top". (Top 3? More than X edits? More than X% of the edits per day/week/month/since the beginning of time? More than X% of the top editor's edits?)
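To make the question about "top" concrete, here is a minimal Python sketch of three of the possible definitions (top N, more than X edits, at least X% of the top editor's edits). The function name `top_contributors`, its parameters, and the sample counts are all made up for illustration; the per-article edit counts are assumed to have been fetched already.

```python
from collections import Counter


def top_contributors(edit_counts, top_n=None, min_edits=None, min_share_of_top=None):
    """Return (user, edits) pairs, most active first, that pass every
    criterion that is not None. All three criteria are hypothetical."""
    ranked = Counter(edit_counts).most_common()
    if not ranked:
        return []
    top_edits = ranked[0][1]  # edit count of the most active editor
    if min_edits is not None:
        ranked = [(u, n) for u, n in ranked if n >= min_edits]
    if min_share_of_top is not None:
        ranked = [(u, n) for u, n in ranked if n >= min_share_of_top * top_edits]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Invented edit counts for a single article.
counts = {"Alice": 40, "Bob": 12, "Carol": 3, "Dave": 1}
print(top_contributors(counts, top_n=3))
print(top_contributors(counts, min_edits=10))
print(top_contributors(counts, min_share_of_top=0.25))
```

The time-based variants (edits per day/week/month) would need revision timestamps as well, so they are left out here.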
-Manishearth
It's also quite easy to do by processing the stub-pages-articles dump.
1. Read the dump, if the page title matches, record all editing users.
2. Order the author list per article, select which ones pass to the next phase.
3. Read the dump again, if the user edited that page (and it's in the main namespace), record that page name.
4. ???
5. Profit
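A rough Python sketch of steps 1 to 3, under some assumptions: the dump lists every revision of each page, and it uses the usual MediaWiki export element names (`page`, `title`, `ns`, `revision`, `contributor`, `username`/`ip`). The function names, the `open_dump` callable and the `top_n=3` cut-off are invented for the example, and the sample XML is made-up data rather than a real dump.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, namespace, [contributor names]) for each <page>."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, ns, users = None, "0", []
        for node in elem.iter():
            name = local(node.tag)
            if name == "title":
                title = node.text
            elif name == "ns":
                ns = node.text
            elif name in ("username", "ip") and node.text:
                users.append(node.text)
        yield title, ns, users
        elem.clear()  # drop the page's children to limit memory use


def top_articles_of_top_editors(open_dump, wanted_titles, top_n=3):
    # Pass 1: for each wanted article, keep its top_n editors.
    per_article = {}
    for title, _ns, users in iter_pages(open_dump()):
        if title in wanted_titles:
            per_article[title] = [u for u, _ in Counter(users).most_common(top_n)]
    selected = {u for users in per_article.values() for u in users}

    # Pass 2: count main-namespace edits per page for the selected editors.
    per_user = defaultdict(Counter)
    for title, ns, users in iter_pages(open_dump()):
        if ns != "0":
            continue
        for user in users:
            if user in selected:
                per_user[user][title] += 1
    return per_article, {u: c.most_common() for u, c in per_user.items()}


# Tiny invented dump, only to show the shape of the input.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Foo</title><ns>0</ns>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><username>Bob</username></contributor></revision>
  </page>
  <page><title>Bar</title><ns>0</ns>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
  <page><title>Talk:Foo</title><ns>1</ns>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
</mediawiki>"""

per_article, per_user = top_articles_of_top_editors(
    lambda: io.BytesIO(SAMPLE), {"Foo"}, top_n=1
)
print(per_article)
print(per_user)
```

For a real dump, `open_dump` would reopen the (decompressed) file for each pass, e.g. `lambda: open(path, "rb")`, where `path` is wherever the dump was saved.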
You may be able to cover several steps with a single SQL query, but I'm not convinced that would perform significantly better. Working from an XML dump is a bit outdated, but more reproducible.