[Toolserver-l] Query Service Inquiry

Platonides platonides at gmail.com
Fri Apr 29 14:47:50 UTC 2011


Manish Goregaokar wrote:
>>  1. Select 200 random articles.
>>  2. Get the top contributors for each of them.
>>  3. Get the edit counts for those contributors.
>>
> 
> I think he has the list/s of 200 articles, and does not want random ones.
> Plus, he doesn't want the editcounts, he wants their top edited articles,
> with the editcount per article.
> 
> My personal opinion is that this HAS to be done via php (though I can't
> comment on server load).
> Use php-mysql to determine the list of top contributors per given article,
> then loop for each contributor, and give *his* top edited articles...
> Shouldn't be hard, though you might want to clarify what you mean by "top".
> (Top 3? More than X edits? More than X% edits per day/week/month/beginning
> of time? More than X% edits of the top editor?).
> 
> -Manishearth

It's also quite easy to do by processing the stub-pages-articles dump.

1. Read the dump; if the page title matches one of the wanted articles,
record all users who edited it.
2. Sort the author list of each article by edit count and select which
authors pass to the next phase.
3. Read the dump again; if a selected user edited a page (and it's in the
main namespace), record that page name.
4. ???
5. Profit
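
Steps 1 to 3 could be sketched roughly like this in Python. This is only
an illustration under some assumptions: the dump is a stub export that
lists every revision with its <contributor>, each <page> carries an <ns>
element, anonymous (IP) edits are ignored, and "top" means the top_n
users by revision count. The function names (iter_pages, top_authors,
pages_per_author) and the inline SAMPLE data are made up for the example.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Stream the dump and yield (title, ns, Counter of usernames) per page."""
    title, ns, editors = None, None, Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = int(elem.text)
        elif name == "username":  # <ip> contributors are skipped
            editors[elem.text] += 1
        elif name == "page":
            yield title, ns, editors
            title, ns, editors = None, None, Counter()
            elem.clear()  # drop the finished page's children to save memory


def top_authors(source, wanted_titles, top_n=3):
    """Steps 1-2: for each wanted article keep its top_n editors."""
    selected = {}
    for title, _ns, editors in iter_pages(source):
        if title in wanted_titles:
            selected[title] = [user for user, _ in editors.most_common(top_n)]
    return selected


def pages_per_author(source, authors):
    """Step 3: per selected author, count edits on each main-namespace page."""
    edits = defaultdict(Counter)
    for title, ns, editors in iter_pages(source):
        if ns != 0:
            continue
        for user, count in editors.items():
            if user in authors:
                edits[user][title] = count
    return edits


# Tiny hypothetical dump, standing in for the real file.
SAMPLE = b"""<mediawiki>
  <page><title>Foo</title><ns>0</ns>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><username>Bob</username></contributor></revision>
  </page>
  <page><title>Bar</title><ns>0</ns>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><ip>192.0.2.1</ip></contributor></revision>
  </page>
  <page><title>Talk:Foo</title><ns>1</ns>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
</mediawiki>"""

selected = top_authors(io.BytesIO(SAMPLE), {"Foo"}, top_n=1)
authors = {user for users in selected.values() for user in users}
edits = pages_per_author(io.BytesIO(SAMPLE), authors)
for user, pages in edits.items():
    print(user, pages.most_common())
```

With a real dump you would pass the (decompressed) file path or an open
file object instead of io.BytesIO(SAMPLE); both passes stream the file,
so memory use stays bounded by the per-page author counts.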

You may be able to get several steps with a single SQL query, but I'm
not convinced that would perform significantly better.
Working from an XML dump is a bit outdated, but more reproducible.
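
For comparison, here is roughly what a single-query version might look
like, run from Python against an in-memory SQLite database so it can be
tried anywhere. The two tables are a cut-down stand-in for MediaWiki's
page and revision tables (only the columns the query needs), the rows
are invented, and the query skips the "top" selection of step 2: it
takes every registered editor of the listed articles. Treat it as a
sketch, not as something tuned for the Toolserver replicas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in schema and invented rows, for illustration only.
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user_text TEXT);
INSERT INTO page VALUES (1, 0, 'Foo'), (2, 0, 'Bar'), (3, 1, 'Foo');
INSERT INTO revision VALUES
    (1, 1, 'Alice'), (2, 1, 'Alice'), (3, 1, 'Bob'),
    (4, 2, 'Alice'), (5, 3, 'Alice'), (6, 2, 'Carol');
""")

titles = ["Foo"]  # the given article list
marks = ",".join("?" * len(titles))  # one placeholder per title

# Inner query: everyone who edited one of the given articles.
# Outer query: per-article edit counts of those users in the main namespace.
query = f"""
SELECT r.rev_user_text, p.page_title, COUNT(*) AS edits
FROM revision r JOIN page p ON p.page_id = r.rev_page
WHERE p.page_namespace = 0
  AND r.rev_user_text IN (
      SELECT r2.rev_user_text
      FROM revision r2 JOIN page p2 ON p2.page_id = r2.rev_page
      WHERE p2.page_namespace = 0 AND p2.page_title IN ({marks}))
GROUP BY r.rev_user_text, p.page_title
ORDER BY r.rev_user_text, edits DESC
"""
rows = conn.execute(query, titles).fetchall()
for row in rows:
    print(row)
```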


