Yes, every edit count request, regardless of what one is looking for, looks the same to the server. If a user wants to find another user's edit summary usage throughout his *entire* editing history, it's still the same number of hits to /w/index.php?title=Special:Contributions&limit=5000 from a scraper as a request for the user's raw edit count. But few users want to know the minor edit usage of an editor with 75,000 edits (only RFA and RFB on en.wp come to mind), while many more just want to see that the editor indeed has 75,000 revisions to his credit, and that's where the problem lies.
A total edit count can be obtained by a simple database lookup of $wgDBname_user.user_editcount; for those who just want to know whether they're close to making their 1000th edit or whatever, SELECTing 1000 revisions is overkill. While some smart edit counters have a built-in memory and cache revisions in an internal database, the most popular ones do not. For a relatively new user, the number of requests is indeed the same: one. Even so, a SELECT of the user's single row is much faster, and much less expensive, than a SELECT on the revision table for tons of revisions.
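To make the cost difference concrete, here is a minimal sketch using an in-memory SQLite database with toy tables mirroring MediaWiki's user and revision tables (the real tables have many more columns; the schema here is illustrative only). The point is the shape of the two queries: one indexed single-row read of a cached counter versus a count over every revision row for the user.

```python
import sqlite3

# Toy schema loosely mirroring MediaWiki's user and revision tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY,
                       user_name TEXT,
                       user_editcount INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY,
                           rev_user INTEGER);
""")
conn.execute("INSERT INTO user VALUES (1, 'Example', 3)")
conn.executemany("INSERT INTO revision (rev_user) VALUES (?)",
                 [(1,), (1,), (1,)])

# Cheap: one single-row read of the cached counter.
(cached,) = conn.execute(
    "SELECT user_editcount FROM user WHERE user_id = 1").fetchone()

# Expensive: count every revision row for the user -- effectively what
# scraping Special:Contributions page by page forces the wiki to do.
(counted,) = conn.execute(
    "SELECT COUNT(*) FROM revision WHERE rev_user = 1").fetchone()

print(cached, counted)
```

The first query touches one row no matter how many edits the user has; the second scales with the user's total revision count.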
I'd be interested in seeing this enabled, and measuring how much the 5000-revision requests on Special:Contributions drop. That would be the best estimate of the impact of scrapers on the site.
-----Original Message----- From: Simetrical [mailto:Simetrical+wikilist@gmail.com] Sent: Monday, February 26, 2007 6:37 PM To: Wikimedia developers Subject: Re: [Wikitech-l] Bot edit rates
On 2/26/07, Titoxd@Wikimedia <titoxd.wikimedia@gmail.com> wrote:
From my experience, for every ten Special:Contributions requests made, eight or nine are just to find out the raw edit count, and the rest are used to find namespace distributions, edit summary and minor edit usage. So, displaying user_editcount on Special:Contributions will cause scrape requests to take a substantial hit...
Doesn't any kind of edit count, with or without detailed statistics, require exactly one request per page of contributions? You can't know how many edits the person has without figuring out how many pages there are, but that requires going through every page, which hopefully you've used to scrape all of the contributions into some kind of internal data structure, from which you can then divine whatever statistics you want, no? Is it that statistics only care about the last 1000 edits or whatever, so there's a limit on how many page requests they'll need if not for wanting the full edit count?