Yes, every edit count query, regardless of what one is looking for, costs
the same. If a user wants to find the edit-summary usage of another user
throughout his *entire* editing history, it takes the same number of hits
to /w/index.php?title=Special:Contributions&limit=5000 from a scraper as
requesting the user's raw edit count does. But since there aren't many
users who want to know the minor-edit usage of an editor with 75,000 edits
(only RFA and RFB on en.wp come to mind), while far more just want to see
that the editor indeed has 75,000 revisions to his credit, that's where the
problem lies.
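To illustrate the point, here's a minimal sketch (in Python, with a
hypothetical fetch_page callable standing in for the actual HTTP scraping):
whether a tool wants only the raw count or full edit-summary statistics, it
pages through Special:Contributions the same number of times.

```python
PAGE_LIMIT = 5000  # the limit=5000 used in the scraper requests


def scrape_stats(fetch_page):
    """Page through a user's contributions.

    fetch_page(offset, limit) stands in for one HTTP request to
    Special:Contributions; returns a list of revision dicts.
    Returns (edit_count, edits_with_summary, requests_made).
    """
    offset = 0
    edits = 0
    with_summary = 0
    requests = 0
    while True:
        page = fetch_page(offset, PAGE_LIMIT)  # one hit per page, always
        requests += 1
        for rev in page:
            edits += 1
            if rev.get("summary"):
                with_summary += 1
        if len(page) < PAGE_LIMIT:
            break  # short page means we've reached the end of the history
        offset += PAGE_LIMIT
    return edits, with_summary, requests
```

The request count depends only on the size of the history (12,000 edits
means three hits at limit=5000), not on which statistic the caller wanted.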
A total editcount can be calculated by a simple database lookup to
$wgDBname_user.user_editcount; for those who want to know whether they're
close to making their 1000th edit or whatever, making a SELECT of 1000
revisions is overkill. While some smart edit counters have a built-in memory
and store revisions in an internal database, the most popular ones do not.
The number of requests for a relatively new user is indeed the same: one.
However, again, a SELECT of the user's single row is much faster, and much
less expensive, than a SELECT on the revision table spanning thousands of
revisions.
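The difference between the two lookups can be sketched with a toy schema
(simplified column and table names, not the real MediaWiki schema), using
SQLite as a stand-in:

```python
import sqlite3

# Toy stand-ins for MediaWiki's user and revision tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY,"
            " user_name TEXT, user_editcount INTEGER)")
cur.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY,"
            " rev_user INTEGER)")
cur.execute("INSERT INTO user VALUES (1, 'Example', 1000)")
cur.executemany("INSERT INTO revision (rev_user) VALUES (?)",
                [(1,)] * 1000)

# Cheap: a single indexed row read of the cached counter.
(count_from_user,) = cur.execute(
    "SELECT user_editcount FROM user WHERE user_id = 1").fetchone()

# Expensive at scale: counts every revision row belonging to the user.
(count_from_revision,) = cur.execute(
    "SELECT COUNT(*) FROM revision WHERE rev_user = 1").fetchone()
```

Both return the same number, but the first touches one row no matter how
large the editing history grows, while the second scales with it.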
I'd be interested in seeing this enabled, and in measuring how much the
number of 5000-revision requests on Special:Contributions drops. That would
be the best estimate of the impact of scrapers on the site.
-----Original Message-----
From: Simetrical [mailto:Simetrical+wikilist@gmail.com]
Sent: Monday, February 26, 2007 6:37 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] Bot edit rates
On 2/26/07, Titoxd@Wikimedia <titoxd.wikimedia(a)gmail.com> wrote:
> From my experience, for every ten Special:Contributions requests made,
> eight or nine are just to find out the raw edit count, and the rest are
> used to find namespace distributions, edit summary and minor edit usage.
> So, displaying user_editcount on Special:Contributions will cause scrape
> requests to take a substantial hit...
Doesn't any kind of edit count, with or without detailed statistics,
require exactly one request per page of contributions? You can't know
how many edits the person has without figuring out how many pages
there are, but that requires going through every page, which hopefully
you've used to scrape all of the contributions into some kind of
internal data structure, from which you can then divine whatever
statistics you want, no? Is it that statistics only care about the
last 1000 edits or whatever, so there's a limit on how many page
requests they'll need if not for wanting the full edit count?