Right now on the English Wikipedia bots are limited to 6 edits per minute. Is it time to raise the permitted edit rate? Do we have the server capacity now to handle rapid editing? --Mets501
Jeremy Cushman wrote:
Right now on the English Wikipedia bots are limited to 6 edits per minute. Is it time to raise the permitted edit rate? Do we have the server capacity now to handle rapid editing? --Mets501
This edit-rate limit exists because nobody wants to flood Recent Changes, not for any particular technical reason. As far as I am aware, given the traffic that the servers already handle, bot traffic is fairly trivial (current traffic is in excess of 40,000 SQL queries per second). Please do NOT take this as gospel. Brion, Tim, Domas, Mark and the rest of the sysadmin team know much more about our capacity than I do, so I would recommend checking with them first.
Andrew Garrett (werdna)
Hi, When a bot has the bot flag, it is not Recent Changes that gets flooded. The default setting for Recent Changes is NOT to show changes by bots. It is the watchlist that is affected here: there, bot changes ARE shown by default. You can hide bot changes, though.
So the argument is fairly weak.
Thanks, GerardM
On 2/20/07, Andrew Garrett andrew@epstone.net wrote:
This edit-rate limit exists because nobody wants to flood Recent Changes, not for any particular technical reason. As far as I am aware, given the traffic that the servers already handle, bot traffic is fairly trivial (current traffic is in excess of 40,000 SQL queries per second). Please do NOT take this as gospel. Brion, Tim, Domas, Mark and the rest of the sysadmin team know much more about our capacity than I do, so I would recommend checking with them first.
Andrew Garrett (werdna)
Bot edits don't show up in Recent Changes by default, only in watchlists, where they can be hidden. I'll wait for a response from one of the sysadmins before raising the limit in the bot policy.
--Mets501
Do we have any processes that simply aren't working when limited to 6/min (360/hr; 8,640 edits/day)?
[[en:user:xaosflux]] ----- Original Message ----- From: "Jeremy Cushman" mets501wiki@gmail.com To: "Wikimedia developers" wikitech-l@lists.wikimedia.org Sent: Tuesday, February 20, 2007 4:29 PM Subject: Re: [Wikitech-l] Bot edit rates
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 2/20/07, xaosflux xaosflux@gmail.com wrote:
Do we have any processes that simply aren't working when limited to 6/min (360/hr; 8,640 edits/day)?
Not working, no. But we have many bots that could accomplish far more if they could edit faster. For example, my bot, which is migrating userboxes out of the template namespace into user space and fixing the transclusions, works almost 24/7 (whenever I have access to a computer) and literally has hundreds of thousands of edits to make. Editing at more than 6/min would let it accomplish a whole lot more than it does now. --Mets501
Jeremy Cushman wrote:
Not working, no. But we have many bots that could accomplish far more if they could edit faster. For example, my bot, which is migrating userboxes out of the template namespace into user space and fixing the transclusions, works almost 24/7 (whenever I have access to a computer) and literally has hundreds of thousands of edits to make. Editing at more than 6/min would let it accomplish a whole lot more than it does now.
That strikes me as useful, but not urgent. I think the current limit makes sense for such bots (or bot tasks). For urgent ones, perhaps there could be an exception.
Matthew Flaschen
On 2/20/07, Matthew Flaschen matthew.flaschen@gatech.edu wrote:
That strikes me as useful, but not urgent. I think the current limit makes sense for such bots (or bot tasks).
Not if there is, in fact, no reason for it. Personally, I suspect this was from before bot edits were hideable on RC and/or watchlists, and was annoyance-oriented rather than technically-oriented. But I don't know.
On 2/21/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Not if there is, in fact, no reason for it. Personally, I suspect this was from before bot edits were hideable on RC and/or watchlists, and was annoyance-oriented rather than technically-oriented. But I don't know.
Could we increase the limit to 12/minute on a trial basis?
Steve
Jeremy Cushman wrote:
Right now on the English Wikipedia bots are limited to 6 edits per minute. Is it time to raise the permitted edit rate? Do we have the server capacity now to handle rapid editing?
If there is a technical problem with high rate editing then we will limit the rate by technical means. We can't rely on policy to maintain good performance. Recommended bot edit rates are there for non-technical reasons, such as to allow time for review.
Currently there is a technically imposed limit of 8 edits per minute on IPs and new accounts, to mitigate bot-driven vandalism. There is no limit on established accounts.
-- Tim Starling
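For illustration only: a technically imposed limit like the one Tim describes (a fixed number of edits per minute for IPs and new accounts, none for established accounts) can be modelled as a sliding-window counter. The Python sketch below is hypothetical and is not MediaWiki's actual implementation; the class labels `anon`, `newbie` and `established`, and the `EditRateLimiter` name, are assumptions made for the example.

```python
from collections import deque

# Hypothetical per-class limits: (max edits, window in seconds).
# None means no technical limit. The 8-per-60s figure mirrors the message above.
LIMITS = {
    "anon": (8, 60.0),
    "newbie": (8, 60.0),
    "established": None,
}


class EditRateLimiter:
    """Sliding window: at most max_edits accepted edits in any `window` seconds."""

    def __init__(self, limits=None):
        self.limits = LIMITS if limits is None else limits
        self.history = {}  # editor -> deque of timestamps of accepted edits

    def allow(self, editor, editor_class, now):
        limit = self.limits[editor_class]  # KeyError for an unknown class
        if limit is None:
            return True  # established accounts are not throttled
        max_edits, window = limit
        stamps = self.history.setdefault(editor, deque())
        # Forget edits that have slid out of the window.
        while stamps and now - stamps[0] >= window:
            stamps.popleft()
        if len(stamps) >= max_edits:
            return False  # over the limit; the rejected attempt is not recorded
        stamps.append(now)
        return True
```

With these settings, an IP attempting ten edits one second apart gets the first eight accepted and the last two rejected, and can edit again once the oldest accepted edit is 60 seconds old.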
On 2/21/07, Tim Starling tstarling@wikimedia.org wrote:
If there is a technical problem with high rate editing then we will limit the rate by technical means. We can't rely on policy to maintain good performance. Recommended bot edit rates are there for non-technical reasons, such as to allow time for review.
Currently there is a technically imposed limit of 8 edits per minute on IPs and new accounts, to mitigate bot-driven vandalism. There is no limit on established accounts.
-- Tim Starling
OK, I propose raising the edit rate to 15 edits per minute. I will mention it on the community noticeboard and see if there are any objections there before putting it into the policy. --Mets501
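To put the proposed numbers in perspective, here is a back-of-the-envelope calculation in Python. The 300,000-edit backlog is a hypothetical figure standing in for the "hundreds of thousands of edits" Mets501 mentions, and the sketch assumes the bot runs around the clock at a constant rate.

```python
# Hypothetical backlog standing in for "hundreds of thousands of edits".
BACKLOG = 300_000


def days_to_finish(backlog, edits_per_minute, hours_per_day=24):
    """Days needed to clear `backlog` edits at a constant edit rate."""
    edits_per_day = edits_per_minute * 60 * hours_per_day
    return backlog / edits_per_day


for rate in (6, 12, 15):  # current limit and the two proposals in this thread
    print(f"{rate}/min: {rate * 60 * 24} edits/day, "
          f"{days_to_finish(BACKLOG, rate):.1f} days for {BACKLOG} edits")
```

Under those assumptions, the backlog takes roughly 34.7 days at 6/min, 17.4 days at 12/min and 13.9 days at 15/min.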
On 2/21/07, Jeremy Cushman mets501wiki@gmail.com wrote:
OK, I propose raising the edit rate to 15 edits per minute. I will mention it on the community noticeboard and see if there are any objections there before putting it into the policy. --Mets501
Whoops, I meant to say the village pump. I've proposed it here: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28policy%29#Bot_edit_ra... Feel free to comment there if you have any issues. --Mets501
2007/2/21, Tim Starling tstarling@wikimedia.org:
If there is a technical problem with high rate editing then we will limit the rate by technical means. We can't rely on policy to maintain good performance. Recommended bot edit rates are there for non-technical reasons, such as to allow time for review.
Currently there is a technically imposed limit of 8 edits per minute on IPs and new accounts, to mitigate bot-driven vandalism. There is no limit on established accounts.
Does this mean I can disable the get-throttling of the bots? Or does this statement only hold for the edit limit?
Andre Engels wrote:
Does this mean I can disable the get-throttling of the bots? Or does this statement only hold for the edit limit?
Only for the edit limit. And only then because I doubt anyone will try a high enough edit rate to cause serious load on the server. If someone starts trying 100 reads per second, it's going to cause problems. And more so for 100 writes per second.
As a general rule, all bots should be single-threaded. This limits the performance impact they can have on the server -- cheap requests can be done often, and expensive requests less often. If you run requests in parallel, then you're using more than your fair share of server resources.
Even single-threaded bots can cause problems if the requests are particularly long-running. If you want to be polite and sleep between requests, you should throttle them in proportion to the service time -- a duty cycle, in other words. Say a request takes 0.5 s to service: you could use a duty cycle of 25%, corresponding to one request every 2 seconds. Think about your load on the server in terms of the number of threads you're tying up, on average.
The worst offenders we've discovered in recent times are the edit counters -- web-based scripts that send requests to the servers, often lasting minutes, with unrestricted parallelism. We were very tempted to block them all. We would have blocked them for much less if we weren't afraid of having an angry mob of hundreds of Wikipedians, obsessed with edit counts, descending on our door.
-- Tim Starling
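Tim's duty-cycle advice can be turned into a simple rule: if a request took s seconds to service and you want a duty cycle d, sleep s(1 - d)/d seconds afterwards, so each cycle lasts s/d. With s = 0.5 and d = 0.25 that is a 1.5 s pause, i.e. one request every 2 seconds, matching his example. The sketch below is a hypothetical single-threaded loop in Python; `fetch` stands in for whatever request your bot makes and is not a real library function.

```python
import time


def pause_for_duty_cycle(service_time, duty_cycle):
    """Seconds to sleep after a request so that busy time / total time == duty_cycle."""
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    return service_time * (1 - duty_cycle) / duty_cycle


def run_throttled(fetch, items, duty_cycle=0.25, sleep=time.sleep, clock=time.monotonic):
    """Process items one at a time (single-threaded), throttled to a duty cycle.

    `fetch` is a hypothetical callable that performs one server request.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    results = []
    for item in items:
        start = clock()
        results.append(fetch(item))  # exactly one request in flight at a time
        elapsed = clock() - start
        sleep(pause_for_duty_cycle(elapsed, duty_cycle))
    return results
```

Because only one request is ever in flight, such a bot ties up, on average, roughly a fraction d of one server thread (0.25 here), which is the quantity Tim suggests thinking about.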
Tim Starling wrote:
The worst offenders we've discovered in recent times are the edit counters -- web-based scripts that send requests to the servers, often lasting minutes, with unrestricted parallelism. We were very tempted to block them all. We would have blocked them for much less if we weren't afraid of having an angry mob of hundreds of Wikipedians, obsessed with edit counts, descending on our door.
Amazing: You were very tempted to block them because they were causing too much traffic, but it still didn't occur to you to simply add the count to the Special:Contributions page, thereby peacefully rendering them obsolete?
Timwi
On 2/23/07, Timwi timwi@gmx.net wrote:
Amazing: You were very tempted to block them because they were causing too much traffic, but it still didn't occur to you to simply add the count to the Special:Contributions page, thereby peacefully rendering them obsolete?
First of all, there was no edit count until just recently. Second of all, people aren't just interested in edits, they're interested in edit *stats*. People will still want to know things like how many are article/user talk/Wikipedia:/etc. So if we want this to be faster, we should fix enwiki toolserver replication (if it's not fixed now, I'm not keeping tabs on it) so that the queries can be run directly rather than in tiny chunks interspersed with who knows what unnecessary queries.
Simetrical wrote:
First of all, there was no edit count until just recently. Second of all, people aren't just interested in edits, they're interested in edit *stats*.
Are you suggesting that because people want "more than just X", they shouldn't even have X?
Timwi
2007/2/25, Timwi timwi@gmx.net:
Are you suggesting that because people want "more than just X", they shouldn't even have X?
I think he's suggesting that giving them X through another means will not stop them from using edit count bots if what they want is more than X.
Andre Engels wrote:
I think he's suggesting that giving them X through another means will not stop them from using edit count bots if what they want is more than X.
And how is that a reason not to let them have X?
2007/2/25, Timwi timwi@gmx.net:
And how is that a reason not to let them have X?
It's not, but it might be a reason to spend time on other things rather than on giving the users X. Developer time is not a commodity we have excess of.
From my experience, for every ten Special:Contributions requests made, eight or nine are just to find out the raw edit count, and the rest are used to find namespace distributions, edit summary and minor edit usage. So, displaying user_editcount on Special:Contributions will cause scrape requests to take a substantial hit...
Titoxd.
On 2/26/07, Titoxd@Wikimedia titoxd.wikimedia@gmail.com wrote:
From my experience, for every ten Special:Contributions requests made, eight or nine are just to find out the raw edit count, and the rest are used to find namespace distributions, edit summary and minor edit usage. So, displaying user_editcount on Special:Contributions will cause scrape requests to take a substantial hit...
Doesn't any kind of edit count, with or without detailed statistics, require exactly one request per page of contributions? You can't know how many edits the person has without figuring out how many pages there are, but that requires going through every page, which hopefully you've used to scrape all of the contributions into some kind of internal data structure, from which you can then divine whatever statistics you want, no? Is it that statistics only care about the last 1000 edits or whatever, so there's a limit on how many page requests they'll need if not for wanting the full edit count?
Yes, every edit count, regardless of what one is looking for, is the same request. If a user wants to find the edit summary usage of another user throughout his *entire* editing history, a scraper makes the same number of hits to /w/index.php?title=Special:Contributions&limit=5000 as it would to get that user's raw edit count. Few users want to know the minor edit usage of an editor with 75,000 edits (only RFA and RFB on en.wp come to mind), but many more want to confirm that the editor indeed has 75,000 revisions to his credit, and that is where the problem lies.
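The arithmetic behind "every edit count is the same request" is easy to sketch: a scraper that pages through Special:Contributions with limit=5000 needs ceil(N / 5000) page loads to see all N edits, whatever statistic it computes afterwards. The helper below is a hypothetical illustration in Python, not part of any existing edit counter.

```python
import math

PAGE_LIMIT = 5000  # the limit=5000 parameter in the URL above


def contribution_pages(edit_count, page_limit=PAGE_LIMIT):
    """Special:Contributions page loads a scraper needs to see all of a user's edits.

    Hypothetical helper: assumes the scraper walks every page, and that even a
    user with zero edits costs one page load.
    """
    return max(1, math.ceil(edit_count / page_limit))


for edits in (300, 5_000, 75_000):
    print(f"{edits} edits -> {contribution_pages(edits)} page load(s)")
```

For the 75,000-edit example, that is 15 page loads reading 75,000 rows, against a single row for user_editcount.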
A total edit count can be obtained with a simple database lookup of $wgDBname_user.user_editcount; for those who want to know whether they're close to making their 1000th edit or whatever, SELECTing 1000 revisions is overkill. While some smart edit counters have a built-in memory and store revisions in an internal database, the most popular ones do not. For a relatively new user, the number of requests is indeed the same: one. However, again, a SELECT of the user's row is much faster, and much less expensive, than a SELECT on the revision table for tons of revisions.
I'd be interested in seeing this enabled, and measuring how many times the 5000-revision requests on Special:Contributions drop. That would be the best estimate as to the impact of scrapers on the site.
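The cost difference Titoxd describes can be sketched with an in-memory SQLite database. The table and column names (`user`, `user_editcount`, `revision`, `rev_user`, `rev_minor_edit`) follow the names used in this thread, but the schema is a heavily simplified, hypothetical stand-in for MediaWiki's, and SQLite is used only because it ships with Python. The idea shown is that the counter is bumped in the same transaction as each edit, so reading it later is a one-row primary-key lookup instead of a COUNT over all of the user's revisions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        user_name TEXT NOT NULL,
        user_editcount INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_user INTEGER NOT NULL,
        rev_minor_edit INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX rev_user_idx ON revision (rev_user);
""")


def record_edit(conn, user_id, minor=False):
    """Insert a revision and bump the cached counter in the same transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO revision (rev_user, rev_minor_edit) VALUES (?, ?)",
                     (user_id, int(minor)))
        conn.execute("UPDATE user SET user_editcount = user_editcount + 1 "
                     "WHERE user_id = ?", (user_id,))


def cached_count(conn, user_id):
    # One primary-key lookup, regardless of how many edits the user has.
    return conn.execute("SELECT user_editcount FROM user WHERE user_id = ?",
                        (user_id,)).fetchone()[0]


def counted_count(conn, user_id):
    # Has to visit every revision row belonging to the user.
    return conn.execute("SELECT COUNT(*) FROM revision WHERE rev_user = ?",
                        (user_id,)).fetchone()[0]


# Usage with a hypothetical user who has made 1,000 edits.
conn.execute("INSERT INTO user (user_id, user_name) VALUES (1, 'ExampleUser')")
for i in range(1000):
    record_edit(conn, 1, minor=(i % 4 == 0))
print(cached_count(conn, 1), counted_count(conn, 1))  # both report 1000
```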
On 2/26/07, Titoxd@Wikimedia titoxd.wikimedia@gmail.com wrote:
From my experience, for every ten Special:Contributions requests made, eight or nine are just to find out the raw edit count, and the rest are used to find namespace distributions, edit summary and minor edit usage. So, displaying user_editcount on Special:Contributions will cause scrape requests to take a substantial hit...
Sounds like there is a major desire to see some edit count stats. How about a special page, "Special:Editstats" that would just show the edit count, and in some future incarnation, could show all the other stats that people are obviously clamoring for?
Incidentally, I tend to hit Special:Contributions quite a lot — it's sort of my "home page" on WP. It's where I go first, and where I return to most frequently, to see if anything has happened to any of the pages I edited most recently. If it's such a heavy drain on resources, perhaps I should reconsider my behaviour.
Steve
On 2/27/07, Steve Bennett stevagewp@gmail.com wrote:
Sounds like there is a major desire to see some edit count stats. How about a special page, "Special:Editstats" that would just show the edit count, and in some future incarnation, could show all the other stats that people are obviously clamoring for?
Possibly. The question is how long it would take to run all the queries for a user with tons of edits. How long does something like SELECT COUNT(*) FROM revision WHERE rev_user=whatever AND rev_minor_edit = 1 take, for instance?
Incidentally, I tend to hit Special:Contributions quite a lot — it's sort of my "home page" on WP. It's where I go first, and where I return to most frequently, to see if anything has happened to any of the pages I edited most recently. If it's such a heavy drain on resources, perhaps I should reconsider my behaviour.
It's certainly not a drain on resources unless you set edits per page to 5000 and load it a lot. Even then, I assume it's not anything breathtaking, or else it would be shut off, user complaints or no. It's presumably excessive, but not grossly so.
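On Simetrical's question about per-user statistics queries: one way to keep a hypothetical Special:Editstats page cheap is to compute all the statistics in a single pass over the user's rows, using an index on rev_user, rather than running one COUNT(*) per statistic. The SQLite sketch below uses a simplified, hypothetical schema loosely modelled on MediaWiki's page and revision tables; it shows the shape of the query, not its performance on a wiki the size of en.wp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL
    );
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER NOT NULL REFERENCES page (page_id),
        rev_user INTEGER NOT NULL,
        rev_minor_edit INTEGER NOT NULL DEFAULT 0,
        rev_comment TEXT NOT NULL DEFAULT ''
    );
    -- Lets the per-user query below seek straight to one user's rows.
    CREATE INDEX rev_user_idx ON revision (rev_user);
""")

STATS_SQL = """
    SELECT page_namespace,
           COUNT(*)               AS edits,
           SUM(rev_minor_edit)    AS minor_edits,
           SUM(rev_comment <> '') AS edits_with_summary
    FROM revision JOIN page ON rev_page = page_id
    WHERE rev_user = ?
    GROUP BY page_namespace
    ORDER BY page_namespace
"""


def edit_stats(conn, user_id):
    """Namespace distribution, minor-edit and edit-summary usage in one query."""
    return conn.execute(STATS_SQL, (user_id,)).fetchall()


# Hypothetical sample data: page 1 is in namespace 0 (articles), page 2 in 3 (user talk).
conn.executemany("INSERT INTO page VALUES (?, ?)", [(1, 0), (2, 3)])
conn.executemany(
    "INSERT INTO revision (rev_page, rev_user, rev_minor_edit, rev_comment) "
    "VALUES (?, ?, ?, ?)",
    [(1, 7, 1, "typo"), (1, 7, 0, ""), (2, 7, 0, "reply"), (1, 8, 0, "other user")],
)
print(edit_stats(conn, 7))  # one row per namespace: (ns, edits, minor, with summary)
```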
Andre Engels wrote:
It's not, but it might be a reason to spend time on other things rather than on giving the users X. Developer time is not a commodity we have excess of.
Within the time you and others have spent in this thread trying to argue against putting the edit count on Special:Contributions, it could have been put there a million times over.
If nobody wants to do this incredibly trivial thing, do you want me to do it? (I'm asking because I don't want to spend time on installing MediaWiki specially for this just to have someone reject it for some ridiculous reason.)
Timwi
Well, that was the point of the email that didn't get through. In my experience, 80% of users who request their edit counts via scrapers want their edit counts, and nothing more. Those who want really detailed analysis of edit statistics will keep using scrapers, but adding user_editcount to Special:Contributions will cause a significant drop in scraper requests.
Titoxd.
Timwi wrote:
Amazing: You were very tempted to block them because they were causing too much traffic, but it still didn't occur to you to simply add the count to the Special:Contributions page, thereby peacefully rendering them obsolete?
You're confusing development with system administration. Development takes time. Sometimes a situation develops, and a solution is required at the system administration level, while development is in progress. At the time, the user_editcount field had been recently introduced, but it clearly wasn't sufficient to provide the information users were looking for, as Simetrical notes. Other options for development were apparent, but the expected development time was too long.
It's always going to be a tough decision, when it comes to limiting, suspending or denying services. But it would be irresponsible to just let site-wide performance descend to glacial speeds while development is in progress.
-- Tim Starling
Tim Starling wrote:
At the time, the user_editcount field had been recently introduced, but it clearly wasn't sufficient to provide the information users were looking for, as Simetrical notes.
Hm, you are being a bit ambiguous -- your message /could/ be taken to mean that the edit count is already displayed on Special:Contributions. However, I can't see it. Am I not looking in the right place, or did I misunderstand you?