Hi,
On Tue, Feb 16, 2010 at 8:31 PM, Domas Mituzas midom.lists@gmail.com wrote:
> You can sure assume that we need to come up with something to "defend a new policy".
Yeah, ban clients with missing or broken UAs for the actions that actually cause CPU load, but leave article reading untouched. Normal readers behind Privoxy or other privacy filters (you know, people DO still use them, even if their share is small!) could then at least READ.
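To illustrate what I mean, a rough Python sketch of such a rule; the action names and UA prefixes below are invented for illustration, not Wikimedia's actual config:

    # Rough sketch of the rule I'm proposing: 403 only the expensive
    # actions for UA-less/default-UA clients, never plain reads.
    # Action names and UA prefixes are made-up examples.
    DEFAULT_UAS = ("php", "curl", "python-urllib", "wget", "java")
    EXPENSIVE_ACTIONS = {"edit", "submit", "history", "raw"}

    def should_block(user_agent: str, action: str) -> bool:
        ua = (user_agent or "").strip().lower()
        no_real_ua = ua == "" or any(ua.startswith(d) for d in DEFAULT_UAS)
        # action == "view" (or no action at all) is reading; always allowed.
        return no_real_ua and action in EXPENSIVE_ACTIONS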
>> Presumably some percentage of that 20-50% will come back as the spammers realize they have to supply the string. Presumably we then start playing whack-a-mole.
> Yes, we will ban all IPs participating in this.
Good luck fighting a dynamic bot herder (though I do wonder: with the spam blacklist and the CAPTCHAs for URLs, what the hell can a botnet master achieve by hitting Wikipedia?!).
>> Presumably there's a plan for what to do when the spammers begin supplying a new, random string every time.
> Random strings are easy to identify, fixed strings are easy to verify.
The point is, what should bot writers do?

1) No UA at all: the typical newbie mistake, just sending GET /w/index.php?action=edit, which works against a localhost wiki and every other wiki.
2) The default UA of the programming language (PHP's built-in one, cURL, Python; some bots may even use wget and bash scripting, it's not THAT difficult to write a wiki bot in bash!).
3) An own UA, something like "HDBot v1.1 (http://xyz.tld)", which I couldn't use for quite some time a while ago.
4) Spoof a browser UA (bad, as the site can't tell bots and browsers apart).
To avoid the ban, only 3) and 4) remain, as the default UAs are blocked in most cases. But since 3) doesn't really work, or is at least hard to troubleshoot, that leaves only 4), which is exactly what you don't want.
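For the record, the client side of 3) is trivial; a minimal Python 3 sketch (bot name, URL and mail address are made up):

    import urllib.request

    # Option 3): a descriptive UA. Bot name/URL/mail are invented examples.
    UA = "ExampleBot/1.0 (http://example.org/bot; bot-op@example.org)"

    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json",
        headers={"User-Agent": UA},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read(200))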
Please write some doc that answers this once and for all.
Marco
PS: Oh, and please, please make the 403 message something from which people can figure out what's wrong; it takes AGES to debug if you're a newbie to scripting.
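In the meantime, newbies can at least print the body of the 403 instead of throwing it away; e.g. in Python (whether you actually get a 403 here depends on the server config):

    import urllib.error
    import urllib.request

    req = urllib.request.Request("https://en.wikipedia.org/w/api.php",
                                 headers={"User-Agent": ""})  # empty on purpose
    try:
        urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        # The response body usually says WHY you were rejected; show it.
        print(e.code, e.read().decode("utf-8", "replace")[:500])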