There's been a spambot active for the last few days. It uses a number (2-15) of open proxies simultaneously to edit at a high rate, often on small, unattended wikis. It finds pages by spidering. Each anonymous proxy seems to represent an autonomously spidering bot -- the bots often repeat each other's work, adding the offending external link multiple times. Over the last few days it has probably made over 1000 edits. The site in question belongs to a Chinese Internet marketing company called EMMSS.
The open proxy blocking code I've been developing since the discussion on open proxies at wikien-l still has some teething problems. It should help once it's tweaked somewhat for this particular situation (I originally intended it for human vandals). However, there are a few other features on my wishlist which I think would help to deal with this sort of problem.
One is a combined recent changes page, showing all Wikimedia wikis. This would allow people to watch for anomalous activity on usually quiet wikis. I imagine it would share code with the socket-based IRC bot which is in development.
Another much-requested feature is enhancement of the rollback function, especially to allow for page deletion. The bot created many pages, apparently by following red links. In my opinion, the ideal feature would be to allow the user to supply a list of IP addresses and usernames, and then to revert every edit from those users with a single click. People cleaning up after this spambot often had to revert manually because the bot had edited the same page with multiple IP addresses. For completeness it would be nice to include page-move reversion.
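To make the intended behaviour concrete, here's a rough Python sketch of the revert logic (the Revision data model and the function name are invented for illustration -- MediaWiki's real schema and rollback code differ):

    from dataclasses import dataclass
    from typing import Dict, List, Optional, Set

    @dataclass
    class Revision:
        page: str
        user: str        # username or IP address
        text: str
        timestamp: int

    def mass_revert(history: List[Revision],
                    bad_users: Set[str]) -> Dict[str, Optional[str]]:
        """For each page touched by a listed user, return the text of the
        latest revision NOT made by any of them. None means every revision
        came from the attacker, so the page should be deleted."""
        reverted: Dict[str, Optional[str]] = {}
        for page in {r.page for r in history if r.user in bad_users}:
            clean = sorted(
                (r for r in history
                 if r.page == page and r.user not in bad_users),
                key=lambda r: r.timestamp)
            reverted[page] = clean[-1].text if clean else None
        return reverted

Pages the bot created by following red links have no clean revision at all, which is why the page-deletion case falls out of the same logic.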
For now we've been dealing with this using filters and editor manpower. Filters are bad, they're against the wiki security model. I'd much rather put more power into the hands of users to deal with these situations, than to continue a filter-based arms race.
If anyone wants to help out with features such as these, please speak up.
In case anyone's wondering, the main response to this problem so far has been to contact a developer, who then makes any willing editors temporary sysops on the wikis under attack.
-- Tim Starling
Tim Starling wrote:
Another much-requested feature is enhancement of the rollback function, especially to allow for page deletion. The bot created many pages, apparently by following red links. In my opinion, the ideal feature would be to allow the user to supply a list of IP addresses and usernames, and then to revert every edit from those users with a single click.
This sounds like a wonderful idea, but I see a big problem with it. If we had such a feature some sysops would be tempted to use it as punishment against some bad users. There's a certain amount of 'vigilantism' that I've grumpily tolerated because it's very minor and the people usually "deserve it" in some sense. But this would open a huge new can of worms.
For now we've been dealing with this using filters and editor manpower. Filters are bad, they're against the wiki security model. I'd much rather put more power into the hands of users to deal with these situations, than to continue a filter-based arms race.
Yes, I understand what you're saying.
This is a new development, one that has been long anticipated, of course.
Here's one idea -- a switch to toggle a captcha. When a wiki is under a spambot attack, the captcha would be turned on. Some captchas aren't hard to defeat, but I doubt if the spammer is going to bother.
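As a rough illustration of the switch (everything here is invented for the sketch -- a trivial arithmetic question stands in for a real image captcha, and the flag is not an actual MediaWiki setting):

    import random

    captcha_enabled = False   # flipped on by a developer during an attack

    def make_captcha():
        """Return a (question, answer) pair."""
        a, b = random.randint(1, 9), random.randint(1, 9)
        return "What is %d + %d?" % (a, b), str(a + b)

    def save_allowed(answer_given, answer_expected):
        if not captcha_enabled:
            return True       # normal operation: no captcha at all
        return answer_given.strip() == answer_expected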
Unfortunately, I fear this will lead us down the path towards captchas on all wikis on all edits, which would be unpleasant.
--Jimbo
This sounds like a wonderful idea, but I see a big problem with it. If we had such a feature some sysops would be tempted to use it as punishment against some bad users. There's a certain amount of 'vigilantism' that I've grumpily tolerated because it's very minor and the people usually "deserve it" in some sense. But this would open a huge new can of worms.
What about something that'd require multiple sysop approval? Like, 5 sysops to check something & click to confirm? Even if it's harder to do, it sounds much 'safer', as finding 5 sysops willing to 'punish' a user with this feature seems hard. Also, there could be a strict policy on that: any sysop using this against anything but a spambot gets desysopped right away (of course, I assume a sysop doing this would in any case lose some respect from his community).
Nicolas 'Ryo'
Well, I think 5 is an unnecessarily high threshold, but yes I absolutely can see the merit in having 2 sysops approve something like this before it takes effect. I'll have to think about that.
I'm just generally in favor of being cautious about hierarchy and power, and doing just whatever is minimally necessary to accomplish our goals.
Nicolas Weeger wrote:
This sounds like a wonderful idea, but I see a big problem with it. If we had such a feature some sysops would be tempted to use it as punishment against some bad users. There's a certain amount of 'vigilantism' that I've grumpily tolerated because it's very minor and the people usually "deserve it" in some sense. But this would open a huge new can of worms.
What about something that'd require multiple sysop approval? Like, 5 sysops to check something & click to confirm? Even if it's harder to do, it sounds much 'safer', as finding 5 sysops willing to 'punish' a user with this feature seems hard. Also, there could be a strict policy on that: any sysop using this against anything but a spambot gets desysopped right away (of course, I assume a sysop doing this would in any case lose some respect from his community).
Nicolas 'Ryo'
Jimmy Wales wrote:
Well, I think 5 is an unnecessarily high threshold, but yes I absolutely can see the merit in having 2 sysops approve something like this before it takes effect. I'll have to think about that.
I'm just generally in favor of being cautious about hierarchy and power, and doing just whatever is minimally necessary to accomplish our goals.
I was saying 5 as an example. Maybe 3 or 4, depending on the number of sysops on a particular wiki. Or some hybrid system, like 2 sysops and 2 non-sysop users -- this would prevent sysop abuse (some people on fr have complained of sysops being an oligarchy :))
I don't really see this as a hierarchy, actually. It's more a safeguard against sysop abuse. And as I said in some discussion, sysops aren't 'higher', they're just caretakers, sweepers :)
Nicolas
Jimmy Wales wrote:
Tim Starling wrote:
Another much-requested feature is enhancement of the rollback function, especially to allow for page deletion. The bot created many pages, apparently by following red links. In my opinion, the ideal feature would be to allow the user to supply a list of IP addresses and usernames, and then to revert every edit from those users with a single click.
This sounds like a wonderful idea, but I see a big problem with it. If we had such a feature some sysops would be tempted to use it as punishment against some bad users. There's a certain amount of 'vigilantism' that I've grumpily tolerated because it's very minor and the people usually "deserve it" in some sense. But this would open a huge new can of worms.
You can't expect people to follow the rules when every violation is tolerated on the basis that it is in some sense justified. Sysops have invested a great deal of time and emotional capital in Wikipedia. Banning is not an effective punishment for trolls or vandals, who will switch identities and start again without a second thought. Banning or demoting a sysop, on the other hand, is an effective punishment, both emotionally and in terms of the extreme difficulty of resuming the original behaviour.
What I'm saying is that rules governing sysop behaviour can, in principle, be enforced. However, I'll admit that it's important that sysop actions be reversible.
Nicolas Weeger wrote:
What about something that'd require multiple sysop approval? Like, 5 sysops to check something & click to confirm?
Only one of the seven wikis attacked had more than 5 sysops before the attack. There were generally 1-3 foreign sysops working on cleaning up the mess.
Jimmy Wales wrote:
Here's one idea -- a switch to toggle a captcha. When a wiki is under a spambot attack, the captcha would be turned on. Some captchas aren't hard to defeat, but I doubt if the spammer is going to bother.
Unfortunately, I fear this will lead us down the path towards captchas on all wikis on all edits, which would be unpleasant.
Captchas could slow down the attack somewhat. Let's say he uses 10 IP addresses on 7 wikis. That means he would have to perform about 70 captchas. If each captcha takes 10 seconds, that's roughly 12 minutes of human time to perform the attack. Of course, he spent far longer than that developing the bot, but it does put a practical limit on the scale and speed of the attack.
-- Tim Starling
"TS" == Tim Starling ts4294967296@hotmail.com writes:
>> Here's one idea -- a switch to toggle a captcha.
That's an extremely good idea. Only one problem: I believe that there are patent issues around captchas. Not saying that Wikipedia couldn't get a patent license, but it'd have to stay out of the distributed MediaWiki code.
I'm not sure about this and I'm not turning up any terribly informative Google hits except this one:
http://www2.parc.com/istl/projects/captcha/history.htm
~ESP
Evan Prodromou wrote:
"TS" == Tim Starling ts4294967296@hotmail.com writes:
>> Here's one idea -- a switch to toggle a captcha.
That's an extremely good idea. Only one problem: I believe that there are patent issues around captchas. Not saying that Wikipedia couldn't get a patent license, but it'd have to stay out of the distributed MediaWiki code.
LiveJournal uses captchas, and I'm not aware that they pay any license fees for that. Knowing Brad, I'm pretty sure he wouldn't have done it if that had been necessary.
On Thu, 11 Mar 2004 22:32:45 -0500, Evan Prodromou wrote:
"TS" == Tim Starling ts4294967296@hotmail.com writes:
>> Here's one idea -- a switch to toggle a captcha.
That's an extremely good idea. Only one problem: I believe that there are patent issues around captchas. Not saying that Wikipedia couldn't get a patent license, but it'd have to stay out of the distributed MediaWiki code.
I'm not sure about this and I'm not turning up any terribly informative Google hits except this one:
http://www2.parc.com/istl/projects/captcha/history.htm
~ESP
Hello,
Although it's a good idea for preventing vandalism, it will also block harmless bots that solve disambiguations, correct typos, convert HTML tables to wiki tables, update interwiki links ...
Since the migration at the end of 2002, our bot has made more than 30,000 of the 300,000 edits (judging by the oldid values in #frrc).
cheers,
"AV" == Ashar Voultoiz thoane@altern.org writes:
AV> Although it's a good idea for preventing vandalism, it will also
AV> block harmless bots that solve disambiguations, correct typos,
AV> convert HTML tables to wiki tables, update interwiki links ...
It'd make sense to allow logged-in approved bots then.
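A minimal sketch of that exemption (the bot names and the flag lookup are hypothetical):

    approved_bots = {"FrBot", "InterwikiBot"}   # hypothetical flagged bot accounts

    def needs_captcha(username, captcha_enabled):
        """Anonymous users (username is None) still get the captcha while
        it is switched on; approved bot accounts are exempt."""
        if not captcha_enabled:
            return False
        return username not in approved_bots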
~ESP
Good grief. The patent system is out of control.
Actually, since the original captcha idea is in fact clever, and since I'm not a totally anti-patent person, I could see it being reasonable to reward the inventors of the idea with a 2-year patent. 17 years is an infinity.
I wonder if there are things that we've developed in house here at Wikimedia that we should patent, just to license our patents freely of course, and also to make fun of the patent system.
--Jimbo
Jimmy Wales wrote:
Good grief. The patent system is out of control.
I get the impression that the pharmaceutical companies are the worst offenders.
Actually, since the original captcha idea is in fact clever, and since I'm not a totally anti-patent person, I could see it being reasonable to reward the inventors of the idea with a 2-year patent. 17 years is an infinity.
I believe that it was changed from 17 years from the date of granting to 20 years from the application date. That prevents extensions from arising through playing games with the application process. But 20 years is a lot more reasonable than the 95 years for copyright. :-)
I wonder if there are things that we've developed in house here at Wikimedia that we should patent, just to license our patents freely of course, and also to make fun of the patent system.
Good idea! Especially now that such things as business processes have become patentable.
Ec
On Fri, 12 Mar 2004, Jimmy Wales wrote:
I wonder if there are things that we've developed in house here at Wikimedia that we should patent, just to license our patents freely of course, and also to make fun of the patent system.
This is not necessary, since we publish our source code, making our methods "prior art" and (in theory) impossible for other people to claim patents on.
The only reason I could see to get patents would be to use them as muscle for cross-licensing if we get threatened by some big bad company.
I don't think it's Wikipedia's role to start doing this. I'm sure the FSF and other organisations have considered this - and if or when they decide to go ahead with it, we can join in as part of a concerted effort.
But just going ahead and registering patents would IMO be going against the Wikipedia tradition of not focusing on problems that aren't real yet.
-- Daniel
I'm all for making fun of the patent system. Especially the software patent system. Having written a patent or two myself in the past, I can tell you that it would not be horrendously difficult, but it would likely involve some expense due to the filing fees and so forth. It would be fun to write a patent as a wiki page... :-)
One problem with patents in the open source community is that application for the patent generally should precede publication of the idea, and filing of the patent must occur no more than one year after the idea has been made public.
-Kelly
Tim Starling wrote:
Captchas could slow down the attack somewhat. Let's say he uses 10 IP addresses on 7 wikis. That means he would have to perform about 70 captchas. If each captcha takes 10 seconds, that's roughly 12 minutes of human time to perform the attack. Of course, he spent far longer than that developing the bot, but it does put a practical limit on the scale and speed of the attack.
It also means that he has to display what's going on in a browser, rather than doing the whole thing from a Perl script, for example. To me, anyhow, that would be a daunting task, but then again, automating a browser isn't something I know much about.
My thinking is just that when he sees the captcha and realizes that his entire script has to be rewritten from scratch, he might just realize that it's totally not worth it.
One good thing about spammers is that they are lazy and only in it for the money, as opposed to an ideological opponent or random lunatic who might spend a completely absurd amount of time trying to get around something.
--Jimbo
Jimmy Wales wrote:
One good thing about spammers is that they are lazy and only in it for the money, as opposed to an ideological opponent or random lunatic who might spend a completely absurd amount of time trying to get around something.
Yes, it's important to take human nature into account. The Canadian who was just charged under the US anti-spam law (US jurisdiction may still be a separate issue) managed to send out 94,000,000 spams in January alone. He hasn't got time for ideological debates where winning might only give him a few hundred more potential victims.
Ec
"TS" == Tim Starling ts4294967296@hotmail.com writes:
TS> For now we've been dealing with this using filters and editor
TS> manpower. Filters are bad, they're against the wiki security
TS> model. I'd much rather put more power into the hands of users
TS> to deal with these situations, than to continue a filter-based
TS> arms race.
+1
TS> If anyone wants to help out with features such as these,
TS> please speak up.
I've started work on doing edit throttling:
http://meta.wikipedia.org/wiki/Edit_throttling
It's not too effective against distributed attacks, though, since it only throttles a given user or IP address.
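For reference, a minimal sketch of per-identity throttling as a sliding window (the window and limit here are illustrative, not what the meta proposal specifies):

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60.0
    MAX_EDITS = 5                   # illustrative limit per identity

    _recent = defaultdict(deque)    # identity -> timestamps of recent edits

    def edit_allowed(identity, now=None):
        """identity is a username or an IP address. Returns False once it
        has made MAX_EDITS within the last WINDOW_SECONDS."""
        now = time.time() if now is None else now
        q = _recent[identity]
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()             # drop edits outside the window
        if len(q) >= MAX_EDITS:
            return False
        q.append(now)
        return True

Because the state is keyed on a single identity, an attacker with many proxies can stay under the limit on each one, which is the weakness noted above.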
~ESP
Evan Prodromou wrote:
http://meta.wikipedia.org/wiki/Edit_throttling
It's not too effective against distributed attacks, though, since it only throttles a given user or IP address.
*nod*
Did you see my idea of a table like this?
    user     page    throttle  expiration
    jwales   DNA     2         (timestamps go here)
    *        Israel  3
    jwales   Turkey  0
    plautus  *       0
    wik      *       3
The idea is that for user/page combos, which could possibly include wildcards (or even regexp's, although simplicity is a virtue and the power of regexps might be overkill), there could be a throttle which would be the number of edits per whatever unit of time or similar.
What interests me about this way of looking at it is that it is a generalization of what we already do, i.e. site bans and page protection are both just special cases.
    plautus  *    0   says he can't edit anything.
    *        Ham  0   says that [[Ham]] is protected.
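A small sketch of how rule matching could work, with bans and protection falling out as special cases (the rule format and the tie-breaking are invented for illustration):

    # The table above, as (user, page, throttle, expiration) rows.
    rules = [
        ("jwales",  "DNA",    2, None),
        ("*",       "Israel", 3, None),
        ("jwales",  "Turkey", 0, None),
        ("plautus", "*",      0, None),   # site ban as a special case
        ("wik",     "*",      3, None),
        ("*",       "Ham",    0, None),   # page protection as a special case
    ]

    def throttle_for(user, page, now):
        """Most specific unexpired rule wins; no match means unlimited."""
        best, best_score = float("inf"), -1
        for u, p, limit, expires in rules:
            if expires is not None and now > expires:
                continue
            if u in ("*", user) and p in ("*", page):
                score = (u != "*") + (p != "*")   # prefer exact matches
                if score > best_score:
                    best_score, best = score, limit
        return best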
---
For a distributed spambot attack, I don't know if this is helpful. I just wanted lots of people to see it and tell me what's wrong with it.
Perhaps a spambot attack could be taken care of with a "special case" ad hoc technique. For example: if the body of the article contains a given string (the spammer's URL) on a save of an edit, then we enter the editing IP address into the throttle table like this:
    129.79.1.1  *  1
with an expiration of, say, 24 hours.
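A sketch of that trigger, building on the rule table above (the spam string is a placeholder, not the actual URL):

    import time

    SPAM_STRING = "http://spammers-url.example/"   # placeholder
    rules = []                                     # the throttle table above

    def on_save(ip, new_text):
        """Returns True if the save goes through. An edit containing the
        spam string throttles the editing IP to 1 edit, expiring in 24 hours."""
        if SPAM_STRING in new_text.lower():
            rules.append((ip, "*", 1, time.time() + 24 * 3600))
            return False
        return True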
--Jimbo
I think this is a good approach; to defeat a severe distributed attack, it should also be possible to prevent, for a limited time, both editing by *all* IPs and the creation of new user accounts (to stop the bot from creating a sock puppet and then spamming under that user name).
Additionally, during such an emergency, all prevented edits could be cached and then manually approved. That would save us from having to clean up a real flood, should the need ever arise.
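A minimal sketch of that emergency mode (the queue and approval flow are invented for illustration):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class PendingEdit:
        ip: str
        page: str
        text: str

    @dataclass
    class EmergencyMode:
        active: bool = False
        queue: List[PendingEdit] = field(default_factory=list)

        def try_edit(self, ip, page, text, logged_in):
            """While active, anonymous edits are held for review rather
            than applied; account creation would be disabled separately."""
            if self.active and not logged_in:
                self.queue.append(PendingEdit(ip, page, text))
                return False
            return True

        def approve(self, index):
            """A sysop reviews a queued edit and applies it by hand."""
            return self.queue.pop(index)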
Magnus
I have an idea: since the spambot's goal is to add links to a website, why not give admins the ability to (temporarily) filter all edits which involve a certain domain (or, more generally, a regex)? This would apply site-wide. That is, you could prevent any edit which adds "*.spammersdomain.com/*" to a page. The code would have to be slightly clever to detect URL encodings, etc., but it's surely doable.
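A sketch of the check (the domain comes from the example above; the encoding handling shown covers only percent-encoding):

    import re
    from urllib.parse import unquote

    BLOCKED = re.compile(r"spammersdomain\.com", re.IGNORECASE)

    def adds_blocked_link(old_text, new_text):
        """Reject only edits that ADD occurrences of the blocked domain.
        Percent-encoding is decoded first, so %73pammersdomain.com is caught."""
        def count(text):
            return len(BLOCKED.findall(unquote(text)))
        return count(new_text) > count(old_text)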
Arvind