The English article on PHP has been subject to some strange spam over the past week. We're reverting the article quickly and blocking spammers as we see them, but it's coming from many different IP addresses, probably trojaned hosts from all over the world. It seems to be an automated process, and unfortunately the spambot makes about 10-20 edits over the course of a few minutes. It doesn't even link to working sites; most of the advertised sites have already been taken down.
But since we're still getting hit, could we add some domains to the spam filter please? Here are the commonly spammed domains:
6x.to uni.cc grozny.su tuva.su
See the history at http://en.wikipedia.org/w/index.php?title=PHP&action=history&limit=5...
Other technical solutions would also be appreciated. Thanks,
Rhobite
On Tuesday 08 February 2005 15:32, Rhobite wrote:
Hi,
The English article on PHP has been subject to some strange spam over the past week.
This looks as if they are testing how their script works before they really start using it on a large scale.
but it's coming from many different IP addresses, probably trojaned hosts from all over the world.
If they start using [[botnet]]s to spam Wikipedia, then this is worrying.
best regards, Marco
Marco Krohn wrote:
The English article on PHP has been subject to some strange spam over the past week.
This looks as if they are testing how their script works before they really start using it on a large scale.
That's a scary thought.
but it's coming from many different IP addresses, probably trojaned hosts from all over the world.
If they start using [[botnet]]s to spam Wikipedia, then this is worrying.
I think it's probably a good idea to assume that someone, at some point, will -- even if the culprits here do not. It really seems to me that it's only a matter of time.
I think it's probably a good idea to assume that someone, at some point, will -- even if the culprits here do not. It really seems to me that it's only a matter of time.
I agree.
I believe more focus should be put on the security side of the software; there should be ways of preventing bot attacks (such as per-edit human recognition, a la Hotmail or Yahoo user registration), even if this isn't needed right now. Wikipedia is becoming too noticeable not to be a victim of some sort of large-scale attack at some point, and we should all know by now that not every act of vandalism is easily reversible, so we should at least try to prevent them.
-Pedro
Pedro Fayolle wrote:
I think it's probably a good idea to assume that someone, at some point, will -- even if the culprits here do not. It really seems to me that it's only a matter of time.
I agree.
I believe more focus should be put on the security side of the software; there should be ways of preventing bot attacks (such as per-edit human recognition, a la Hotmail or Yahoo user registration), even if this isn't needed right now. Wikipedia is becoming too noticeable not to be a victim of some sort of large-scale attack at some point, and we should all know by now that not every act of vandalism is easily reversible, so we should at least try to prevent them.
-Pedro
By adding per-edit human recognition stuff in, you make bots impossible. This would have a dramatic impact on the projects. Both Wiktionary and Wikipedia have many people operating bots that benefit the quality and quantity of the projects. With the latest security fix, it is not possible to do anonymous edits using bots anymore. That is a big improvement from a security point of view; however, we are still smarting from the resulting downtime for the bots.
Thanks, GerardM
Gerard Meijssen wrote:
By adding per-edit human recognition stuff in, you make bots impossible.
That's only if it's strictly required on *every* edit *ever* by *everyone*, which is not likely to happen as it would be a pain in the butt.
What we probably will be doing at some point is engaging CAPTCHA-like recognition for account creation, and probably for at least some portion of anonymous edits, as a situational response (eg to halt an attack in progress), or more generally or intermittently.
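Very roughly, that decision could be a small policy check along these lines (a sketch only: the flag names and the sampling rate below are invented for illustration, not anything that exists in MediaWiki):

import random

# Sketch only: ATTACK_MODE and ANON_SAMPLE_RATE are invented names, not
# MediaWiki configuration settings.
ATTACK_MODE = False        # flipped on while an attack is in progress
ANON_SAMPLE_RATE = 0.1     # fraction of anonymous edits challenged in quiet times

def needs_captcha(action, is_logged_in):
    """Decide whether to show a CAPTCHA before accepting the request."""
    if action == "createaccount":
        return True                      # always challenge account creation
    if action == "edit" and not is_logged_in:
        if ATTACK_MODE:
            return True                  # situational response: challenge every anonymous edit
        return random.random() < ANON_SAMPLE_RATE   # otherwise challenge intermittently
    return False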
With the latest security fix, it is not possible to do anonymous edits using bots anymore.
This is not actually true; there's a minor bug with a particular interaction of the pywikipediabot, which I've mentioned to Andre. However anonymous bot edits are really not recommended, and will likely get much harder to do as we look more at closing out spambots.
-- brion vibber (brion @ pobox.com)
If you implement some sort of bot detection, you could issue keys to approved bots which would allow them to edit. But I don't think CAPTCHA tests are the right approach, due to accessibility issues.
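As a sketch of how such a key scheme might work (everything here, including the approved-bot list and the token format, is hypothetical, not an existing feature):

import hashlib
import hmac

# Hypothetical sketch: approved bot operators are handed a token derived from
# a server-side secret, and requests carrying a valid token skip the human check.
SERVER_SECRET = b"change-me"
APPROVED_BOTS = {"ExampleBot", "AnotherBot"}   # placeholder names

def issue_bot_key(bot_name):
    """Give an approved bot operator a token derived from the server secret."""
    if bot_name not in APPROVED_BOTS:
        raise ValueError("bot is not on the approved list")
    return hmac.new(SERVER_SECRET, bot_name.encode(), hashlib.sha1).hexdigest()

def may_skip_human_check(bot_name, presented_key):
    """Let a request bypass CAPTCHA-style checks if its key is valid."""
    if bot_name not in APPROVED_BOTS:
        return False
    return hmac.compare_digest(issue_bot_key(bot_name), presented_key)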
Rhobite
Rhobite wrote:
If you implement some sort of bot detection, you could issue keys to approved bots which would allow them to edit.
Right.
But I don't think CAPTCHA tests are the right approach, due to accessibility issues.
What would you suggest?
In response to your earlier note: if you haven't found it yet, the blacklist is editable by sysops on Meta: http://meta.wikimedia.org/wiki/Spam_blacklist
Leave a comment on the talk page to request additions.
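For reference, the check behind that list is essentially a regular-expression match over the URLs in submitted text. A minimal illustration (not the actual SpamBlacklist extension code; the domains are simply the ones reported at the top of this thread):

import re

# Minimal illustration of a regex-based URL blacklist, not the real extension.
BLACKLISTED_DOMAINS = ["6x.to", "uni.cc", "grozny.su", "tuva.su"]

BLACKLIST_RE = re.compile(
    r"https?://([\w.-]+\.)?("
    + "|".join(re.escape(d) for d in BLACKLISTED_DOMAINS)
    + r")\b",
    re.IGNORECASE,
)

def edit_is_spam(new_wikitext):
    """Reject the edit if it links to any blacklisted domain."""
    return BLACKLIST_RE.search(new_wikitext) is not None

# e.g. edit_is_spam("see http://cheap-pills.6x.to/ for details") -> True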
-- brion vibber (brion @ pobox.com)
On Tue, 08 Feb 2005 13:34:48 -0800, Brion Vibber brion@pobox.com wrote:
But I don't think CAPTCHA tests are the right approach, due to accessibility issues.
What would you suggest?
Unfortunately I don't have any great suggestions. I've dealt with bot spam on a much smaller scale on my weblog, and it's not a simple problem.
In response to your earlier note: if you haven't found it yet, the blacklist is editable by sysops on Meta: http://meta.wikimedia.org/wiki/Spam_blacklist
Leave a comment on the talk page to request additions.
Thanks, I'll do that.
Rhobite
Hi,
On Tue, 8 Feb 2005, Rhobite wrote:
On Tue, 08 Feb 2005 13:34:48 -0800, Brion Vibber brion@pobox.com wrote:
But I don't think CAPTCHA tests are the right approach, due to accessibility issues.
What would you suggest?
Unfortunately I don't have any great suggestions. I've dealt with bot spam on a much smaller scale on my weblog, and it's not a simple problem.
How about a lazy Bayesian similarity checker: spam bots tend to write the same blabla into several articles. So, after each edit, with a certain (low) probability, check for identical (or similar) words in the last 100 articles, and flag those articles with matches (or matched words) as potential spam, which can then be blocked more efficiently. Of course, words like "is", "are", "the", etc. will probably be there, but there are relatively few words that are common (I think something like 1,500 words in English). The list of common words could be built on the fly.
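A rough sketch of what such a check might look like (plain word overlap rather than a real Bayesian classifier; the probability, window size and thresholds are placeholders):

import random
from collections import Counter

CHECK_PROBABILITY = 0.05      # "lazy": only inspect a small fraction of edits
RECENT_WINDOW = 100           # compare against the last 100 edits
COMMON_WORDS_LIMIT = 1500     # treat the most frequent words as uninteresting
OVERLAP_THRESHOLD = 3         # arbitrary: how many shared rare words look suspicious

recent_edits = []             # list of (page_title, set of uncommon words)
word_counts = Counter()       # running counts used to build the common-word list on the fly

def words_of(text):
    return {w.lower() for w in text.split() if w.isalpha()}

def check_edit(page_title, added_text):
    """With low probability, flag recent edits whose uncommon words overlap
    suspiciously with this one."""
    words = words_of(added_text)
    word_counts.update(words)
    common = {w for w, _ in word_counts.most_common(COMMON_WORDS_LIMIT)}
    rare = words - common
    flagged = []
    if random.random() < CHECK_PROBABILITY:
        for title, other_words in recent_edits[-RECENT_WINDOW:]:
            overlap = rare & other_words
            if title != page_title and len(overlap) >= OVERLAP_THRESHOLD:
                flagged.append((title, overlap))
    recent_edits.append((page_title, rare))
    return flagged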
Just my 1.5 cents, Dscho
I don't know much about bots - but they must be much faster than humans at making changes? Would it be possible to reject edits that happen less than 5 seconds after the page is served, and/or ask users to take a CAPTCHA test if they make an edit to a page less than 15 seconds after it was served?
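As a sketch of the timing check (thresholds taken from the suggestion above; where the 'served' timestamp comes from, and how it is protected from forgery, is left open here):

import time

REJECT_UNDER = 5       # seconds: refuse the edit outright
CAPTCHA_UNDER = 15     # seconds: ask for a CAPTCHA instead

def classify_edit(served_at, submitted_at=None):
    """Return 'reject', 'captcha' or 'accept' depending on how quickly the
    edit came back after the page was served."""
    if submitted_at is None:
        submitted_at = time.time()
    elapsed = submitted_at - served_at
    if elapsed < REJECT_UNDER:
        return "reject"
    if elapsed < CAPTCHA_UNDER:
        return "captcha"
    return "accept"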
Just a thought.
Paul Youlten
----- Original Message -----
From: "Johannes Schindelin" Johannes.Schindelin@gmx.de
To: "Wikimedia developers" wikitech-l@wikimedia.org
Sent: Tuesday, February 08, 2005 10:54 PM
Subject: Re: [Wikitech-l] Spam on en:PHP
Hi,
On Tue, 8 Feb 2005, Rhobite wrote:
On Tue, 08 Feb 2005 13:34:48 -0800, Brion Vibber brion@pobox.com wrote:
But I don't think CAPTCHA tests are the right approach, due to accessibility issues.
What would you suggest?
Unfortunately I don't have any great suggestions. I've dealt with bot spam on a much smaller scale on my weblog, and it's not a simple problem.
How about a lazy Bayesian similarity checker: spam bots tend to write the same blabla into several articles. So, after each edit, with a certain (low) probability, check for identical (or similar) words in the last 100 articles, and flag those articles with matches (or matched words) as potential spam, which can then be blocked more efficiently. Of course, words like "is", "are", "the", etc. will probably be there, but there are relatively few words that are common (I think something like 1,500 words in English). The list of common words could be built on the fly.
Just my 1.5 cents, Dscho
On Wednesday 09 February 2005 00:11, Paul Youlten wrote:
I don't know much about bots - but they must be much faster than humans at making changes? Would it be possible to reject edits that happen less than 5 seconds after the page is served, and/or ask users to take a CAPTCHA test if they make an edit to a page less than 15 seconds after it was served?
I sure can fix a typo in less than 5 seconds using the jEdit plugin, provided that the server is fast enough. But I cannot fix many typos one after another in a short time.
daniel
Then you would have to answer a CAPTCHA test question - or (better) be given permission to make fast edits when logged in.
Paul Youlten
----- Original Message -----
From: "Daniel Wunsch" the.gray@gmx.net
To: "Wikimedia developers" wikitech-l@wikimedia.org
Sent: Wednesday, February 09, 2005 12:20 AM
Subject: Re: [Wikitech-l] Spam on en:PHP
On Wednesday 09 February 2005 00:11, Paul Youlten wrote:
I don't know much about bots - but they must be much faster than humans at making changes? Would it be possible to reject edits that happen less than 5 seconds after the page is served, and/or ask users to take a CAPTCHA test if they make an edit to a page less than 15 seconds after it was served?
I sure can fix a typo in less than 5 seconds using the jEdit plugin, provided that the server is fast enough. But I cannot fix many typos one after another in a short time.
daniel
As a bot programmer, I would much regret this possibility. Not because of the waiting time itself (the bots already slow down their edits so as not to clutter recentchanges, as well as their other calls to decrease their use of server time), but because the bots currently often do not have the page 'served' in the way normal users do - in many cases the bots get their pages through Special:Export, which enables getting a number of pages with only one HTTP request.
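For those unfamiliar with it, this is roughly what such a Special:Export call looks like (written from memory, so treat the parameter names as a sketch rather than a reference):

import urllib.parse
import urllib.request

def export_pages(titles, site="http://en.wikipedia.org"):
    """Fetch the current text of several pages with a single HTTP request."""
    data = urllib.parse.urlencode({
        "pages": "\n".join(titles),   # one title per line
        "curonly": "1",               # current revisions only
    }).encode("utf-8")
    request = urllib.request.Request(site + "/wiki/Special:Export", data=data)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")   # one XML document covering all requested pages

# e.g. export_pages(["PHP", "Perl"])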
Another issue is that it might seriously impact the response time. The date of viewing must either be received from the submitter him/herself, or stored on the Wikimedia server. In the first case it offers very little security, because a bot programmer can easily adapt their bot to give whatever date suits them best as the viewing date. In the second case, some kind of list or database field needs to be updated with each view as well as checked with each edit.
Andre Engels
On Wed, 9 Feb 2005 00:11:12 +0100, Paul Youlten paul.youlten@gmail.com wrote:
I don't know much about bots - but they must be much faster than humans at making changes? Would it be possible to reject edits that happen less than 5 seconds after the page is served, and/or ask users to take a CAPTCHA test if they make an edit to a page less than 15 seconds after it was served?
Rhobite wrote:
If you implement some sort of bot detection, you could issue keys to approved bots which would allow them to edit. But I don't think CAPTCHA tests are the right approach, due to accessibility issues.
Although I have been a supporter of limited CAPTCHA use (following the usual Wikimedia way of looking for the softest possible security to deal with an actual rather than an imagined problem), I am very sympathetic to this point.
What are some possible alternatives, or ways of minimizing this problem?
--Jimbo
LiveJournal offers an audio alternative to fuzzy characters -- https://www.livejournal.com/create.bml (& https for privacy). Although I can't try it from here, I guess it works.
-- Zigger
On Wed, 9 Feb 2005 07:23:10 -0800, Jimmy (Jimbo) Wales wrote:
Rhobite wrote:
If you implement some sort of bot detection, you could issue keys to approved bots which would allow them to edit. But I don't think CAPTCHA tests are the right approach, due to accessibility issues.
... What are some possible alternatives, or ways of minimizing this problem? ...
On Thu, 10 Feb 2005 05:51:52 +1100, Zigger zigger@gmail.com wrote:
LiveJournal offers an audio alternative to fuzzy characters -- https://www.livejournal.com/create.bml (& https for privacy). Although I can't try it from here, I guess it works.
As does Hotmail.
Brion:
This is not actually true; there's a minor bug with a particular interaction of the pywikipediabot, which I've mentioned to Andre. However anonymous bot edits are really not recommended, and will likely get much harder to do as we look more at closing out spambots.
My attempts to fix this bug failed, and reactions to restricting the bot to only logged-in edits have been positive, so for the time being I will consider it a feature.
Andre Engels