On mediawiki.org we quite often seem to get new pages created similar to the following:
Page name: /w/w/index.php?title=Extension:Guestbook/w/w/index.php
Page text: Dear web-master ! I looked your site and I want to say that yor very well made it .All information on this site is represented for users. A site is made professionally. So to hold !
The similarities are that the page titles always contain index.php?title=xx (with various depths of /w or /wiki) and that the message is congratulating us on our site. The user is always an anon, but without being able to search the deleted pages I don't know whether the IP address is always the same. There never seems to be any link spam or any of the other common vandalism/spamming traits in the page text.
Have any other wikis experienced this? Some kind of spambot gone wrong, or a mischievous repeat visitor? It's happened too often for me to think it's a genuine congratulatory comment posted by someone with a screwy browser...
Is there anything we can do about it?
- Mark Clements (HappyDog)
On 2/6/07, Mark Clements gmane@kennel17.co.uk wrote:
On mediawiki.org we quite often seem to get new pages created similar to the following:
Page name: /w/w/index.php?title=Extension:Guestbook/w/w/index.php
Page text: Dear web-master ! I looked your site and I want to say that yor very well made it .All information on this site is represented for users. A site is made professionally. So to hold !
I believe it's a broken spam bot. If there was a URL field on the page (as there is on a blog comment form), they would leave their spam link there. If there's no URL field, you just get the silly comment. Try searching for the text they leave and you will usually find the same comment on blogs and forums, with the spam.
Angela
Mark Clements wrote:
On mediawiki.org we quite often seem to get new pages created similar to the following:
Page name: /w/w/index.php?title=Extension:Guestbook/w/w/index.php
Page text: Dear web-master ! I looked your site and I want to say that yor very well made it .All information on this site is represented for users. A site is made professionally. So to hold !
The similarities are that the page titles always contain index.php?title=xx (with various depths of /w or /wiki) and that the message is congratulating us on our site. The user is always an anon, but without being able to search the deleted pages I don't know whether the IP address is always the same. There never seems to be any link spam or any of the other common vandalism/spamming traits in the page text.
Have any other wikis experienced this? Some kind of spambot gone wrong, or a mischievous repeat visitor? It's happened too often for me to think it's a genuine congratulatory comment posted by someone with a screwy browser...
Is there anything we can do about it?
- Mark Clements (HappyDog)
Very likely, it is a spambot, but it's not gone wrong. Having maintained my own weblog for several years, I've found that spammers often leave very innocuous-looking comments at first, to see how and how quickly the moderator/admin/owner of the site reacts. Then come the links. Unless for whatever reason a page at mediawiki.org links to [[Extension:Guestbook/w/w/index.php]], the very fact that this bot uses an extension like .php at the end of the title means that it's a spambot.
Also note that many more spambots are designed for blogging software like Movable Type and Wordpress, so they might be guessing URLs based on how those software packages work.
With spammers, the IP addresses always change (they've gotten good about that over the past few years), so I suppose the best way to control these "scout messages" is to grep [[Special:Newpages]] regularly for telltale page titles (like ".php") and delete on sight.
On Feb 6, 2007, at 11:45 PM, Minh Nguyen wrote: <snip>
Very likely, it is a spambot, but it's not gone wrong. Having maintained my own weblog for several years, I've found that spammers often leave very innocuous-looking comments at first, to see how and how quickly the moderator/admin/owner of the site reacts. Then come the links. Unless for whatever reason a page at mediawiki.org links to [[Extension:Guestbook/w/w/index.php]], the very fact that this bot uses an extension like .php at the end of the title means that it's a spambot.
Wouldn't the parser strip the links they tried to put in? On the blogs and BBSes that I administer, the bots try and don't bother checking back to see if they got through.
Also note that many more spambots are designed for blogging software like Movable Type and Wordpress, so they might be guessing URLs based on how those software packages work.
With spammers, the IP addresses always change (they've gotten good about that over the past few years), so I suppose the best way to control these "scout messages" is to grep [[Special:Newpages]] regularly for telltale page titles (like ".php") and delete on sight.
The sophisticated spammers use botnets, but others don't bother to shift IPs. I think there's been a population explosion in naive spammers in the past month or so. I've blocked >50K trackback spam attempts at my blog just in Feb 2007. 13,236 are from one IP, which tried 29,332 times in January. And these are obscure sites! Maybe a lot of people got "make money at home by spamming" kits for Xmas.
-- Minh Nguyen mxn@zoomtown.com [[en:User:Mxn]] [[vi:User:Mxn]] [[m:User:Mxn]] AIM: trycom2000; Jabber: mxn@myjabber.net; Blog: http://mxn.f2o.org/
Minh Nguyen wrote:
With spammers, the IP addresses always change (they've gotten good about that over the past few years), so I suppose the best way to control these "scout messages" is to grep [[Special:Newpages]] regularly for telltale page titles (like ".php") and delete on sight.
An admin bot should be able to delete those pages automatically, if someone cares to write it.
Matthew Flaschen
"Matthew Flaschen" matthew.flaschen@gatech.edu wrote in message news:45C963F9.1070803@gatech.edu...
Minh Nguyen wrote:
With spammers, the IP addresses always change (they've gotten good about that over the past few years), so I suppose the best way to control these "scout messages" is to grep [[Special:Newpages]] regularly for telltale page titles (like ".php") and delete on sight.
An admin bot should be able to delete those pages automatically, if someone cares to write it.
Is there a way of telling MW to block certain page names, e.g. to disallow new pages that match the regex "/index\.php\?title=/i"? If not, then this seems like a worthwhile config setting to add.
- Mark Clements (HappyDog)
I suppose it goes without saying that this could be achieved with an Extension. Perhaps some kind of regex blacklist or whitelisting extension?
So for example you could have a page called [[MediaWiki:TitlesBlacklist]] which could be a newline-separated list of regular expressions to block for title creation. (With of course an associated [[MediaWiki:TitlesWhitelist]]). Then have an extension which applies the rules prior to page submission, and also on retrieval (in case some malicious user finds a way through).
Or even better, a page called [[MediaWiki:TitlesFilter]] with each line defining an "allow" or "deny" rule. Something like this:
------------------------- SNIP ----------------
# Comments and blank lines are ignored

# Disallow any title like this
- /.*index\.php\?title=.*/i

# Allow anything else
+ /.*/
------------------------- SNIP ----------------
This is how Nutch decides whether or not to allow a page.[1]
[1] http://lucene.apache.org/nutch/tutorial8.html#Intranet%3A+Configuration
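To make the idea concrete, here is roughly how such a filter page might be interpreted in PHP. The function name and the behaviour (first matching rule wins, allow by default) are my own assumptions for illustration, not an existing extension:

------------------------- SNIP ----------------
<?php
// Hypothetical helper: decide whether a title passes a
// MediaWiki:TitlesFilter-style rule list. "+ /regex/" allows,
// "- /regex/" denies, "#" starts a comment.
function wfTitleAllowedByFilter( $titleText, $filterText ) {
	foreach ( explode( "\n", $filterText ) as $line ) {
		$line = trim( $line );
		if ( $line === '' || $line[0] === '#' ) {
			continue; // comments and blank lines are ignored
		}
		$allow   = ( $line[0] === '+' );        // '+' allows, '-' denies
		$pattern = trim( substr( $line, 1 ) );  // the /regex/ part
		if ( @preg_match( $pattern, $titleText ) === 1 ) {
			return $allow;                      // first matching rule wins
		}
	}
	return true; // no rule matched: allow by default
}
------------------------- SNIP ----------------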
On 2/7/07, Mark Clements gmane@kennel17.co.uk wrote:
"Matthew Flaschen" matthew.flaschen@gatech.edu wrote in message news:45C963F9.1070803@gatech.edu...
Minh Nguyen wrote:
With spammers, the IP addresses always change (they've gotten good about that over the past few years), so I suppose the best way to control these "scout messages" is to grep [[Special:Newpages]] regularly for telltale page titles (like ".php") and delete on sight.
An admin bot should be able to delete those pages automatically, if someone cares to write it.
Is there a way of telling MW to block certain page names, e.g. to disallow new pages that match the regex "/index\.php\?title=/i"? If not, then this seems like a worthwhile config setting to add.
- Mark Clements (HappyDog)
"Jim Wilson" wilson.jim.r@gmail.com wrote in message news:ac08e8d0702071247x28a130dfva87b91e4859fb060@mail.gmail.com...
I suppose it goes without saying that this could be achieved with an Extension. Perhaps some kind of regex blacklist or whitelisting extension?
So for example you could have a page called [[MediaWiki:TitlesBlacklist]] which could be a newline-separated list of regular expressions to block for title creation. (With of course an associated [[MediaWiki:TitlesWhitelist]]). Then have an extension which applies the rules prior to page submission, and also on retrieval (in case some malicious user finds a way through).
[Example snipped]
I would suggest that it only applies when creating new pages - particularly if it is editable on-wiki, otherwise you risk hiding legitimate content, or being unable to delete the vandalism. A bad regex could disable the whole wiki, for example. i.e. this is not a way of implementing regex-based page protection.
On a slightly tangential note, I don't know if other people have realised this, but with the new cascading protection you can block named pages from being created by creating, e.g., [[Project:Banned pages]], transcluding any pages that users shouldn't be able to create, and enabling cascading protection. All the included pages can then no longer be created (or will require login to create, depending on protection level).
- Mark Clements (HappyDog)
I would suggest that it only applies when creating new pages - particularly if it is editable on-wiki,
I can imagine three places where the check would need to occur:
1) On page creation
2) On page move (in case someone tries to move a page to a restricted title)
3) On page retrieval (in case the rules have changed since the page was first created).
otherwise you risk hiding legitimate content,
That's true - I certainly agree that this is a risk that must be evaluated by the person making the change. Perhaps a "what would this affect" special page to test out a filter prior to application?
or being unable to delete the vandalism.
A special page that shows all pages with invalid titles, with options to delete/rename them?
A bad regex could disable the whole wiki, for example.
That's certainly possible. Such a problem could be avoided by hardcoding a rule that sysops can bypass the filter for administration purposes.
i.e. this is not a way of implementing regex-based page protection.
Does everyone feel this way? I like the idea of putting the filters in a MediaWiki page because it allows people with little or no PHP experience, or sysops without filesystem-level access, to administer the site's filters.
And the filters don't necessarily have to be regex. Firefox's AdBlock add-on has a good filter mechanism that permits both regular expressions and simple '*'-based wildcarding. This wouldn't be hard to imitate.
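For what it's worth, turning an AdBlock-style '*' wildcard into a regex is only a couple of lines of PHP. A sketch, with a made-up function name:

------------------------- SNIP ----------------
<?php
// Hypothetical helper: convert a '*' wildcard pattern into an
// anchored, case-insensitive regular expression.
function wfWildcardToRegex( $wildcard ) {
	$escaped = preg_quote( $wildcard, '/' );        // escape regex metacharacters
	$escaped = str_replace( '\*', '.*', $escaped ); // but let '*' match anything
	return '/^' . $escaped . '$/i';
}

// e.g. wfWildcardToRegex( '*index.php?title=*' ) produces a pattern
// that matches the spam titles discussed above.
------------------------- SNIP ----------------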
Your thoughts?
-- Jim
On 2/7/07, Mark Clements gmane@kennel17.co.uk wrote:
"Jim Wilson" wilson.jim.r@gmail.com wrote in message news:ac08e8d0702071247x28a130dfva87b91e4859fb060@mail.gmail.com...
I suppose it goes without saying that this could be achieved with an Extension. Perhaps some kind of regex blacklist or whitelisting extension?
So for example you could have a page called [[MediaWiki:TitlesBlacklist]] which could be a newline-separated list of regular expressions to block for title creation. (With of course an associated [[MediaWiki:TitlesWhitelist]]). Then have an extension which applies the rules prior to page submission, and also on retrieval (in case some malicious user finds a way through).
[Example snipped]
I would suggest that it only applies when creating new pages - particularly if it is editable on-wiki, otherwise you risk hiding legitimate content, or being unable to delete the vandalism. A bad regex could disable the whole wiki, for example. i.e. this is not a way of implementing regex-based page protection.
On a slightly tangential note, I don't know if other people have realised this, but with the new cascading protection you can block named pages from being created by creating, e.g., [[Project:Banned pages]], transcluding any pages that users shouldn't be able to create, and enabling cascading protection. All the included pages can then no longer be created (or will require login to create, depending on protection level).
- Mark Clements (HappyDog)
Mark Clements wrote:
So for example you could have a page called [[MediaWiki:TitlesBlacklist]] which could be a newline-separated list of regular expressions to block for title creation. (With of course an associated [[MediaWiki:TitlesWhitelist]]). Then have an extension which applies the rules prior to page submission, and also on retrieval (in case some malicious user finds a way through).
Something like this is on my to-do list - although it will be considerably more sophisticated, and probably core code that can be disabled.
Andrew Garrett (werdna)
Mark Clements escribió:
Is there a way of telling MW to block certain page names, e.g. to disallow new pages that match the regex "/index\.php\?title=/i"? If not, then this seems like a worthwhile config setting to add.
- Mark Clements (HappyDog)
Setting a captcha for anonymous page creation would be easier. This *is* a good use of it, instead of adding it /against/ humans...
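For reference, something along these lines in LocalSettings.php should do it on a wiki running the ConfirmEdit extension. This is only a sketch; the exact trigger keys depend on the ConfirmEdit version installed:

------------------------- SNIP ----------------
require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" );

// Assumption: ConfirmEdit exposes a 'create' trigger for new pages.
$wgCaptchaTriggers['create'] = true;  // captcha when creating a page
$wgCaptchaTriggers['edit']   = false; // leave ordinary edits alone

// Heavier-handed core alternative: stop anons creating pages at all.
// $wgGroupPermissions['*']['createpage'] = false;
------------------------- SNIP ----------------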
"Platonides" Platonides@gmail.com wrote in message news:eqg5f4$f9p$1@sea.gmane.org...
Mark Clements escribió:
Is there a way of telling MW to block certain page names, e.g. to disallow new pages that match the regex "/index\.php\?title=/i"? If not, then this seems like a worthwhile config setting to add.
- Mark Clements (HappyDog)
Setting a captcha for anonymous page creation would be easier. This *is* a good use of it, instead of adding it /against/ humans...
Except that the pages we need to block in this case are pages that will never need to be created, so blocking them won't get in the way of any legitimate activity, which a captcha would.
- Mark Clements (HappyDog)
Mark Clements wrote:
Except that the pages we need to block in this case are pages that will never need to be created, so blocking them won't get in the way of any legitimate activity, which a captcha would.
- Mark Clements (HappyDog)
How many anonymous users have a legitimate reason to create pages on mediawiki.org? How many of those need to create them in an even-numbered (non-talk) namespace?
http://www.mediawiki.org/wiki/Special:Newpages seemed to contradict me, showing three pages created by anons. They were a blanked 'edit experiment' and two "hacked by" pages.
Mark Clements wrote: [...]
Is there a way of telling MW to block certain page names, e.g. to disallow new pages that match the regex "/index\.php\?title=/i"? If not, then this seems like a worthwhile config setting to add.
You can hook a function to the 'userCan' event, and check for action 'create' (or 'move', 'edit', 'read').
See http://www.mediawiki.org/wiki/Help:MediaWiki_hooks and http://www.mediawiki.org/wiki/Manual:MediaWiki_hooks/userCan
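A minimal sketch of that approach, using the userCan hook to refuse creation of titles containing "index.php?title=". The hook signature here follows the manual page linked above; check it against the MediaWiki version in use:

------------------------- SNIP ----------------
<?php
// In an extension file or LocalSettings.php.
$wgHooks['userCan'][] = 'wfBlockSpamTitles';

function wfBlockSpamTitles( $title, $user, $action, &$result ) {
	// Only police page creation; everything else follows the normal checks.
	if ( $action === 'create'
		&& preg_match( '/index\.php\?title=/i', $title->getPrefixedText() ) ) {
		$result = false; // deny permission
		return false;    // stop further permission hooks
	}
	return true; // defer to the usual permission logic
}
------------------------- SNIP ----------------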
Ye $, øn mäny sma!! W!ki$ the s a m e th!ng happens, although not nece$$ari!y at fv nky paeg titlee$ (sometimes it i# on the main t a l k p a g e or somewhere similar).
It's always f unny to see !t get r3moevd as vand ali$m, though -- the first few t!mes I s aw it, I thought osmeone was r3 moevn!g !3g!t com pli m3ntzzs to that Wiki.
Mar k !!!C H 3 A P!!!
On 06/02/07, Mark Clements gmane@kennel17.co.uk wrote:
On mediawiki.org we quite often seem to get new pages created similar to the following:
Page name: /w/w/index.php?title=Extension:Guestbook/w/w/index.php
Page text: Dear web-master ! I looked your site and I want to say that yor very well made it .All information on this site is represented for users. A site is made professionally. So to hold !
The similarities are that the page titles always contain index.php?title=xx (with various depths of /w or /wiki) and that the message is congratulating us on our site. The user is always an anon, but without being able to search the deleted pages I don't know whether the IP address is always the same. There never seems to be any link spam or any of the other common vandalism/spamming traits in the page text.
Have any other wikis experienced this? Some kind of spambot gone wrong, or a mischievous repeat visitor? It's happened too often for me to think it's a genuine congratulatory comment posted by someone with a screwy browser...
Is there anything we can do about it?
- Mark Clements (HappyDog)
On 2/7/07, Mark Clements gmane@kennel17.co.uk wrote:
The similarities are that the page titles always contain index.php?title=xx (with various depths of /w or /wiki) and that the message is congratulating us on our site. The user is always an anon, but without being able to
I think spambots use these "congratulatory" messages to maximise their chances of staying on the site. Which of these would you[1] delete:
Check out my site! http://...
You have an awesome site! I spent a while here and it's really fantastic. The quality of information is amazing. Would you be interested in checking out my site? http://
Steve [1] Ok, not "you", but...
Steve Bennett wrote:
On 2/7/07, Mark Clements gmane@kennel17.co.uk wrote:
The similarities are that the page titles always contain index.php?title=xx (with various depths of /w or /wiki) and that the message is congratulating us on our site. The user is always an anon, but without being able to
I think spambots use these "congratulatory" messages to maximise their chances of staying on the site. Which of these would you[1] delete:
Check out my site! http://...
You have an awesome site! I spent a while here and it's really fantastic. The quality of information is amazing. Would you be interested in checking out my site? http://
Steve [1] Ok, not "you", but...
Yep, and in Movable Type's master comments list, it gets cropped to "You have an awesome site! I spent a while here and it's..." (And they were just getting to the good part.) You have to open up the comment to read it in full, and they're banking on people being too busy to do that. I assume other blogging software does something similar.