If you've ever set up a fresh MediaWiki and tried to leave it open to editing, you'll know about the problem of wiki spam. There are various well-documented tricks to tackle the problem on your own wiki (although it seems to me that some of these are becoming less effective over time, particularly reCaptcha).
But wiki spammers are behaving in a staggeringly inconsiderate and anti-social way, and I've often thought we should explore stronger ways of delivering some fightback.
Looking at spam across many wikis, e.g. by googling "mediawiki ugg boots", we could do more internet-wide spam cleanup somehow.
But recently I came across something, actually by looking at one of the spam links. Take a look at this: sickseo.co.uk/off-page-seo.html This video shows the use of a tool called 'SENuke Xcr' and another one called 'Ultimate Demon' to perform "off-site SEO" ...that's "spamming" to you and me.
I've always known spammers used tools like this, but seeing this instructional video gives me new insights into what we're up against. Also see the discussion taking place on forum.edwinsoft.com. I find it amazing how oblivious these people seem to be to how annoying their activities are for people running websites. Never do they mention the word spam, or have an inkling that they may be doing something ethically questionable.
Do you think we should try to contact them and explain that they are behaving badly? Maybe on these forums. Maybe we'd have to spam them back repeatedly with such messages as they get removed by the admins. And on YouTube, do you think we can get videos like this removed? We can at least comment on them and vote them down (YouTube finds quite a lot of similar videos). How about unleashing a bit of "ethical hacking", e.g. DDoS attacks on people distributing this software? Really, I'm amazed at how they're getting away with this spamming out in the open these days.
The spammers know they are being disruptive; they just don't care unless it is their site.
On Wed, May 22, 2013 at 1:07 PM, halz halz_antispam@yahoo.co.uk wrote:
Halz http://www.mediawiki.org/wiki/User:Halz
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
The spammers know they are being disruptive, they just don't care unless it is their site.
You're probably right a lot of the time, but I'm not sure it's true in every case. This is what surprised me most when I came across all this information linked off sickseo.co.uk about how to use "off-site SEO" software. It looks like there's quite a separation between the bad guys who develop this software (and who also do things like finding lists of vulnerable websites), and then the breed of dumb "SEO experts" who buy the software and use it to run their "campaigns". I think the latter could quite easily be unaware that they're doing something wrong.
We can at least comment on them and vote them down (Youtube finds quite a lot of similar videos)
Please don't do that. All that's gonna accomplish is that the videos will rise in the search rankings (yes, even the downrating does that).
Surprised by this. You think if we all go here http://www.google.com/search?client=safari&rls=en&q=off+page+seo+you... and downvote these videos, Google will rank the videos higher? Possible, I suppose. Anyway, that's not necessarily a big problem. If the video gets attention from more right-thinking people who will work to combat this, then that's a good thing. Also, the aim would be to provide *some* indication to dumb SEO people that there are people out there suffering on the receiving end of these tools, because currently there is no such indication.
How about unleashing a bit of "ethical hacking" e.g. DDoS attacks on people distributing this software?
You can go to jail for that, in a lot of places. Even if they DDoS'd you first, and you DDoS'd them back, you'd be more likely to go to jail than them, because being an amateur, you'd be more likely to slip up in some traceable way.
Well, yeah. Partly I was mentioning this just to throw them to the wolves. I appreciate you don't want to be seen to be recommending breaking the law, but really, if you're reading this and you have a way of disrupting the operations of edwinsoft.com (for example), legally or illegally, you'd be doing the world a favor in my opinion. I wouldn't attempt anything outright illegal myself, but there are some interesting legally grey approaches. Remember "spam vampire" and the Lycos anti-spam screensaver? Sort of an opt-in DDoS attack.
O'Reilly article from way back in 2004: http://www.oreillynet.com/network/2004/12/03/chongq.html
I like the quote towards the end there: "Using purely defensive means has not worked. It is like someone throwing punches at you and all you do is hold your arms over your face to fend off the blows,"
"has not worked" is too strong of course. Defensive means can and do work ok *if* you know how to set them up, and if you keep checking back to make sure it is working. On an internet-wide scale, looking across all the mediawiki installs and attempted mediawiki installs out there, and looking at the experience of new folks trying install MediaWiki... wiki spam remains a big problem.
Halz
On Wed, May 22, 2013 at 10:07 AM, halz halz_antispam@yahoo.co.uk wrote:
Please do make sure you know the laws of where you live, and stay within them. They may be a pain, but please don't break the law :)
Personally, I would recommend taking a look at how some of these tools work and finding ways to explicitly detect and block them. At one point we were able to trip up a few of them, but I don't personally have the time right now to reverse engineer them and look for signatures. But if someone wants to do it, I'd gladly point you in the right direction.
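This kind of signature-based blocking could start as simply as a header heuristic. A minimal sketch follows; the regex patterns and the missing-header check are illustrative assumptions, not reverse-engineered signatures of any real tool.

```python
# Sketch of signature-based spambot detection. The agent patterns and the
# missing-header heuristic below are hypothetical placeholders for whatever
# real fingerprints reverse engineering would turn up.
import re

SUSPICIOUS_AGENTS = [
    re.compile(r"SENuke", re.I),          # placeholder pattern
    re.compile(r"Ultimate\s*Demon", re.I),  # placeholder pattern
]

def looks_like_spambot(headers: dict) -> bool:
    """Flag a request whose headers match a known automation fingerprint."""
    agent = headers.get("User-Agent", "")
    if any(p.search(agent) for p in SUSPICIOUS_AGENTS):
        return True
    # Many bulk-submission tools omit headers real browsers always send.
    if "Accept-Language" not in headers:
        return True
    return False
```

In MediaWiki terms this check would live in an edit-filter hook; the point is only that each tool tends to leave a repeatable fingerprint.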
In article 1369242450.63126.YahooMailNeo@web172605.mail.ir2.yahoo.com, halz halz_antispam@yahoo.co.uk writes:
If you've ever set up a fresh MediaWiki and tried to leave it open to editing, you'll know about the problem of wiki spam. There's various well documented tricks to tackle the problem on your own wiki (although it seems to me that some of these are becoming less effective over time. Particularly reCaptcha)
I thought the standard practice was to require admin approval of new accounts and require new accounts to fill out a profile for their user page.
If your problem is that you want to open your wiki to a broader base of people whom you don't want to personally vet, then I'm not sure how to deal with that.
On Wed, 22 May 2013 14:44:10 -0700, Richard legalize@xmission.com wrote:
I thought the standard practice was to require admin approval of new accounts and require new accounts to fill out a profile for their user page.
No it's not.
In article op.wxido1sjdykrql@daniels-macbook-air.local, "Daniel Friesen" daniel@nadir-seen-fire.com writes:
On Wed, 22 May 2013 14:44:10 -0700, Richard legalize@xmission.com wrote:
I thought the standard practice was to require admin approval of new accounts and require new accounts to fill out a profile for their user page.
No it's not.
OK, now that you've told me what it's not, would you mind telling me what is standard practice?
The main reason that I haven't opened my wiki up to broader open registration and editing is exactly because I don't want my wiki overrun with spam.
On 22.05.2013 19:07, halz wrote:
If you've ever set up a fresh MediaWiki and tried to leave it open to editing, you'll know about the problem of wiki spam. There's various well documented tricks to tackle the problem on your own wiki (although it seems to me that some of these are becoming less effective over time. Particularly reCaptcha)
I'm using ConfirmEdit with QuestyCaptcha to restrict account creation, and AbuseFilter to filter anonymous edits. Works for me; we've had about two spam edits since I configured AbuseFilter (more than a year ago).
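For anyone wanting to replicate the QuestyCaptcha half of that setup, it looks roughly like this in LocalSettings.php. This is a sketch only: the loading syntax varies by MediaWiki version, and the questions and answers here are placeholders you must replace with your own, wiki-specific ones.

```php
# Hedged sketch of a LocalSettings.php fragment; loading syntax varies by
# MediaWiki version, and the questions below are placeholders.
require_once "$IP/extensions/ConfirmEdit/ConfirmEdit.php";
require_once "$IP/extensions/ConfirmEdit/QuestyCaptcha.php";
$wgCaptchaClass = 'QuestyCaptcha';

# Ask one of several site-specific questions, chosen at random.
$wgCaptchaQuestions[] = array(
    'question' => 'What is the capital of France?',        # placeholder
    'answer'   => 'Paris'
);
$wgCaptchaQuestions[] = array(
    'question' => 'Type the name of this wiki, lowercase',  # placeholder
    'answer'   => 'examplewiki'
);

# Challenge account creation but not every edit.
$wgCaptchaTriggers['createaccount'] = true;
$wgCaptchaTriggers['edit'] = false;
```

Site-specific questions beat generic ones, since a tool vendor can pre-load answers to questions that appear on thousands of wikis.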
Do you think we should try to contact them and explain that they are behaving badly? Maybe on these forums. Maybe we'd have to spam them back repeatedly with such messages as they get removed by the admins.
I wouldn't recommend trying to out-spam the spammers. Chances are that they are better at this.
And on youtube do you think we can get videos like this removed?
You can try. Wouldn't count on it though.
We can at least comment on them and vote them down (Youtube finds quite a lot of similar videos)
Please don't do that. All that's gonna accomplish is that the videos will rise in the search rankings (yes, even the downrating does that).
Greetings Stip
On 22/05/13 19:07, halz wrote:
Do you think we should try to contact them and explain that they are behaving badly? Maybe on these forums. Maybe we'd have to spam them back repeatedly with such messages as they get removed by the admins.
I don't think that would be effective.
And on youtube do you think we can get videos like this removed?
Feel free to try.
We can at least comment on them and vote them down (YouTube finds quite a lot of similar videos). How about unleashing a bit of "ethical hacking" e.g. DDoS attacks on people distributing this software?
You can go to jail for that, in a lot of places. Even if they DDoS'd you first, and you DDoS'd them back, you'd be more likely to go to jail than them, because being an amateur, you'd be more likely to slip up in some traceable way.
I think the only effective way to deal with wiki spam would be to reduce the profit margin. That means either making it more expensive, say with technical anti-spam tools, or making the revenue smaller, by getting their sites delisted from Google, and by finding legal ways to reduce direct revenue sources like pay-per-click search engine affiliate schemes.
From the experience with email spam, we can expect that if some organisation became successful in this goal of reducing the effectiveness of wiki and blog spam, that organisation would come under attack, by DDoS and other means. So it's not a task for the faint-hearted.
-- Tim Starling
While not everybody is able to reduce the spammers' profit margins as Chris and Tim said, everyone running a MediaWiki wiki can experiment and document best practices. MediaWiki already has many defense tools, but they're often unknown or hard to use (as this very thread shows). One site I reached from the link in the original post sells, at $40 per six months, a list of a couple of thousand wikis which have no captcha at all on registration, and a few hundred which don't have rel=nofollow... The owners of those wikis need some better reading.

A few days ago I refactored/updated https://www.mediawiki.org/wiki/Manual:Combating_spam ; help is needed to coordinate it better with https://www.mediawiki.org/wiki/Manual:Combating_vandalism . My main question is whether IP blacklists help stop all those proxies with tens of thousands of ever-changing IPs sold for spammers' use, or are just a CPU sink.

On CAPTCHAs, we already know that FancyCaptcha is useless, but it's not clear what to do. A tour I did across 500 wikis some time ago seemed to show that QuestyCaptcha is vastly superior to the other options for the average wiki: https://www.mediawiki.org/wiki/Thread:Extension_talk:ConfirmEdit/Wikis_account_registration_tour If confirmed, it could be made the default in the installer, which could also make the user set custom questions during the install process itself and encourage frequent updates of the questions.
Nemo
I have already been using QuestyCaptcha for quite a while: http://www.mediawiki.org/wiki/Extension:QuestyCaptcha
It allows you to ask a number of questions which will randomly change.
You could use for instance capitals of countries or states. It works like a charm for me. None of the spammers ever got through.
At 11:36 AM 5/24/2013 Friday +0200, you wrote:
-- Henny (Lee Hae Kang) http://www.henny-savenije.pe.kr
Maybe MediaWiki sites can unite to keep a global list of these IPs and block them as soon as they are submitted. Each MediaWiki site can auto-submit a spammer IP to the global list as soon as it's discovered. What are the problems with this idea?
Al
More brainstorming ideas about this...
A centralized DB would require a site capable of handling heavy use, since it would be hit for each edit submission from every participating site. An alternative would be a subscriber model, where the global list (or updates to it) is downloaded periodically by subscribers.
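That subscriber model could be sketched like this. The list URL and the one-IP-per-line file format are invented for illustration; a real deployment would wire the lookup into the wiki's block check.

```python
# Sketch of subscriber-side blocklist sync: fetch a shared list on a
# schedule and keep a local set for fast per-edit lookups. The URL and
# one-IP-per-line format are assumptions, not a real service.
import urllib.request

BLOCKLIST_URL = "https://example.org/wiki-spam-ips.txt"  # hypothetical

def parse_blocklist(text: str) -> set[str]:
    """One IP per line; '#' starts a comment; blank lines ignored."""
    ips = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            ips.add(line)
    return ips

def fetch_blocklist(url: str = BLOCKLIST_URL) -> set[str]:
    """Download and parse the shared list (run from cron, not per edit)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_blocklist(resp.read().decode("utf-8"))

def is_blocked(ip: str, blocklist: set[str]) -> bool:
    return ip in blocklist
```

The set lookup is O(1), so only the periodic fetch touches the network; each edit submission pays nothing extra.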
How long to block an IP? I'm not an expert in this area, but what if the IP is blocked for an extended period of time, such as ONE YEAR? If that IP ends up being reissued to a legitimate ISP customer and he gets blocked, that will alert that user that his ISP is either participating in spamming or is tolerating spammers' use of their IPs, which is affecting regular customers. This may sound harsh, but it could force ISPs to address the problem with their spammer customers. And there is still a workaround for the legit user, who can usually just reboot their modem to get a new IP address.
Just thinking out loud... al
From: Al Johnson alj62888@yahoo.com To: MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org Sent: Friday, May 24, 2013 2:41 PM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
Al
One thing that might work (wouldn't be 100%) would be a method for identifying IP ranges of known abuse where legit collateral is minimal, keeping a database of these, and auto-blocking them.
These IPs are typically those of:
* known web-hosting services
* proxy services
* data-center providers
* similar cases where the risk of legit end-user usage is minimal
I know from working with anti-spam efforts that there are several known ISPs where 99% of their traffic is spam. If something like this is implemented it could reduce spam by as much as 80% or more.
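The range-matching part of this is straightforward CIDR arithmetic; a sketch using Python's stdlib ipaddress module, with placeholder documentation ranges rather than any real provider's blocks:

```python
# Sketch of range-based blocking with the stdlib ipaddress module.
# The example ranges are RFC 5737 documentation ranges, used here only
# as placeholders for real hosting/proxy/data-center blocks.
import ipaddress

BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # placeholder range
    ipaddress.ip_network("198.51.100.0/24"),  # placeholder range
]

def in_blocked_range(ip: str) -> bool:
    """True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

For a large list, the linear scan would be replaced by a sorted-prefix or radix-tree lookup, but the membership test itself is this simple.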
On Fri, May 24, 2013 at 5:19 PM, Al Johnson alj62888@yahoo.com wrote:
That sounds like a good idea, John.
Also, to elaborate on my earlier proposed idea for this master spam IP list, I'm a fan of having an invitation-only wiki website where vetted users can submit known spammer IPs and IP ranges to a community IP list that can be imported into any MediaWiki wiki.
I'm not sure how this proposed wiki would be structured (i.e., whether it would be a regular wiki, one using Semantic MediaWiki, etc.), but I would be interested in a wiki dedicated to fighting wiki spam, where vetted users can create IP blacklists for spammers and swap information on known spammers and ways to combat them. In the interests of security, it may be necessary to restrict page viewing to confirmed members only.
Past that, I don't have too many more ideas I can submit for this brainstorming session, but some form of master anti-spammer IP list in some usable form sounds like a great idea in any event.
Date: Fri, 24 May 2013 17:48:02 -0400 From: phoenixoverride@gmail.com To: alj62888@yahoo.com; mediawiki-l@lists.wikimedia.org Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
There are already numerous blacklist IP management sites. For example: http://www.projecthoneypot.org/
There are already numerous corresponding work-around spammer management sites to avoid the IP blacklists.
--Hiram
On 5/24/13 3:16 PM, Arcane 21 wrote:
That sounds like a good idea, John.
We have two issues here. One: blocking said IPs in a reliable, simple method.
* The thought would be something like TorBlock, which just downloads the list on a daily/weekly/whatever schedule
* It can provide blocked users information on why they are blocked, and an appeal process for valid users (ipblockexempt)
Two: a method for reporting, confirming and listing said sources.
* The easiest method that I know of would be a category-based system, with a bot/script that compiles a complete listing of the pages in the category on a regular basis.
* Using a category system enables a more fluid process, reduces edit conflicts, and provides several other options.
Another issue is the effort that would be required for a centralized DB/authority model. At the moment, I'm leaning towards a P2P model.
A pseudo-P2P system can be implemented without a complicated full-fledged P2P infrastructure. Peer wikis can simply update their own list of IPs with those from a few other participating wikis, and so new entries will propagate as quickly as participating wikis update their lists. The source wiki of the IP block can be included in the record. Querying a good set of peers may be tricky to ensure a complete list, but a solution to that may be found in the P2P literature.
Another issue: Responsiveness. Spammers may hit many wikis at once using a given IP. The ability for wikis to get an updated (incremental) list within minutes might be important.
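The merge step of this pseudo-P2P scheme might look like the following sketch; the IP-to-reporting-wiki record format is an assumption for illustration.

```python
# Sketch of the pseudo-P2P merge: each wiki periodically pulls its peers'
# lists and unions them with its own, remembering which wiki first
# reported each IP. The IP -> source-wiki mapping is an invented format.

def merge_peer_lists(local: dict[str, str], *peer_lists: dict[str, str]) -> dict[str, str]:
    """Each dict maps IP -> reporting wiki; the first reporter wins on conflict."""
    merged = dict(local)
    for peers in peer_lists:
        for ip, source in peers.items():
            merged.setdefault(ip, source)  # keep the earliest attribution
    return merged
```

Run on a short timer against a handful of peers, entries spread through the network in a few hops, which addresses the responsiveness concern without any central server.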
From: John phoenixoverride@gmail.com To: MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org Sent: Friday, May 24, 2013 4:29 PM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
In article 1369437337.9904.YahooMailNeo@web161904.mail.bf1.yahoo.com, Al Johnson alj62888@yahoo.com writes:
Another issue is the effort that would be required for a centralized DB/authority model.
It already exists. It's called Spamhaus. What's wrong with enhancing this extension: http://www.mediawiki.org/wiki/Extension:Check_Spambots
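For reference, a DNSBL such as Spamhaus is queried by reversing the IPv4 octets and resolving that name under the list's zone; receiving any answer means the address is listed. A sketch follows; the actual lookup needs network access, and heavy automated use of Spamhaus is subject to their usage terms.

```python
# Sketch of a DNSBL check: reverse the IPv4 octets and resolve the name
# under the blocklist's zone. An NXDOMAIN means "not listed".
import socket

def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the reversed-octet query name, e.g. 7.113.0.203.zen.spamhaus.org."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """True if the DNSBL returns any answer for this address (needs network)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

An extension like Check_Spambots would cache results, since doing a fresh DNS lookup on every edit adds latency and load.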
Hi Richard,
I don't know. It seems like there are still problems with spam, but maybe that perception is incorrect. Are you saying that there are already good solutions (for wikis) which do not require many man-hours and dedicated staff to maintain, and that some wiki admins just haven't stumbled upon them yet? The Check Spambots extension is still in beta, so I am wondering what successful extensions have been used previously... or is that extension the answer to all of our concerns? Could the wiki community not be helped by a more specialized form of spam list? After all, there are many kinds of spam lists anyway.
Is there really nothing more that can be done?
From: Richard legalize@xmission.com To: Al Johnson alj62888@yahoo.com; MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org Sent: Friday, May 24, 2013 5:34 PM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
[Can we please do a little trim-and-quote instead of top-posting? It's making it hard to follow the conversation.]
I wrote:
They already exist. It's called Spamhaus. What's wrong with enhancing this extension: http://www.mediawiki.org/wiki/Extension:Check_Spambots
In article 1369440421.38235.YahooMailNeo@web161906.mail.bf1.yahoo.com, Al Johnson alj62888@yahoo.com writes:
I don't know. It seems like there are still problems with spam, but maybe that perception is incorrect.
Since I require users to be logged-in to edit and I don't have open automatic account creation, I have zero problems with respect to spam. I was planning on allowing account creation requests with admin approval to open up to the next level of user participation, but I would never open up my wiki to anonymous edits or account creation without some form of review.
Is there really nothing more that can be done?
If you aren't already doing the things listed here: http://www.mediawiki.org/wiki/Manual:Combating_spam
Then it seems obvious that is where you should start.
If you're saying that you're already doing everything listed there and you're still overwhelmed with spam, then I'd say that new techniques need to be developed or existing techniques for combating spam improved.
From: Richard legalize@xmission.com To: Al Johnson alj62888@yahoo.com; MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org Sent: Friday, May 24, 2013 6:44 PM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
I was planning on allowing account creation requests with admin approval to open up to the next level of user participation, but I would never open up my wiki to anonymous edits or account creation without some form of review.
I think the concerns in this thread are mostly for those who have open wikis. Open wikis are sometimes needed for new wikis that are trying to build a user base by encouraging participation without having to go through the registration process. Of course you aren't going to have much of a spam problem if you require admin approved registration.
I would love to explore how WikiApiary [1] could help with this. I've started to work on pulling in new user logs from remote wikis. I would be happy to brainstorm in a group on how we could move beyond just IP lists to something more sophisticated.
One thought I have had is having WikiApiary use a bot account on remote wikis that wish to participate to fight spammers, revert changes, ban them, etc.
-- Jamie Thingelstad jamie@thingelstad.com mobile: 612-810-3699
On May 24, 2013, at 5:16 PM, Arcane 21 arcane@live.com wrote:
That sounds like a good idea, John.
Also, to elaborate on my early proposed idea for this master spam IP list, I'm a fan of having an invitation only wiki website where vetted users can submit known spammer IPs and IP ranges to a community IP list that can be imported into any MediaWiki wiki.
I'm not sure how this proposed wiki would be structured (i.e. - whether it would be a regular wiki, using Semantic MediaWiki, etc.), but I would be interested in a wiki dedicated to fighting wiki spam where vetted users can create IP blacklists for spammers, swap information on known spammers and ways to combat them, and in the interests of security, it may be necessary to restrict page viewing to confirmed members only.
Past that, I don't have too many more ideas I can submit for this brainstorming session, but some form of master anti-spammer IP list in some usable form sounds like a great idea in any event.
Date: Fri, 24 May 2013 17:48:02 -0400 From: phoenixoverride@gmail.com To: alj62888@yahoo.com; mediawiki-l@lists.wikimedia.org Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
One thing that might work (it wouldn't be 100%) would be a method for identifying IP ranges of known abuse where legitimate collateral damage is minimal, keeping a database of these, and auto-blocking them.
These IPs are typically those of:
- known web-hosting services
- Proxy Services
- data-center providers
- similar cases where risk of legit end user usage is minimal.
I know from working on anti-spam efforts that there are several known ISPs where 99% of the traffic is spam. If something like this were implemented it could reduce spam by as much as 80% or more.
On Fri, May 24, 2013 at 5:19 PM, Al Johnson alj62888@yahoo.com wrote:
More brainstorming ideas about this...
A centralized DB would require a site able to handle heavy use, since it would be hit for each edit submission from every participating site. An alternative would be a subscriber model where the global list (or updates to it) is downloaded periodically by subscribers.
How long to block an IP? I'm not an expert in this area, but what if the IP is blocked for an extended period of time, such as ONE YEAR? If that IP ends up being reissued to a legitimate ISP customer and he gets blocked, that will alert the user that his ISP is either participating in spamming or tolerating spammers' use of its IPs, which affects regular customers. This may sound harsh, but it could force ISPs to address the problem with their spammer customers. And there is still a workaround for the legitimate user, who can usually just reboot his modem to get a new IP address.
Just thinking out loud... al
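A subscriber-side sketch of the model described above. The function names, the one-IP-per-line list format, and the in-memory dict are illustrative assumptions, not an existing service:

```python
import ipaddress
from datetime import datetime, timedelta

BLOCK_DURATION = timedelta(days=365)  # the one-year block suggested above

def merge_global_list(local_blocks, lines, now):
    """Merge a downloaded global list (one IP per line, '#' starts a
    comment) into a local dict of {ip: expiry}, refreshing the expiry
    whenever an IP is reported again."""
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip = str(ipaddress.ip_address(line))  # validate and normalise
        local_blocks[ip] = now + BLOCK_DURATION
    return local_blocks

def is_blocked(local_blocks, ip, now):
    """Check an edit's source IP; expired blocks are dropped lazily."""
    expiry = local_blocks.get(ip)
    if expiry is None:
        return False
    if expiry <= now:  # the block aged out; forget it
        del local_blocks[ip]
        return False
    return True
```

The per-entry expiry means a dynamic IP reissued to a legitimate customer unblocks itself once the year is up, without anyone having to process an unblock request.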
From: Al Johnson alj62888@yahoo.com To: MediaWiki announcements and site admin list < mediawiki-l@lists.wikimedia.org> Sent: Friday, May 24, 2013 2:41 PM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
Maybe mediawiki sites can unite to keep a global list of these IP's and block them as soon as they are submitted. Each mediawiki site can auto-submit a spammer IP as soon as it's discovered to the global list. What are the problems with this idea?
Al
_______________________________________________ MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
I run a spam/virus filtering email relay for some clients and I agree with most of what Richard says:
Everything that's being discussed has already been done to combat email spam. It seems the appropriate thing to do is to leverage that work instead of re-inventing it.
There is also Akismet which Wordpress users can use. I'm sure there is a lot of work that has been put into blocking spam in blog comments and email that overlaps.
I recently discovered a test wiki I had set up and forgotten about had been overrun with spam and started using it to watch the spammers and record their "work".
From these observations, I think I like Jamie Thingelstad's idea (partly because, yes, it would persuade people to make sure their wiki is registered at WikiApiary):
On 05/24/2013 07:58 PM, Jamie Thingelstad wrote:
One thought I have had is having WikiApiary use a bot account on remote wikis that wish to participate to fight spammers, revert changes, ban them, etc.
If you could also collect urls added in reverted edits, it seems like those would be good candidates for a shared blacklist.
Mark.
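Harvesting those candidates could start as simply as pulling external links out of the reverted wikitext. A rough sketch; the deliberately simple regex and the `whitelist` parameter are assumptions for illustration:

```python
import re
from urllib.parse import urlparse

# crude match for bare and bracketed external links in wikitext
URL_RE = re.compile(r"https?://[^\s\]<>\"']+")

def blacklist_candidates(reverted_wikitext, whitelist=frozenset()):
    """Pull the domains out of a reverted edit; these are candidates
    for a shared blacklist once a human has vetted them."""
    domains = set()
    for url in URL_RE.findall(reverted_wikitext):
        host = urlparse(url).hostname
        if host and host not in whitelist:
            domains.add(host.lower())
    return sorted(domains)
```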
When Richard says "It seems the appropriate thing to do is to leverage that work instead of re-inventing it."
I take that to mean simply installing all the spam-fighting services, extensions, daemons, etc.. that already exist. Or, perhaps, creating a monster bundle with an integrated assortment of spam-fighting tools. I think it's OK to brainstorm new approaches.
In article 1369525016.63923.YahooMailNeo@web161904.mail.bf1.yahoo.com, Al Johnson alj62888@yahoo.com writes:
When Richard says "It seems the appropriate thing to do is to leverage that work instead of re-inventing it."
I take that to mean simply installing all the spam-fighting services, extensions, daemons, etc.. that already exist. Or, perhaps, creating a monster bundle with an integrated assortment of spam-fighting tools. I think it's OK to brainstorm new approaches.
I also mean it to include enhancing the existing extensions.
Extend what exists already before you go writing something from scratch.
Of course, thanks. Brainstorming is just a very early process for exploration and understanding. I'm just glad to see plenty of people eager to solve a tough problem or at least think about new approaches to it.
In article 1369536203.77243.YahooMailNeo@web161902.mail.bf1.yahoo.com, Al Johnson alj62888@yahoo.com writes:
Of course, thanks. Brainstorming is just a very early process for exploration and understanding. I'm just glad to see plenty of people eager to solve a tough problem or at least think about new approaches to it.
I also think it prudent to install all existing extensions for combatting spam in order to verify that whatever problems you're seeing aren't already addressed by existing extensions.
On Sat, May 25, 2013 at 2:56 AM, Mark A. Hershberger mah@everybody.org wrote:
The SpamBlacklist list of banned URLs is pulled by default (IIRC; if not, there is an example of pointing at the list somewhere that I can dig up) from http://meta.wikimedia.org/wiki/Spam_blacklist. So everyone is free to benefit from that list, which is regularly updated. But yeah, ideally it would be great for any wiki to be able to add links back to the list automatically, and have the links ranked based on some trust metric.
Mark.
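For anyone wanting to reuse that list outside MediaWiki: each non-comment line of the page is a regex fragment matched against URLs. A sketch of consuming it; combining the lines into one alternation roughly mirrors what Extension:SpamBlacklist does internally, though the exact wrapper regex here is an approximation:

```python
import re

def compile_blacklist(text):
    """Compile a Spam_blacklist-style page (one regex fragment per
    line, '#' starts a comment) into a single case-insensitive regex."""
    fragments = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            fragments.append(line)
    if not fragments:
        return None
    # approximate wrapper: scheme, then host characters, then any fragment
    return re.compile(
        "https?://[a-z0-9\\-.]*(" + "|".join(fragments) + ")",
        re.IGNORECASE,
    )

def is_spam_link(blacklist, url):
    return bool(blacklist and blacklist.search(url))
```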
Imagination does not breed insanity. Exactly what does breed insanity is reason. Poets do not go mad; but chess-players do. -- G.K. Chesterton
Two connected suggestions:
1) Login verification via SMS to a mobile phone.
As used by Google and recently announced by Twitter: https://blog.twitter.com/2013/getting-started-login-verification
Though I am not sure what the financial implications of this would be.
2) Social login - encouraging users to login with their Facebook, Twitter, Gmail accounts which have already had some sort of Login Verification or are connected to a "social graph" that can be checked.
Paul
On Sun, 26 May 2013 02:00:57 -0700, paul youlten paul.youlten@gmail.com wrote:
Two connected suggestions:
- Login verification via SMS to a mobile phone.
As used by Google and recently announced by Twitter: https://blog.twitter.com/2013/getting-started-login-verification
Though I am not sure what the financial implications of this would be.
Sorry. But besides the financial implication most people who come to your wiki will NOT want to hand over their personal phone number to you. And many may not even have one. You will chase away a large percentage of the people you are trying to attract by having an open wiki.
- Social login - encouraging users to login with their Facebook,
Twitter, Gmail accounts which have already had some sort of Login Verification or are connected to a "social graph" that can be checked.
Spambots can get social accounts too. They also regularly get "friends" too.
Paul
Two connected suggestions:
- Login verification via SMS to a mobile phone.
As used by Google and recently announced by Twitter: https://blog.twitter.com/2013/getting-started-login-verification
Though I am not sure what the financial implications of this would be.
Sorry. But besides the financial implication most people who come to your wiki will NOT want to hand over their personal phone number to you. And many may not even have one. You will chase away a large percentage of the people you are trying to attract by having an open wiki.
If Twitter, Facebook, Yahoo and Gmail are all using SMS verification I think that it must have some merit in the battle against spammers and fake accounts.
The problem with managing an open wiki is that without almost constant attention it gets filled with spam, which stops people reading it; this also discourages people from editing it, which means that founders give up. What we are looking for is a balance.
Clearly people who come simply to read a wiki won't want to hand over their mobile phone numbers, but maybe the few that feel inspired to edit the wiki would.
How many of us would object to giving Wikipedia our mobile phone numbers in order to help prevent spam? Or Wikia? It is all about interest and passion. I am sure that even the furries who edit WikiFur would be happy to give up their mobile phone number in order to edit pages.
The challenge is for new and smaller wikis. Maybe the MediaWiki Foundation could help with the cost of sending SMS messages to potential editors?
- Social login - encouraging users to login with their Facebook,
Twitter, Gmail accounts which have already had some sort of Login Verification or are connected to a "social graph" that can be checked.
Spambots can get social accounts too. They also regularly get "friends" too.
But far fewer than with open access.
How easily can spammers make multiple usernames/accounts from Google where an SMS PIN code is required to activate an account?
This makes interesting reading:
1) Disable new signups, or if you think that is too extreme, install SecurePages
2) Install SimpleAntiSpam
3) Install SpamBlacklist and TitleBlacklist
4) Allow anonymous edits
5) Always block the IP addresses that spam is posted from
6) Install User Merge and Delete and use that to clear out the existing spammer accounts.
#1 is the most important step. It's easy for spammers to create throwaway accounts. A CAPTCHA makes only a small difference, not worth the extra bandwidth cost for the images. The hundreds of throwaway accounts are almost as big a problem as the spam postings.
#2 reduces the volume of spam by at least 1/3. The only robots that get past SimpleAntiSpam are those specially designed for MediaWiki, not the ones that fill in all textareas in every web page everywhere. Similarly if your site has SSL, SecurePages (or its predecessor HttpsLogin) thwarts some bots that don't have SSL support.
#3 will stop you getting the same spam posting (or variants of it) repeatedly. If you update the blacklist regularly that should reduce the volume of spam by another 10-20%. And remember the spammers will run out of paying customers (you eliminate one for every domain you block links to) long before they run out of public proxies/zombies to post from.
#4 does not increase the volume of spam as much as you might expect. There's a popular MediaWiki-spamming bot that never attempts to post anonymously - it gives up when it cannot find the "create account" link. And if you don't do this, you don't have a wiki anymore (you just have a static website using MediaWiki as a CMS.) There is a small bonus - it makes it easier to find (and block) the spammers' IP addresses. Of course you can get the IP addresses using CheckUser or by reading the database directly, but it's much easier when the IP address is in plain sight.
#5 is the least effective measure, but it's still worth doing. Spammers do re-use IP addresses. They may be cheap but they are not infinite, and sometimes you will catch one of those runaway robots that posts a spam page every 5 minutes.
#6 doesn't prevent spam, but it allows you to clean up your user list page once you have other anti-spam measures in place.
http://stackoverflow.com/questions/6748633/how-to-prevent-mediawiki-spam
Should the extensions mentioned be incorporated into the main MW package rather than being added later?
Paul
In article CANtY4rYMPXCRoyPkHXtXYmKgSwevN6dfaZEFLo+bfOJxupWhuA@mail.gmail.com, paul youlten paul.youlten@gmail.com writes:
- Allow anonymous edits
And if you don't do this, you don't have a wiki anymore (you just have a static website using MediaWiki as a CMS.)
Since when is being a wiki equated with anonymous edits?
In article CANtY4rZpQJZHJaELUvCbnZhzo4+mMFDevxw4ZE2ZGZJFwgBtqQ@mail.gmail.com, paul youlten paul.youlten@gmail.com writes:
If Twitter, Facebook, Yahoo and Gmail are all using SMS verification I think that it must have some merit in the battle against spammers and fake accounts.
LOL! They want your phone number because they build marketing profiles from your "free" account. They couldn't care less about spammers in this regard.
I disagree with your comment below in principle. Requiring regular users to modify their behavior because of the actions of a few bad actors just ruins and complicates the experience you are trying to create. Dealing with spammers comes with the territory and is my responsibility, not my visitors'.
________________________________ From: paul youlten paul.youlten@gmail.com To: MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org Sent: Sunday, May 26, 2013 10:16 AM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
[snip]
How many of us would object to giving Wikipedia our mobile phone numbers in order to help prevent spam? Or Wikia? It is all about interest and passion. I am sure that even the furries who edit WikiFur would be happy to give up their mobile phone number in order to edit pages.
On Sun, May 26, 2013 at 9:33 PM, Al Johnson alj62888@yahoo.com wrote:
I disagree with your comment below in principle. Requiring regular users to modify their behavior because of the actions of a few bad actors just ruins and complicates the experience you are trying to create. Dealing with spammers comes with the territory and is my responsibility, not my visitors.
...but the sad truth is that for many small wikis the actions of a few bad actors ruin and complicate the experience you are trying to create for users (readers as much as editors). The spammers spoil the user experience much more than asking a few interested users to jump through a hoop or two to become editors.
Asking the visitor to do any more than notifying admin of a problem is probably too much (unless they want to correct the spam themselves). I'm afraid you'll get too much resistance if you start demanding phone numbers. Many small wikis are also less secure and the chance of phone numbers getting stolen is too big. It's a whole 'nother can-o-worms.
I would love to explore how WikiApiary [1] could help with this. I've started to work on pulling in new user logs from remote wikis. I would be happy to brainstorm in a group on how we could move beyond just IP lists to something more sophisticated. One thought I have had is having WikiApiary use a bot account on remote wikis that wish to participate to fight spammers, revert changes, ban them, etc.
I was thinking along these lines too, although rather than expecting wiki administrators to pro-actively "wish to participate", I was thinking it would be good to have a bot which I could unleash on wikis where I've found spam is taking over (there are many!). This bot could check incoming edits and revert them, and/or do mass clean-up of existing spam.
Do clean-up across lots of wikis and that would possibly deal a big blow to spammers. If you watch that 'ultimate demon' video again, they're building a 1st and 2nd tier pyramid of links across many wikis and blogs. So this is the thing: if you stop spam on one wiki, you're just a minor glitch in their operations. Wipe out their spam from all over the web (as wikis uniquely allow us to do) and they might start to view wikis as a less desirable target.
Before developing a de-spamming bot, an easier step might be just to bring things together into a shared recent changes view, to create some cross-wiki awareness. I hadn't heard of wikiapiary.com, but it looks like it could be a helpful part of this. A simple extension of that might be to contact (automatically?) the admins of wikis where spam is flooding in.
Halz
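A first cut at that shared recent-changes view could just poll each wiki's api.php with the standard `list=recentchanges` query. A sketch; the aggregator logic and the "never-seen user" filter are assumptions, and a real version would need paging and politeness delays:

```python
from urllib.parse import urlencode

def recentchanges_url(api_url, limit=50):
    """Build the standard MediaWiki recent-changes query
    (action=query&list=recentchanges). api_url is e.g.
    'https://example.org/w/api.php' (a placeholder host)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|comment",
        "rclimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)

def flag_new_users(rc_json, known_users):
    """From a parsed API response, pick edits made by users the
    aggregator has never seen on any wiki: a crude first filter
    for a human (or bot) to look at."""
    changes = rc_json["query"]["recentchanges"]
    return [c for c in changes if c.get("user") not in known_users]
```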
Halz, in my experience this is the list of spam-flooded wikis: http://wikistats.wmflabs.org/display.php?t=mw&s=ausers_desc. There's no way a wiki with a couple of admins and a few hundred articles can have thousands of active users, unless they're all spammers.
Another pattern in spammed wikis is that their dumps compress very well with 7z, e.g. one talk page here compressed about 5000 times: https://archive.org/details/wiki-wikicafe.metacafe.com_en
I agree that knowing what's happening out there is the first step, so something useful to do is compile a list of wikis to have more examples like these: https://code.google.com/p/wikiteam/issues/detail?id=59 Of the 20k wikis in Pavlo's 2009 list, 75% are dead now. Spam was probably a component of their death. Actually, spam is probably the first reason people don't create or keep MediaWiki sites.
Nemo
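Nemo's compression observation translates into a cheap heuristic: highly repetitive spam compresses far better than organic prose. A sketch using zlib instead of 7z (same signal in kind); the threshold is an illustrative guess, not a measured cut-off:

```python
import zlib

def compression_ratio(text):
    """Raw size divided by compressed size; repeated spam postings
    drive this ratio far above anything organic prose reaches."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw, 9))

def looks_spammy(text, threshold=20.0):
    # threshold is a guess for illustration; tune against real dumps
    return compression_ratio(text) > threshold
```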
In article CAP-JHpmJc=vLFuyfeTf2iKTn4j43PHm9QnaDDhVVypz3MVzB-A@mail.gmail.com, John phoenixoverride@gmail.com writes:
One thing that might work (it wouldn't be 100%) would be a method for identifying IP ranges of known abuse where legitimate collateral damage is minimal, keeping a database of these, and auto-blocking them.
The problem with all these schemes of identifying perpetrators is that they often operate through botnets and the IP address doing the edit has nothing at all to do with the perpetrator.
In the very slim event that a legitimate wiki editor happened to have an IP previously used by a malicious botnet, wouldn't an "IP blocked" message simply inform him that his computer has been compromised? It seems like the collateral damage would still be very, very small. Also, related: wiki spam is usually reviewed by human eyes, so it is less error-prone.
In article 1369438008.34960.YahooMailNeo@web161901.mail.bf1.yahoo.com, Al Johnson alj62888@yahoo.com writes:
In the very slim event that a legitimate wiki editor happened to have an IP previously used by a malicious botnet, wouldn't an "IP blocked" message simply inform him that his computer has been compromised?
No, because they could be using a dialup account and get a dynamic IP address that was used by someone else previously.
Everything that's being discussed has already been done to combat email spam. It seems the appropriate thing to do is to leverage that work instead of re-inventing it.
Regarding a centralized DB...
This would require a whole 'nother effort and staff.
Alternative: Sites simply publish their spammer IP list to a public page that is only editable by admins or bureaucrats and in a predefined format such as:
IP, date-discovered (or semantic format?)
Then, anyone can download those lists from the sites they choose (trust).
For performance reasons, it would be best if only the recent additions to the lists can be queried... which implies using semantic data.
A side benefit would be that the whole world, other than wiki admins, can also see the list and use it as they wish... sharing, the wiki way!
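A sketch of consuming that proposed page format on the subscriber side. The "IP, date-discovered" layout is just the proposal above, not an existing standard, and the lenient parsing reflects that the page would be hand-edited by admins:

```python
import csv
import io
import ipaddress
from datetime import date

def parse_spammer_list(text):
    """Parse the proposed 'IP, date-discovered' page format.
    Lines that aren't valid entries are skipped rather than fatal."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue
        try:
            ip = str(ipaddress.ip_address(row[0].strip()))
            found = date.fromisoformat(row[1].strip())
        except ValueError:
            continue
        entries.append((ip, found))
    return entries

def recent_entries(entries, since):
    """The 'only query recent additions' idea: fetch everything but
    keep only what is new since the subscriber's last sync."""
    return [(ip, d) for ip, d in entries if d >= since]
```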
On Fri, 24 May 2013 13:41:04 -0700, Al Johnson alj62888@yahoo.com wrote:
Maybe mediawiki sites can unite to keep a global list of these IP's and block them as soon as they are submitted. Each mediawiki site can auto-submit a spammer IP as soon as it's discovered to the global list. What are the problems with this idea?
Al
IP blocking simply doesn't work. It's like playing whack-a-mole against a billion moles (or trillions on trillions once IPv6 really takes off). There are too many open proxies, botnet machines, etc... and many of them are either also addresses used by real editors, NAT addresses with editors on them, or dynamic IPs that will soon be forced on a non-spammer while the spammer gets an unblocked IP.
The proper way to deal with this spam is not by IP but by content. We need people who are knowledgeable about spam matching to train programs with spam and non-spam examples. That's the kind of central database that would be useful: an extension that sends spam (and, after a while, things marked non-spam) to a central database; a community on that database that vets valid and invalid submissions; and eventually a mode for that extension that starts using information generated from that data to filter out spam edits.
I've actually already thought about this and thought about how to make it friendly to users when their edits accidentally end up considered spam: https://www.mediawiki.org/wiki/User:Dantman/Anti-spam_system
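The content-based approach Daniel describes is essentially what Bayesian email filters do. A minimal sketch of the classifier side, assuming the central database hands back vetted spam/ham examples to train on (the class and tokenizer are illustrative, not part of any existing extension):

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Tiny naive Bayes filter: train on vetted spam/ham edits,
    then score new edits by content rather than by IP."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def score(self, text):
        """Log-odds that the text is spam (positive = spammy),
        with add-one smoothing over the combined vocabulary."""
        total = {c: sum(self.counts[c].values()) for c in self.counts}
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        logodds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for w in tokens(text):
            ps = (self.counts["spam"][w] + 1) / (total["spam"] + vocab)
            ph = (self.counts["ham"][w] + 1) / (total["ham"] + vocab)
            logodds += math.log(ps / ph)
        return logodds
```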
Although IP blocking isn't perfect, it's probably the most practical. How much worse would spam be without the current blacklists? Although machine learning is always an alluring route to take, it is very very very hard to get right and is still easily tricked. 99% of the mail in my Yahoo! spam folder is not spam. But, if you have a unique way to better detect spam using machine learning that doesn't require constant review, then I would gladly pay you for that service.
On Fri, 24 May 2013 18:41:09 -0700, Al Johnson alj62888@yahoo.com wrote:
Although IP blocking isn't perfect, it's probably the most practical. How much worse would spam be without the current blacklists? Although machine learning is always an alluring route to take, it is very very very hard to get right and is still easily tricked. 99% of the mail in my Yahoo! spam folder is not spam. But, if you have a unique way to better detect spam using machine learning that doesn't require constant review, then I would gladly pay you for that service.
Yahoo is a pretty low bar to use; plenty of implementations do better.
Email anti-spam has to deal with spam/not-spam reports from millions of untrusted users.
Wiki anti-spam can handle this a little more easily. Spam/not-spam marking on wikis is only done by trusted users, so on the central system, instead of having to vet every submission, eventually you only need to vet each wiki. And since articles are public, not private like emails, you can do proper vetting with a whole community.
On Fri, 24 May 2013, Daniel Friesen wrote:
.. The proper way to deal with this spam is not by IP but by content. We need some people who are knowledgeable about matching spam by training programs with spam and non-spam. ..
Well, Daniel. I have some ideas for how to realize automatic analysis of article content and qualify some of it as spam. They are based on the TORI axioms, and I am not sure whether this is the correct place to describe them. I would rather try to do it myself, but I have no experience programming in PHP or writing robots. (My best achievements: I wrote a few PHP scripts, and once I killed a hundred users through MySQL with a single command; I am not sure an intelligent robot should use such a brutal method.) In order to participate in the project, I need certain help from the professionals. Namely, I need somebody to post a detailed tutorial describing the basic "plug-in" and "plug-out", with very simple examples:
1. Code opens the wiki, downloads the list of new files and saves the list as a text file in the working directory.
2. Code opens a specific page for editing and saves its source in the working directory.
3. Code opens a specific page for editing and replaces its content with a special source from the working directory.
4. Code opens a specific discussion page for editing and adds a warning there.
5. Code blocks a specific user.
6. Code removes a specific page.
7. Code collects all the complaints about its activity and transfers them to the human administrator.
8. Code performs a Google search and saves the results as a text file.
The spammers already have these examples; it would be good to supply the colleagues who run wikis with the same tools. The samples mentioned above should be short, preferably one line each, and optimized not for the best performance but for the easiest understanding by a human; in particular, neither loops nor complicated logical expressions should be involved. The rest I plan to write in C++, which seems to be faster than PHP and (more importantly) which I am more familiar with.
The goal is a robot admin, a robot editor, that would be indistinguishable from an intelligent professional human and that follows an explicitly formulated and transparent editorial policy. If it succeeds, you'll be able to rewrite it from C++ to PHP and optimize it for MediaWiki.
Also, it would be good to arrange an option whereby a new page, by default, opens with certain content from a sample page (for example, http://mizugadro.mydns.jp/o/index.php/SamplePage ) that helps the human provide the necessary elements of a good article: preamble, introduction, definition(s), description of the new concept(s), support for the suggested concept, criticism of it, ways to refute it, humor about it, conclusion, references, keywords, categories. Then any article that lacks the elements above would be qualified as spam and treated correspondingly.
On Fri, May 24, 2013 at 6:09 PM, Daniel Friesen daniel@nadir-seen-fire.com wrote:
On Fri, 24 May 2013 13:41:04 -0700, Al Johnson alj62888@yahoo.com wrote:
Maybe mediawiki sites can unite to keep a global list of these IP's and block them as soon as they are submitted. Each mediawiki site can auto-submit a spammer IP as soon as it's discovered to the global list. What are the problems with this idea?
Al
IP blocking simply doesn't work. It's like playing whack-a-mole against a billion moles (or trillions on trillions once IPv6 really takes off). There are too many open proxies, botnet machines, etc... and many of them are either also addresses used by real editors, NAT addresses with editors on them, or dynamic IPs that will soon be forced on a non-spammer while the spammer gets an unblocked IP.
From what I've seen, it's probably the least time-effective technique for preventing spam, but it is effective against naive vandals like we see on en.wikipedia.org. It may be enough to get a spammer to move on to easier targets. I'd be interested in hearing whether any smaller wikis have tried this and found it to work or not.
The proper way to deal with this spam is not by IP but by content. We need some people who are knowledgeable about matching spam by training programs with spam and non-spam. That's the kind of central database that would be useful. An extension that sends spam (and after awhile things marked non-spam) to a central database. A community on that database that vets valid and invalid submissions. And eventually a mode for that extension that will start using information generated from that data to start filtering out spam edits.
I've actually already thought about this and thought about how to make it friendly to users when their edits accidentally end up considered spam: https://www.mediawiki.org/wiki/User:Dantman/Anti-spam_system
We have a GSoC proposal from Anubhav, who wants to create a Bayesian filter: http://www.mediawiki.org/wiki/User:Anubhav_iitr/Bayesan_spam_filter
The primary target for that project is smaller wikis, so I'm sure he would appreciate input and feedback on that project if it gets accepted.
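For the curious, the core of such a Bayesian filter is small. Below is a minimal naive-Bayes sketch in Python, purely illustrative; a real extension would need proper tokenization, persistence, and a decision threshold tuned per wiki.

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    """Tiny naive-Bayes classifier over whitespace-separated words."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.spam_edits = 0
        self.ham_edits = 0

    def train(self, text, is_spam):
        words = text.lower().split()
        if is_spam:
            self.spam_words.update(words)
            self.spam_edits += 1
        else:
            self.ham_words.update(words)
            self.ham_edits += 1

    def spam_score(self, text):
        # Log-odds of spam vs. ham with add-one (Laplace) smoothing;
        # a score above 0 suggests the edit is more spam-like than not.
        spam_total = sum(self.spam_words.values()) + 1
        ham_total = sum(self.ham_words.values()) + 1
        score = math.log((self.spam_edits + 1) / (self.ham_edits + 1))
        for w in set(text.lower().split()):
            score += math.log((self.spam_words[w] + 1) / spam_total)
            score -= math.log((self.ham_words[w] + 1) / ham_total)
        return score
```

Trained on a handful of marked edits, the score separates link-spam vocabulary from ordinary wiki prose; the central database discussed above would essentially be a shared, vetted training set for a classifier of this shape.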
-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
On 24/05/13 22:41, Al Johnson wrote:
Maybe mediawiki sites can unite to keep a global list of these IP's and block them as soon as they are submitted. Each mediawiki site can auto-submit a spammer IP as soon as it's discovered to the global list. What are the problems with this idea?
There exist such lists for forum spam, for example http://www.stopforumspam.com/ . I suspect a lot of IPs would end up being used for wiki spam too, and of course you can add wiki spam to the list.
On Mon, 27 May 2013 08:44:31 +0200, Nikola Smolenski smolensk@eunet.rs wrote:
On 24/05/13 22:41, Al Johnson wrote:
Maybe mediawiki sites can unite to keep a global list of these IP's and block them as soon as they are submitted. Each mediawiki site can auto-submit a spammer IP as soon as it's discovered to the global list.
What are the problems with this idea?
There exist such lists for forum spam, for example http://www.stopforumspam.com/ . I suspect a lot of IPs would end up being used for wiki spam too, and of course you can add wiki spam to the list.
It is a bloody good and useful list; I have introduced fellow stewards to its use, and it gives a good indication of the period of infection/infiltration. It's also one we quote in our IP block reasons, either for individual IPs or for IP ranges. It would be a fantastic adjunct to a Bayesian system, either as an outright fail or as a factor increasing the blocking score, especially if used in conjunction with the time since the IP was last abused. [Abused and used today: score 4; abused yesterday: 3; last week: 2; ... You just need to be aware that it is raw data, and to watch out for closed proxies like those operated by Singnet +++]
I will note that at WMF wikis we often find we are a leader (first spammed), so it may be a day before the data appears there. It would be useful if we could feed data back.
Regards, Billinghurst
On 27/05/13 09:22, billinghurst wrote:
I will note that we are finding at WMF wikis, that we often can be a leader (first spammed) so it may be a day before the data appears there.
This is an antispam solution I have considered: if an IP is blocked from editing Wikipedia, it is not allowed to edit elsewhere :)
On 27/05/13 09:22, billinghurst wrote:
I will note that we are finding at WMF wikis, that we often can be a leader (first spammed) so it may be a day before the data appears there. It would be useful if we could feed data back.
It is possible to report new IPs at http://www.stopforumspam.com/add but AFAICT only manually.
On Mon, 27 May 2013 09:43:01 +0200, Nikola Smolenski smolensk@eunet.rs wrote:
On 27/05/13 09:22, billinghurst wrote:
I will note that we are finding at WMF wikis, that we often can be a leader (first spammed) so it may be a day before the data appears there. It would be useful if we could feed data back.
It is possible to report new IPs at http://www.stopforumspam.com/add but AFAICT only manually.
Sure, but adding so many IPs by hand, in yet another place and with more steps, when we have already managed it locally, is helpful but not desirable. A reporting function built into the system would be so much better: tick to submit! Though that would need to be coordinated with the service provider.
Bayesian is nice. IP alone is lousy, for reasons repeatedly mentioned here. Blended solutions, as are well used in email filtering by the major vendors, are great: filtered phrase, filtered IP spoof, captured. The funny thing is, all major vendors use community-based patterns, involuntarily or, rarely, voluntarily. It's a given that major corporations do contractual things to protect IP and PII, but they still contribute.
So there is value in a shared pool of knowledge. But someone, or some group, must end up tending it, lest one lose massive amounts of user base to a botnet attack. For the botnet is the current enemy, not Joe Blow with his scripts, or the "king of spam".
----- Original Message -----
From: Nikola Smolenski smolensk@eunet.rs To: MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org Cc: Sent: Monday, May 27, 2013 1:43 AM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
On 27/05/13 09:22, billinghurst wrote:
I will note that we are finding at WMF wikis, that we often can be a leader (first spammed) so it may be a day before the data appears there. It would be useful if we could feed data back.
It is possible to report new IPs at http://www.stopforumspam.com/add but AFAICT only manually.
You can automate submissions: http://www.stopforumspam.com/usage
Interesting: you can download the entire DB (~5 MB) and pull incremental updates 24 times a day. This is small enough to store in memory for fast client-side lookups. They also accept RESTful submissions. I wonder what other options like this exist.
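A rough sketch of what consuming that dump for in-memory lookups could look like. This is in Python for illustration; the download URL and zip layout here are assumptions based on this thread, so check stopforumspam.com/usage for the actual format.

```python
# Hypothetical sketch: fetch the stopforumspam IP dump once, keep it in a
# set, and test each editor's IP in O(1). URL and file layout are assumed.
import io
import urllib.request
import zipfile

def parse_ip_list(text):
    """One IP per line -> a set, so membership tests are O(1)."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def load_blacklist(url="http://www.stopforumspam.com/downloads/listed_ip_1.zip"):
    raw = urllib.request.urlopen(url).read()
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        # Assume the archive holds a single text file of IPs.
        return parse_ip_list(zf.read(zf.namelist()[0]).decode("utf-8", "replace"))

# Usage at edit time (blacklist refreshed periodically from the
# incremental dumps): if ip in blacklist, reject the edit or show a CAPTCHA.
```

A set lookup avoids the per-edit network round trip and the database scans discussed later in this thread, at the cost of keeping the list fresh yourself.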
----- Original Message ----- From: Nikola Smolenski smolensk@eunet.rs To: Al Johnson alj62888@yahoo.com; MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org Cc: Sent: Monday, May 27, 2013 12:44 AM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
On 24/05/13 22:41, Al Johnson wrote:
Maybe mediawiki sites can unite to keep a global list of these IP's and block them as soon as they are submitted. Each mediawiki site can auto-submit a spammer IP as soon as it's discovered to the global list. What are the problems with this idea?
There exist such lists for forum spam, for example http://www.stopforumspam.com/ . I suspect a lot of IPs would end up being used for wiki spam too, and of course you can add wiki spam to the list.
Al, the problem with stop forum spam is not memory but rather the CPU usage and lag it creates, for an apparently limited amount of spam blocked (at least on small wikis) https://www.mediawiki.org/wiki/Manual_talk:Combating_spam#CPU_usage.3B_IP_blacklists
Nemo
From: Federico Leva (Nemo) nemowiki@gmail.com To: mediawiki-l@lists.wikimedia.org Sent: Tuesday, May 28, 2013 2:47 AM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
Al, the problem with stop forum spam is not memory but rather the CPU usage and lag it creates, for an apparently limited amount of spam blocked (at least on small wikis) https://www.mediawiki.org/wiki/Manual_talk:Combating_spam#CPU_usage.3B_IP_blacklists
Hi Nemo. You didn't say why you would not recommend that approach. Can you elaborate?
Nemo
I can — Nemo was referencing some dialog we had while I was trying to figure out performance issues on my farm.
You can see the whole dialog, with graphs, here:
http://wikiapiary.com/wiki/WikiApiary_talk:Operations/2013/May#Banned_IP_che...
Actually, after rereading that Talk page I'll just leave it at the link. All the gruesome details are there. :-)
TL;DR: Using this stopforumspam method cost me a 4-5x slowdown in performance. Jamie Thingelstad jamie@thingelstad.com mobile: 612-810-3699 find me on AIM Twitter Facebook LinkedIn
On Tue, May 28, 2013 at 9:14 AM, Jamie Thingelstad jamie@thingelstad.com wrote:
I can — Nemo was referencing some dialog we had while I was trying to figure out performance issues on my farm.
You can see the whole dialog, with graphs, here:
http://wikiapiary.com/wiki/WikiApiary_talk:Operations/2013/May#Banned_IP_che...
Thanks for posting that, Jamie; it does point out some serious issues with that database. Although with less than 400k rows, it seems like that data may just need some index and cache tuning. To me, using a database like this feels like the right approach for smaller wikis.
Were you using something like Extension:Check_Spambots to integrate the db with the wiki?
From: Jamie Thingelstad jamie@thingelstad.com To: Al Johnson alj62888@yahoo.com; MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org Sent: Tuesday, May 28, 2013 10:14 AM Subject: Re: [MediaWiki-l] Wiki spam. Stronger fightback.
I can — Nemo was referencing some dialog we had while I was trying to figure out performance issues on my farm.
You can see the whole dialog, with graphs, here:
http://wikiapiary.com/wiki/WikiApiary_talk:Operations/2013/May#Banned_IP_che...
Actually, after rereading that Talk page I'll just leave it at the link. All the gruesome details are there. :-)
TL:DR; Using this stopforumspam method caused me 4-5x slowdown in performance.
I see. I'm curious how the lookup was implemented; it sounds like a linear in-memory search. Have you thought about using a Bloom filter[1]? That would reduce your lookup time to sub-millisecond. I recently implemented one in both JavaScript and Java and am quite happy with it. It holds up to 1 million entries in 2.7 MB with a false positive rate of under 0.1%.
[1] http://en.wikipedia.org/wiki/Bloom_filter
al
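For illustration, a minimal Bloom filter along the lines Al describes might look like this. This is a sketch in Python (he mentions JavaScript and Java implementations), and the default sizing is an assumption chosen roughly via the standard Bloom-filter formulas.

```python
# Illustrative Bloom filter: k bit positions per item in an m-bit array.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=2**24, k=7):
        self.m = m_bits                      # number of bits in the filter
        self.k = k                           # hash positions per item
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Derive k indices from one MD5 digest split into two 64-bit halves
        # (the double-hashing trick: pos_i = h1 + i * h2 mod m).
        d = hashlib.md5(item.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # Can return a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

Since Bloom lookups can yield false positives, an IP found in the filter would still be double-checked against the full list (or simply challenged with a CAPTCHA); the win is that the common case, a clean IP, is answered in microseconds from a couple of megabytes of memory.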
Our wiki has been spam-free for over a year now, despite anonymous editing being allowed, while keeping the inconvenience for legitimate editors to a bare minimum. Here is what we're doing:
First, we use ConfirmEdit with QuestyCaptcha, but only for "createaccount" and "badlogin". We ask community-specific questions, and the answer is not a word in the question; this seems to work perfectly, as we haven't had a bot account for over a year now (also, we don't allow numbers in account names, which probably helps too). If we required editors to be logged in, we probably wouldn't need any other spam protection.

Since we do want to allow anonymous editing, we also use the AbuseFilter extension, with filters that only trigger for users who are not autoconfirmed (autoconfirmed status is given on our wiki once the first edit succeeds). One of the filters (probably the most important) requires users to do something specific if they want to create a new page (most spammers try to create new pages); if they don't, their edit is blocked. This specific action could be anything, from adding a certain word to using specific wiki markup or a certain edit summary. Ideally, every wiki using this concept should require a different action, just as QuestyCaptcha questions will differ for every wiki. That way, spammers can't adjust their bots to it.

And what about spammers who edit existing pages? Luckily, they almost always add links, and they almost always delete existing text, so one can construct filters to stop that.
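For reference, the ConfirmEdit side of the setup described above might look roughly like this in LocalSettings.php (2013-era loading style; the question and answer are placeholders, and every wiki should invent its own):

```php
// Load ConfirmEdit and its QuestyCaptcha module (paths per a standard install).
require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" );
require_once( "$IP/extensions/ConfirmEdit/QuestyCaptcha.php" );
$wgCaptchaClass = 'QuestyCaptcha';

// Community-specific question; the answer should not appear in the question.
$wgCaptchaQuestions[] = array(
    'question' => 'Which city hosts our annual meetup?', // placeholder
    'answer'   => 'YourAnswerHere',                      // placeholder
);

// Trigger only on account creation and repeated failed logins,
// not on ordinary edits.
$wgCaptchaTriggers['createaccount'] = true;
$wgCaptchaTriggers['badlogin']      = true;
$wgCaptchaTriggers['edit']          = false;
$wgCaptchaTriggers['create']        = false;
$wgCaptchaTriggers['addurl']        = false;
```

Anonymous-edit handling would then be left to AbuseFilter rules like the examples later in this message.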
In summary: interested editors have to answer a simple question if they want to create a new account. After that, they can do whatever they want, without being watched by the filter. Anonymous editors have to do something specific if they want to create a new page or add new links, or they can just create a new account. And spammers are blocked completely.
So, in my opinion, the following should be done to help new wikis combat spam:
1. Bundle ConfirmEdit and AbuseFilter with each MediaWiki download.
2. Make QuestyCaptcha the default for ConfirmEdit. It is easy to use and more effective than reCAPTCHA, because it can't be solved by paid workers on the other side of the planet, while still letting interested, legitimate editors pass (if your questions are good enough).
3. AbuseFilter should ship with a pre-installed filter (or at least a link to a help page with examples).
4. The section about AbuseFilter on Manual:Combating_spam should contain examples or a link to a page with examples.
examples:
!("autoconfirmed" in user_groups) & (action == "edit") & (article_articleid == 0) & !("your phrase or wiki markup" in new_wikitext)
alternatively:
!("autoconfirmed" in user_groups) & (action == "edit") & (article_articleid == 0) & !("your phrase" in summary)
another alternative:
!("autoconfirmed" in user_groups) & (action == "edit") & ((article_articleid == 0)|(length(added_links) >= 1)) & !("your phrase" in summary)
Obviously, the message telling editors that their edit was blocked should mention what they have to do if they want the edit to be saved, so the notes of the pre-installed filter should contain a link to MediaWiki:abusefilter-disallowed.
The really important thing here is that spammers can't adjust to these measures if every wiki asks different questions and demands different actions for edits from anonymous users.
Also, we have been using the already-mentioned bot trap from danielwebb.us/software/bot-trap for some time now. Since legitimate users can unblock themselves, it really doesn't hurt to use it.
Greetings Stip
On 05/29/2013 11:13 AM, Stip wrote:
- Bundle ConfirmEdit and AbuseFilter with each MediaWiki-Download.
ConfirmEdit is bundled. Please add AbuseFilter to https://www.mediawiki.org/wiki/Bundled_extensions for discussion.
- Make QuestyCaptcha the default for ConfirmEdit. It is easy to use and more effective than recaptcha, cause it cant be solved by paid workers from the other side of the planet while allowing interested, legitimate editors to pass (if your questions are good enough).
This is a good idea. There is some work on providing a SpecialPage interface for QuestyCaptcha that I think would help this a lot.
- AbuseFilter should have a pre-installed filter (or at least a link to a help page with examples).
I agree. I've wanted to provide a way to bundle help pages, or at least download them into the wiki.
- The section about AbuseFilter on Manual:Combating_spam should contain examples or a link to a page with examples.
Please add examples you find useful to it.
Thanks for all your insight!
Mark.
On Wed, May 29, 2013 at 12:04 PM, Mark A. Hershberger mah@everybody.org wrote:
I agree. I've wanted to provide a way to bundle help pages, or at least download them into the wiki.
This was always the ultimate goal of making NS_HELP on mediawiki.org public domain.
-Chad
On Wed, 29 May 2013 12:04:38 -0400, "Mark A. Hershberger" mah@everybody.org wrote: [snip]
- AbuseFilter should have a pre-installed filter (or at least a link to a help page with examples).
I agree. I've wanted to provide a way to bundle help pages, or at least download them into the wiki.
As a stopgap measure, is it possible to link to http://www.mediawiki.org/wiki/Extension:AbuseFilter in the current Special:AbuseFilter link ribbon at the top? Then allow for local customisation if/when a help file is bundled.
Regards, Billinghurst
Very helpful. Thanks.
Also, we have been using the already mentioned bot-trap from danielwebb.us/software/bot-trap for some time now. Since legitimate users can unblock themselves, it really doesnt hurt to use it.
Forgot to mention: I had to change functions.php to make this trap work.
$new_fp_name = tempnam( sys_get_temp_dir(), "bbo" ); // note: sys_get_temp_dir() is a function call, not a quoted string
if ( $new_fp_name === false || !is_file( $new_fp_name ) ) {
    echo "<p>File $new_fp_name doesn't exist, can you create this file?</p>";
    return;
}
$new_fp = fopen( $new_fp_name, "w" );
Greetings Stip