Some spam bots specifically target wikis (edit/create a page and then enter data). I don't have a problem with spammers actually getting through and submitting spam; the anti-spam measures are strong enough to stop that. The problem is that they still use CPU and RAM. I know about Fail2ban but haven't tried it yet. Here's one idea: if a certain IP address fails the captcha a specified number of times in five minutes or so, it gets banned temporarily, say for 24 hours (through .htaccess, the firewall, etc.). That would keep the IP from using up CPU for a while and reduce overall load. Has anyone already implemented something like this?
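In Fail2ban terms I imagine it would look something like this -- an untested sketch, and it assumes captcha failures get logged somewhere with the client IP, which I don't think MediaWiki does out of the box (the log line in the filter is invented):

    # /etc/fail2ban/filter.d/mediawiki-captcha.conf (hypothetical filter)
    [Definition]
    # match whatever your wiki actually logs on a captcha failure
    failregex = captcha failure from <HOST>
    ignoreregex =

    # /etc/fail2ban/jail.local
    [mediawiki-captcha]
    enabled  = true
    filter   = mediawiki-captcha
    logpath  = /var/log/mediawiki/captcha.log
    # 5 failures within 5 minutes => ban the IP for 24 hours
    maxretry = 5
    findtime = 300
    bantime  = 86400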
On Thu, Mar 28, 2013 at 11:32 PM, Dan Fisher danfisher261@gmail.com wrote:
Here's one idea: if a certain IP address fails the captcha a specified number of times in five minutes or so, it gets banned temporarily, say for 24 hours (through .htaccess, the firewall, etc.).
Humans regularly get CAPTCHAs wrong, and they often do so multiple times (if you have any elderly relatives, see how many tries it takes them to solve a reCAPTCHA). Blocking them from even viewing your site for a day seems a little extreme.
Is there an actual problem you're trying to solve here? Is there any indication that spam bots are affecting your site's performance? If not, worrying about this is probably a waste of your time.
On 29/03/13 18:29, Benjamin Lees wrote:
Humans regularly get CAPTCHAs wrong, and they often do so multiple times (if you have any elderly relatives, see how many tries it takes them to solve a reCAPTCHA). Blocking them from even viewing your site for a day seems a little extreme.
+1 - there is one well-known blog site that uses a captcha, and I've tried as many as ten times on a single submission, only to give up because I simply couldn't get it right. Now I don't even try to comment there.
Is there an actual problem you're trying to solve here? Is there any indication that spam bots are affecting your site's performance? If not, worrying about this is probably a waste of your time.
We did have something of a problem on the userbase.kde.org wiki - mostly spam inserted on new user pages, but eventually one destructive spam edit. We installed QuestyCaptcha for registration. It stops machine/bot registrations, though not human ones, so it doesn't stop humans adding spam - but believe me, the problem is now very light compared with what it was.
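For anyone who wants to copy it, the LocalSettings.php side is just the standard ConfirmEdit setup, roughly like this (the question and answer here are placeholders, not our real ones):

    require_once "$IP/extensions/ConfirmEdit/ConfirmEdit.php";
    require_once "$IP/extensions/ConfirmEdit/QuestyCaptcha.php";
    $wgCaptchaClass = 'QuestyCaptcha';
    $wgCaptchaQuestions[] = array(
        'question' => 'What colour is an orange?', // placeholder question
        'answer'   => 'orange',
    );
    // only challenge at registration, not on normal edits
    $wgCaptchaTriggers['createaccount'] = true;
    $wgCaptchaTriggers['edit']          = false;
    $wgCaptchaTriggers['create']        = false;
    $wgCaptchaTriggers['addurl']        = false;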
Anne
Benjamin Lees emufarmers@gmail.com wrote:
Is there an actual problem you're trying to solve here? Is there any indication that spam bots are affecting your site's performance? If not, worrying about this is probably a waste of your time.
Spambots driving up CPU is a known issue: https://www.google.com/search?q=spam+bots+cpu&ie=utf-8&oe=utf-8&...
Is it a problem? Yes: they're constantly trying to break in, and that increases CPU usage. I don't have any analysis to prove it, but I've seen many times where traffic was normal (per Google Analytics) while CPU usage went very high. I came from a shared server where I was actually asked to leave because of the CPU usage. I've had big problems with CPU. I'm on a VPS now and it can still be a problem: average CPU usage recently went from around 20% to 160% for a few hours (multi-core, which is why it can exceed 100%), while Google Analytics showed no change in traffic. Whether this is a malicious/DDoS bot or an advertising bot, it's something that needs to be studied and dealt with. If I stay at 200% CPU usage on the VPS I may be asked to leave the server. So yes, I have to keep a watch on CPU and explore all possible options to keep usage down. I'm already using caching and nginx (earlier suggestions from people on this list).
As for how to prevent genuine visitors from being blocked, that's problem #2, and it's something that can be improved.
I'll try the approach Henny suggested: http://danielwebb.us/software/bot-trap/
Anne wrote:
+1 - there is one well-known blog site that uses a captcha, and I've tried as many as ten times on a single submission, only to give up because I simply couldn't get it right. Now I don't even try to comment there.
They need a better captcha. But yes, you've reminded me that whatever method is used, I need to make sure genuine visitors are not affected. The bot-trap link from Henny might take care of that, since the 'hidden' link is never seen by humans.
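From a quick read, the idea behind the trap is roughly this -- my own sketch of the mechanism, not the actual bot-trap code:

    # robots.txt -- well-behaved crawlers are told to stay out of the trap
    User-agent: *
    Disallow: /bot-trap/

    <!-- an invisible link placed on every page; no human ever follows it -->
    <a href="/bot-trap/"><img src="pixel.gif" alt="" width="1" height="1" border="0"></a>

    <?php
    // /bot-trap/index.php -- anything that ignores robots.txt and follows
    // the hidden link gets "Deny from <ip>" appended to the site .htaccess
    // (which needs an existing "Order Allow,Deny" / "Allow from all" block,
    // and must be writable by the web server)
    $ip = $_SERVER['REMOTE_ADDR'];
    file_put_contents( $_SERVER['DOCUMENT_ROOT'] . '/.htaccess',
        "Deny from $ip\n", FILE_APPEND | LOCK_EX );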
thanks Dan
On 03/30/2013 12:47 PM, Dan Fisher wrote:
Is it a problem? Yes: they're constantly trying to break in, and that increases CPU usage. I don't have any analysis to prove it
I suggest you get some "proof" to rely on.
What if you blocked the spammers completely -- found some magic range of IP addresses and banned them outright -- and your CPU usage was still skyrocketing? Well, at least you'd have fewer spammers, but you already said you weren't actually worried about spam, just CPU usage.
Make sure you're focused on the right solution to your CPU problem.
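If you want to actually run that experiment, something like this at the firewall would do it (the range below is purely illustrative):

    # temporarily drop a suspect range, then watch your CPU graphs
    iptables -I INPUT -s 203.0.113.0/24 -j DROP
    # remove the test rule when you're done
    iptables -D INPUT -s 203.0.113.0/24 -j DROP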
On Sat, Mar 30, 2013 at 11:16 AM, Mark A. Hershberger mah@everybody.org wrote:
Make sure you're focused on the right solution to your CPU problem.
I would agree with that, and add that spam-fighting tools (AbuseFilter, the spam and title blacklists, etc.) are some of the highest consumers of CPU for each Wikipedia edit. Add too much spam protection and you'll be right back where you started on CPU. So you definitely want to do some profiling and make sure you're addressing the biggest issues first.
On 30/03/13 21:57, Chris Steipp wrote:
So you definitely want to do some profiling and make sure you're addressing the biggest issues first.
Very good point. Some of your anti-spam measures may be taking a good chunk of CPU.
Thanks guys.
So you definitely want to do some profiling and make sure you're addressing the biggest issues first.
I haven't done profiling before. I'm assuming this is the link to check out: http://www.mediawiki.org/wiki/Manual:How_to_debug#Profiling
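If I'm reading that page right, it's mostly a matter of dropping a StartProfiler.php into the wiki root, something like this (1.20-era setup -- I'll check the details against my version):

    <?php
    // StartProfiler.php in the wiki root
    require_once __DIR__ . '/includes/profiler/Profiler.php';
    // append per-function timings to each page as an HTML comment
    $wgProfiler['class'] = 'ProfilerSimpleText';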
I will attempt to do this next time I have CPU problems.
Right now we are running really well on Linode (except for that one time a few weeks ago when usage went up to 160%). I have to thank again the people who gave me that advice earlier. I'm so happy with the speed. Linode recently doubled the RAM on all their plans, so their 512MB package is now 1GB.
thanks Dan
On Sat, 30 Mar 2013 09:47:45 -0700, Dan Fisher danfisher261@gmail.com wrote:
If I stay at 200% CPU usage on the VPS I may be asked to leave the server. So yes, I have to keep a watch on CPU and explore all possible options to keep usage down. I'm already using caching and nginx (earlier suggestions from people on this list).
VPS hosts are supposed to be set up so that no individual VPS can stop the other instances on the same box from using the resources allocated to them. (Some hosts let you borrow spare CPU when the other instances are idle, but that's a bonus: if another instance wants its resources back, the host won't let yours keep using them.)
If your host would kick you off for using all of the resources allocated to you, then you've got a cruddy VPS provider that doesn't know what they're doing, and you should find a new one.