On 10/11/2014 17:23, Chris Steipp wrote:
On the general topic, I think either a captcha or email verification creates only a small barrier to building a bot, but it's significant enough that it keeps the amateur bots out. I'd be very interested in seeing an experiment run to measure the exact impact, though.
Google had a great blog post on this subject where they made reCAPTCHA easier to solve and instead relied on risk analysis:
"The updated system uses advanced risk analysis techniques, actively considering the user's entire engagement with the CAPTCHA--before, during and after they interact with it. That means that today the distorted letters serve less as a test of humanity and more as a medium of engagement to elicit a broad range of cues that characterize humans and bots. " [1]
So spending time on a new engine that allows for environmental feedback from the system solving the captcha, and that lets us tune lots of things besides whether the "user" sent back the right string of letters, would I think be well worth our time.
[1] - http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-just-got-easier-b...
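As a purely illustrative sketch of what "tuning lots of things" could mean, the snippet below combines several behavioural signals into one risk score instead of relying only on whether the text was transcribed correctly. Every signal name, weight, and threshold here is an assumption for illustration, not anything Google or MediaWiki actually exposes.

# Illustrative only: combine behavioural signals into a single risk score.
# All signal names, weights, and thresholds are hypothetical.

def captcha_risk_score(answer_correct, seconds_to_solve, mouse_events, prior_failures):
    """Return a risk score in [0, 1]; higher means more bot-like."""
    score = 0.0
    if not answer_correct:
        score += 0.4
    if seconds_to_solve < 2:          # solved implausibly fast
        score += 0.3
    if mouse_events == 0:             # no pointer movement at all
        score += 0.2
    score += min(prior_failures, 5) * 0.02
    return min(score, 1.0)

def decide(score, allow_below=0.3, challenge_below=0.7):
    """Tunable thresholds: pass, re-challenge, or reject."""
    if score < allow_below:
        return "allow"
    if score < challenge_below:
        return "challenge-again"
    return "reject"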
On 04/12/2014 05:35, Robert Rohde wrote:
We have many smart people, and undoubtedly we could design a better captcha.
However, no matter how smart the mousetrap, as long as you leave it strewn around the doors and hallways, well-meaning people are going to trip over it.
I would support removing the captcha from generic entry points, like the account registration page, where we know many harmless people are encountering it.
However, captchas might be useful if used in conjunction with simple behavioral analysis, such as rate limiters. For example, if an IP is creating a lot of accounts or editing at a high rate of speed, those are bad signs. Adding the same external link to multiple pages is often a very bad sign. However, adding a link to the NYTimes or CNN or an academic journal is probably fine. With that in mind, I would also eliminate the external link captcha in most cases where a link has only been added once and try to be more intelligent about which sites trigger it otherwise.
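A minimal sketch of how such heuristics might gate the captcha follows; the counter names, thresholds, and trusted-domain list are placeholders I've assumed for illustration, not existing MediaWiki features.

# Hypothetical heuristic gate: show a captcha only when behavioural
# signals look suspicious. All thresholds and counter names are assumptions.

TRUSTED_DOMAINS = {"nytimes.com", "cnn.com", "doi.org"}

def should_show_captcha(ip_stats, added_links):
    """ip_stats: hypothetical per-IP counters; added_links: domains added in this edit."""
    if ip_stats["accounts_created_last_hour"] > 3:
        return True                      # many registrations from one IP
    if ip_stats["edits_last_minute"] > 10:
        return True                      # implausibly fast editing
    for domain in added_links:
        if domain in TRUSTED_DOMAINS:
            continue                     # well-known sites pass silently
        if ip_stats["pages_with_same_link"].get(domain, 0) > 2:
            return True                  # same link added across many pages
    return False                         # harmless-looking edit: no captcha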
Basically, I'd advocate a strategy of adding a few heuristics to try to figure out who the mice are before putting the mousetraps in front of them. Of course, the biggest rats will still break the captcha and get through, but that is already true. Though reducing the prevalence of the captcha may increase the volume of spam by some small measure, I think it is more important that we stop erecting so many hurdles for new editors.
-Robert Rohde
On 05/12/2014 06:28, Robert Rohde wrote:
I suspect that a lot of the spam is the obvious stuff, such as external links to junk sites and repetitive promotional postings, though perhaps there are also less obvious types of spam?
I suspect we could weed out a lot of spammy link behavior by designing an external link classifier that used knowledge of what external links are frequently included and what external links are frequently removed to generate automatic good / suspect / bad ratings for new external links (or domains). Good links (e.g. NYTimes, CNN) might be automatically allowed for all users, suspect links (e.g. unknown or rarely used domains) might be automatically allowed for established users and challenged with captchas or other tools for new users / IPs, and bad links (i.e. those repeatedly spammed and removed) could be automatically detected and blocked.
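One way to sketch that add/remove reputation idea is below. It is a toy illustration: the ratings, thresholds, and the shape of the statistics are all assumptions, and a real classifier would need to account for much more context.

# Toy reputation model for external-link domains, based on how often links
# to a domain are added versus later removed. Thresholds are arbitrary.

def rate_domain(times_added, times_removed, min_observations=10):
    """Return 'good', 'suspect', or 'bad' for a domain."""
    total = times_added + times_removed
    if total < min_observations:
        return "suspect"                      # too little history to judge
    removal_rate = times_removed / total
    if removal_rate < 0.1:
        return "good"                         # links almost never reverted
    if removal_rate > 0.6:
        return "bad"                          # links are usually cleaned up
    return "suspect"

def handle_new_link(domain, user_is_established, stats):
    rating = rate_domain(stats[domain]["added"], stats[domain]["removed"])
    if rating == "good":
        return "allow"
    if rating == "bad":
        return "block"
    return "allow" if user_is_established else "captcha"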
-Robert Rohde
What about applying ClueBot NG's Vandalism Detection Algorithm https://en.wikipedia.org/wiki/User:ClueBot_NG#Vandalism_Detection_Algorithm to spam? At this point I think machine learning is the only way a real CAPTCHA can keep up with evil bots, and a text-based system (such as T34695 https://phabricator.wikimedia.org/T34695) would only be used for tuning, just as reCAPTCHA does.
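To make that concrete, here is a rough sketch of how a machine-learned spam score could be combined with a captcha reserved for borderline cases only. spam_probability is a stand-in for whatever trained model (ClueBot-NG-style or otherwise) would produce the score; it is not an existing function.

# Sketch only: route an edit based on a machine-learned spam probability,
# falling back to a captcha (and recording the outcome for retraining) only
# when the model is unsure. spam_probability() is hypothetical.

def route_edit(edit, spam_probability, allow_below=0.2, block_above=0.9):
    p = spam_probability(edit)
    if p < allow_below:
        return "allow"                  # model is confident the edit is fine
    if p > block_above:
        return "block"                  # model is confident it is spam
    # Borderline: challenge with a captcha and feed the answer back as a
    # training signal, which is roughly how reCAPTCHA uses human responses.
    return "captcha-and-record"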