Re: [Wikitech-l] Our CAPTCHA is very unfriendly

18 Mar 2015

On 10/11/2014 17:23, Chris Steipp wrote:
...
 On the general topic, I think either a captcha or verifying an email makes
 a small barrier to building a bot, but it's significant enough that it
 keeps the amateur bots out. I'd be very interested in seeing an experiment
 run to see what the exact impact is though.

 Google had a great blog post on this subject where they made recaptcha
 easier to solve, and instead,

 "The updated system uses advanced risk analysis techniques, actively
 considering the user's entire engagement with the CAPTCHA--before, during
 and after they interact with it. That means that today the distorted
 letters serve less as a test of humanity and more as a medium of engagement
 to elicit a broad range of cues that characterize humans and bots. " [1]

 So spending time on a new engine that allows for environmental feedback
 from the system solving the captcha, and that lets us tune lots of things
 besides whether the "user" sent back the right string of letters, I think
 would be well worth our time.

 [1] http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-just-got-easier-…

On 04/12/2014 05:35, Robert Rohde wrote:
...
 We have many smart people, and undoubtedly we could design a better captcha.

 However, no matter how smart the mousetrap, as long as you leave it strewn
 around the doors and hallways, well-meaning people are going to trip over
 it.

 I would support removing the captcha from generic entry points, like the
 account registration page, where we know many harmless people are
 encountering it.

 However, captchas might be useful if used in conjunction with simple
 behavioral analysis, such as rate limiters.  For example, if an IP is
 creating a lot of accounts or editing at a high rate of speed, those are
 bad signs.  Adding the same external link to multiple pages is often a very
 bad sign.  However, adding a link to the NYTimes or CNN or an academic
 journal is probably fine.  With that in mind, I would also eliminate the
 external link captcha in most cases where a link has only been added once
 and try to be more intelligent about which sites trigger it otherwise.
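
A minimal Python sketch of the heuristics described above, offered only as an illustration and not as anything from MediaWiki or ConfirmEdit. It challenges an IP only when it acts faster than a threshold or adds the same non-trusted link to more than one page. The class name `CaptchaHeuristics`, the thresholds, and the trusted-domain list are all hypothetical.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse

# Hypothetical values, chosen only for illustration.
TRUSTED_DOMAINS = {"nytimes.com", "cnn.com"}
MAX_ACTIONS_PER_WINDOW = 5   # actions allowed per IP inside the window
WINDOW_SECONDS = 60.0
MAX_PAGES_PER_LINK = 1       # the same link on more pages than this is a bad sign


class CaptchaHeuristics:
    """Decide whether an action should be challenged with a captcha."""

    def __init__(self):
        self._actions = defaultdict(deque)   # ip -> timestamps of recent actions
        self._link_pages = defaultdict(set)  # (ip, url) -> pages the link was added to

    def _too_fast(self, ip, now):
        # Sliding-window rate check: keep only timestamps inside the window.
        recent = self._actions[ip]
        recent.append(now)
        while recent and now - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        return len(recent) > MAX_ACTIONS_PER_WINDOW

    def needs_captcha(self, ip, page, added_links, now):
        suspicious = self._too_fast(ip, now)
        for url in added_links:
            host = (urlparse(url).hostname or "").lower()
            if host.startswith("www."):
                host = host[4:]
            if host in TRUSTED_DOMAINS:
                continue  # well-known sources never trigger the captcha
            pages = self._link_pages[(ip, url)]
            pages.add(page)
            if len(pages) > MAX_PAGES_PER_LINK:
                suspicious = True  # same link spread over several pages
        return suspicious
```

With these assumed thresholds, a first edit adding an unknown link passes without a challenge, while the same link added to a second page by the same IP is challenged. Timestamps are passed in explicitly so the behavior is easy to test.
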

 Basically, I'd advocate a strategy of adding a few heuristics to try and
 figure out who the mice are before putting the mousetraps in front of
 them.  Of course, the biggest rats will still break the captcha and get
 through, but that is already true.  Though reducing the prevalence of the
 captcha may increase the volume of spam by some small measure, I think it
 is more important that we stop erecting so many hurdles to new editors.

 -Robert Rohde

On 05/12/2014 06:28, Robert Rohde wrote:
...
 I suspect that a lot of the spam consists of the obvious things, such as external
 links to junk sites and repetitive promotional postings, though perhaps
 there are also less obvious types of spam?

 I suspect we could weed out a lot of spammy link behavior by designing an
 external link classifier that used knowledge of what external links are
 frequently included and what external links are frequently removed to
 generate automatic good / suspect / bad ratings for new external links (or
 domains).  Good links (e.g. NYTimes, CNN) might be automatically allowed
 for all users, suspect links (e.g. unknown or rarely used domains) might be
 automatically allowed for established users and challenged with captchas or
 other tools for new users / IPs, and bad links (i.e. those repeatedly
 spammed and removed) could be automatically detected and blocked.
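
One way the proposed classifier could be sketched in Python, assuming the only available signal is how often links to a domain are added and later removed. The class `LinkClassifier`, the helper `action_for`, and every threshold are hypothetical; a real system would need tuning against actual edit history.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical thresholds, chosen only for illustration.
MIN_OBSERVATIONS = 20      # fewer additions than this: domain is too rare to judge
BAD_REMOVAL_RATIO = 0.8    # removed at least this often: treat as spam
GOOD_REMOVAL_RATIO = 0.2   # removed at most this often: treat as trusted


class LinkClassifier:
    """Rate domains as good / suspect / bad from add and remove counts."""

    def __init__(self):
        self.added = Counter()
        self.removed = Counter()

    @staticmethod
    def domain(url):
        host = (urlparse(url).hostname or "").lower()
        return host[4:] if host.startswith("www.") else host

    def record_added(self, url):
        self.added[self.domain(url)] += 1

    def record_removed(self, url):
        self.removed[self.domain(url)] += 1

    def rate(self, url):
        d = self.domain(url)
        added = self.added[d]
        if added < MIN_OBSERVATIONS:
            return "suspect"  # unknown or rarely used domain
        ratio = self.removed[d] / added
        if ratio >= BAD_REMOVAL_RATIO:
            return "bad"
        if ratio <= GOOD_REMOVAL_RATIO:
            return "good"
        return "suspect"


def action_for(rating, established_user):
    """Map a rating to the handling proposed in the message above."""
    if rating == "good":
        return "allow"
    if rating == "bad":
        return "block"
    # Suspect links: allowed for established users, challenged for new users / IPs.
    return "allow" if established_user else "captcha"
```

Keying on the domain rather than the full URL is a simplification; it lets rarely seen pages on a well-known site inherit that site's rating.
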

 -Robert Rohde

What about applying ClueBot NG's Vandalism Detection Algorithm
<https://en.wikipedia.org/wiki/User:ClueBot_NG#Vandalism_Detection_Algorithm>
to spam?
At this point I think machine learning is the only way a real CAPTCHA
can keep up with evil bots, and a text-based system (such as T34695
<https://phabricator.wikimedia.org/T34695>) would only be used for
tuning, as reCAPTCHA now uses its distorted letters.
