Hi all,
My apologies if this is the wrong place to start a discussion on this, but it's a better place than nowhere. I recently took part in two very different Wikipedia workshops -- one in Uganda for schoolchildren aged 14-17, and one in Bodø, Norway, for GLAM people aged 35-55. One glaringly obvious barrier to entry that was common to both groups is that the CAPTCHA we use is too freaking hard.
The main concern is obviously that it is really hard to read, but there are other issues too, namely that all the fields in the user registration form (except the username) are wiped if you enter the CAPTCHA incorrectly. So when you make a mistake, not only do you have to type a whole new CAPTCHA (where you may make another mistake), you also have to re-type the password twice *and* your e-mail address. This takes a long time, especially if you're not a fast typist (the case for the first group), or if you are on a tablet or phone (the case for some in the second group).
So I would like to start a discussion about changing to a CAPTCHA that is more user-friendly, and hopefully one that isn't as English/Latin-alphabet-centric as the one we currently use. If Ugandan children and older Norwegians, who all use the Latin alphabet, have such problems deciphering the CAPTCHA, what about people whose languages don't use the Latin alphabet? I would prefer something simpler, like some sort of math- or image-based CAPTCHA, instead of the one we currently use.
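As a rough sketch of the kind of math-based challenge suggested here (purely illustrative; the function names are made up, and a real deployment would also need rate limiting and localized digit rendering):

```python
import random

# Illustrative sketch of a digits-only arithmetic CAPTCHA, which
# avoids assuming the Latin alphabet. Function names are hypothetical.
def make_challenge(rng):
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"{a} + {b} = ?", a + b

def check_answer(expected, user_input):
    try:
        return int(user_input.strip()) == expected
    except ValueError:
        return False

question, expected = make_challenge(random.Random())
# check_answer(expected, "...") would compare the visitor's reply to the sum
```

Whether such a challenge stops bots at all is a separate question; the point is only that it asks nothing language-specific of the human.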
Wasn't a new CAPTCHA engine merged a couple of weeks ago? What's the status of that? If I remember correctly, the goal of the new engine was to make the CAPTCHA more difficult for bots, so it may (or may not) make things worse for humans. Have we ever done any research to find out how likely humans are to be able to solve our captchas?
Ryan Kaldari
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I think it would be best if we just removed the captcha, rather than deploying a new engine.
-- Tim Starling
On 6 November 2014 21:06, Tim Starling tstarling@wikimedia.org wrote:
I think it would be best if we just removed the captcha, rather than deploying a new engine.
I'd absolutely love that.
On the mobile app, almost everyone who tries to create an account is shown a captcha. Of those people, 31% get the captcha wrong on their first try, and 17% of those who get it wrong give up trying to create an account. The success rate for account creation is less than 50%, in no small part due to the captcha.
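Taken at face value, those figures imply the following (illustrative arithmetic only, over a hypothetical cohort):

```python
# Illustrative funnel using the numbers above: of users shown a
# captcha, 31% fail on the first try, and 17% of those give up.
cohort = 1000                      # hypothetical signup attempts
failed_first_try = cohort * 0.31   # 310 attempts
abandoned = failed_first_try * 0.17
print(round(abandoned))            # ~53 attempts lost at the captcha alone
```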
I'd love to eliminate giving our users that headache.
Dan
I'm interested both in improving our user stats and stamping out spambots. Dan, how do we know that those 17 percent were predominantly humans?
I've heard that automated captcha cracking is common. Perhaps so, but if taking away captchas increases the workload of stewards and admins, then we face a tradeoff between making registration easy for humans and making it difficult for bots.
Pine
On 6 November 2014 22:39, Pine W wiki.pine@gmail.com wrote:
I'm interested both in improving our user stats and stamping out spambots. Dan, how do we know that those 17 percent were predominantly humans?
We don't know for certain that they're human. That said, why would a spambot try to use the Wikipedia app? You're only creating extra hurdles for yourself by doing so. I've seen no evidence so far that anyone except humans use the Wikipedia app. Well, and Googlebot, but it reports itself to us using a special user agent and also only reads articles. :-)
Dan
Good point. Perhaps there is a case to be made for a small-scale experiment of removing the CAPTCHA. I suggest consulting the stewards for their thoughts. We have at least one steward who is supposedly an expert on spambots, and his input may be valuable. You might also consult the admins of whichever community you would like to use for the home wiki of the experiment.
Pine
I would suggest asking on a village pump and altering the configuration per local consensus.
svetlana
On 07/11/14 19:17, svetlana wrote:
I would suggest asking on a village pump and altering the configuration per local consensus.
We tried that before and the answer was "OMG no", even though nobody bothered to look at the logs. It turns out that the captcha we were using was broken from the outset -- trivially solvable with OCR -- but apparently that didn't affect anyone's opinion of whether or not it should be enabled.
According to reports from non-WMF users of ConfirmEdit, FancyCaptcha has little effect on spam rate. See the table here for a summary:
https://www.mediawiki.org/wiki/Extension:ConfirmEdit
Anyway, sure, let's ask on some more village pumps.
-- Tim Starling
On 11/7/14, Federico Leva (Nemo) nemowiki@gmail.com wrote:
+1 on Tim. FancyCaptcha is worse than useless.
Nemo
Literally an anti-captcha. Letting bots in and keeping humans out.
It's like security through obscurity, minus the obscurity, since it's been publicly discussed how weak it is for at least 3 years now - https://www.elie.net/go/p22a
--bawolff
Tim Starling wrote:
Anyway, sure, let's ask on some more village pumps.
I think that's unfair.
Wikis have a serious spam problem. People associate CAPTCHAs with spam prevention. On the English Wikipedia, one of the actions that results in the user being required to successfully enter a CAPTCHA is adding an(y?) external link to a page as a newly registered user. This, of course, in addition to the CAPTCHA presented when registering an account (consider that many new account creations only come about as the result of the requirement for an account to make a new page on the English Wikipedia).
Why not disable the extension for a week and see what happens? If you're wrong and there's a marked increase in wiki spam (account creations and edits), then you can help devise a better solution than a CAPTCHA. CAPTCHAs are clearly not a sustainable fix to the underlying problems they were implemented to address, or so I think you're saying. If you're right and the CAPTCHA is simply ineffective and unneeded, we've eliminated some code from production and we can move on. In other words: what exactly is stopping you from disabling the extension?
MZMcBride
+1 for disabling CAPTCHAs (at least in the signup form).
Anyway, sysops already have enough tools for dealing with abuse by spambots using AbuseFilter (e.g. with rate filters).
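The "rate filter" idea is essentially a sliding-window throttle. A generic sketch of the mechanism (this is not AbuseFilter's actual rule syntax; the class, key, and limits here are hypothetical):

```python
import time
from collections import defaultdict, deque

# Hypothetical sketch of a throttle: allow at most `limit` hits
# (e.g. account creations) per `window` seconds from one key (e.g. IP).
class Throttle:
    def __init__(self, limit=5, window=3600):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)   # key -> timestamps of recent hits

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        q = self.hits[key]
        while q and now - q[0] > self.window:
            q.popleft()                  # drop hits outside the window
        if len(q) >= self.limit:
            return False                 # over the limit: deny
        q.append(now)
        return True

t = Throttle(limit=2, window=60)
print([t.allow("198.51.100.7", now=n) for n in (0, 1, 2, 100)])
# [True, True, False, True] -- the third hit is throttled; after the
# window slides past, the same key is allowed again
```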
When we did usability tests of the new version and the old version, it was exceedingly clear that what Jon has said is true: the CAPTCHA is by far the most painful part of our signup form. Results and videos of those tests at: https://www.mediawiki.org/wiki/Account_creation_user_experience/User_testing
Back in the day, when we did the last redesign of the signup form, we considered running an A/B test of removing the CAPTCHA. Docs for that are at https://meta.wikimedia.org/wiki/Research:Account_creation_UX/CAPTCHA
The "turn it off for a week and see what happens with spam" test is the easiest implementation of the above, and even though normally it's a great way to produce garbage data (not being a controlled experiment) I'd be in favor of trying it. Every indication we have is that we'd get marked increases in signup conversion rates (ours are not great -- only 30% of visitors complete the form successfully IIRC).
For some more background, when we proposed something like that to Chris Steipp he was pretty iffy about it, and he's not wrong. At other sites that don't have a CAPTCHA on signup (like Facebook, Quora, others) they avoid a spam problem in part because they require an email address and confirmation. For some irrational reason, even in the era of throwaway email accounts from web services, not requiring email is some kind of sacred cow among Wikimedians, even if it would make it an easy choice to throw away our wretched CAPTCHA.
If we want to avoid spam bots signing up, there is going to be a hit in ease of use somewhere. Our network of sites is just too large to avoid being a target. It's just a matter of testing to see how much we can reduce that hit, and which method might be less easy.
For some more background, when we proposed something like that to Chris Steipp he was pretty iffy about it, and he's not wrong. At other sites that don't have a CAPTCHA on signup (like Facebook, Quora, others) they avoid a spam problem in part because they require an email address and confirmation. For some irrational reason, even in the era of throwaway email accounts from web services, not requiring email is some kind of sacred cow among Wikimedians, even if it would make it an easy choice to throw away our wretched CAPTCHA.
Because spam bots can't answer email?
If we want to avoid spam bots signing up, there is going to be a hit in ease of use somewhere. Our network of sites is just too large to avoid being a target. It's just a matter of testing to see how much we can reduce that hit, and which method might be less easy.
There is really no evidence that the current hit in ease of use has any effect on warding off spam bots.
--bawolff
Discussing an option with the community to test replacing registration CAPTCHAs with an email requirement makes sense to me. I would support a small, carefully designed test. If someone is motivated to create a Wikimedia account and they don't want to register an email, they can be given the option to have someone help them to set up an account via IRC, Facebook, or other communications methods.
I'd very much like to hear input from stewards who deal with cross-wiki spambots, so I suggest that the designers of the test consult with one or more stewards during the test design process.
Pine
Umm. No. If ever you want major pushback from the broad international community, requiring any kind of "documentation" to open an account will probably work very well. I certainly would never have signed up for an account on Wikipedia if I'd had to supply an email address.
Risker/Anne
We're talking about a test, not a broad rollout (:
I'm curious, Risker: if you don't mind my asking, what about being required to supply a throwaway email address would have discouraged you from opening a Wikimedia account?
Pine
On Sun, Nov 9, 2014 at 8:51 AM, Pine W wiki.pine@gmail.com wrote:
We're talking about a test, not a broad rollout (:
I'm curious, Risker: if you don't mind my asking, what about being required to supply a throwaway email address would have discouraged you from opening a Wikimedia account?
Having an email address attached to my account means that I can do a password reset, which has been rather helpful on a number of occasions.
It's a bit sad for folks who don't provide email that there's not much we can do to help them recover their account if they forget their password.
I understand that it can be a bit scary to have to provide this information (what about a throwaway?), but it is a tradeoff.
Cheers, Katie
On 9 November 2014 09:27, aude aude.wiki@gmail.com wrote:
I understand that it can be a bit scary to have to provide this information (what about a throwaway?), but yet is a tradeoff.
I realise it'd be a pile of extra work, and I'm not sure how to present it clearly - but what about a choice between solving the captcha or supplying a working email address?
- d.
On Nov 9, 2014 5:39 AM, "David Gerard" dgerard@gmail.com wrote:
I realise it'd be a pile of extra work, and I'm not sure how to present it clearly - but what about a choice between solving the captcha or supplying a working email address?
- d.
Does anyone have a remotely plausible attack scenario that requiring a verified email would prevent?
--bawolff
On 11/09/2014 10:20 AM, Brian Wolff wrote:
Does anyone have a remotely plausible attack scenario that requiring a verified email would prevent?
Spambots (of which there are a multitude, and which hammer any MediaWiki site constantly) have gotten pretty good at bypassing captchas but have yet to respond properly to email loops (and that's a more complicated obstacle than it first appears; throwaway accounts are cheap, but any process that requires a delay - however small - means that a spambot must now maintain state and interact rather than fire-and-forget).
I can tell you that on the (non-WMF) MediaWiki installations I administer, requiring email confirmation before being able to edit reduced spambot editing by well over 95%, without the number of spambot registrations being significantly affected (the bots are still quite visibly /creating/ accounts).
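The state-keeping that an email loop forces on a bot is just a token round-trip with a delay. A minimal sketch (names and the secret are hypothetical; a real system would also track per-address state and limit duplicate addresses):

```python
import hashlib, hmac, time

SECRET = b"server-side secret"  # hypothetical; a real one is random and private

def issue_token(address, now=None):
    """Token to embed in the confirmation link emailed to `address`."""
    now = int(time.time()) if now is None else now
    msg = f"{address}:{now}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{now}:{sig}"

def verify_token(address, token, max_age=86400, now=None):
    """True only if the token came back unmodified and within max_age."""
    now = int(time.time()) if now is None else now
    try:
        issued_str, sig = token.split(":", 1)
        issued = int(issued_str)
    except ValueError:
        return False
    if now - issued > max_age:
        return False  # confirmation link expired
    msg = f"{address}:{issued}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

tok = issue_token("user@example.org", now=1000)
print(verify_token("user@example.org", tok, now=2000))          # True
print(verify_token("user@example.org", tok, now=1000 + 10**6))  # False (expired)
```

The bot must receive the mail, extract the token, and send it back within the window; a fire-and-forget form submission cannot complete the loop.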
But there is also a great heap of anecdotal data showing that having to provide an email address increases the barrier to entry for users signing up. So, there's a tradeoff.
-- Marc
Marc A. Pelletier wrote:
But there is also a great heap of anecdotal data showing that having to provide an email address increases the barrier to entry for users signing up. So, there's a tradeoff.
Eh, I think the anecdotal data (such as Facebook's and Google's hundreds of millions account registrations) suggests that e-mail confirmation is not a huge barrier to entry for legitimate users.
Spambots (of which there are a multitude, and which hammer any MediaWiki site constantly) have gotten pretty good at bypassing captchas but have yet to respond properly to email loops (and that's a more complicated obstacle than it first appears; throwaway accounts are cheap, but any process that requires a delay - however small - means that a spambot must now maintain state and interact rather than fire-and-forget).
Hmmm, I imagine many spambots have already made this investment if they're dealing with popular systems that require e-mail address confirmation.
Wikimedia is different. You shouldn't even need an account to edit, much less an e-mail address. But this is a philosophical and principle-based (principled, if you will!) decision, not really a user experience or technical decision, in my opinion.
I think calling this issue a sacred cow is a bit overblown, but requiring an e-mail address would be a violation of our shared values. We strive to be as open and independent as possible and requiring an e-mail address is antithetical to that. If anything, we could provide e-mail address aliases (e.g., mzmcbride@en.wikipedia.org) for our users as a side benefit.
MZMcBride
On 11/09/2014 12:33 PM, MZMcBride wrote:
Hmmm, I imagine many spambots have already made this investment if they're dealing with popular systems that require e-mail address confirmation.
No doubt there are some that do; but it's a very different technical hurdle, and the management tradeoff (and intrinsic delay) of an email loop makes huge-volume fire-and-forget impossible and rate-limits the amount of spamming that can successfully be done - especially if duplicate email addresses are further limited.
I'm not saying this is a good solution in general - or that it's a panacea - but it's worth investigating.
-- Marc
On 09/11/2014 18:33, MZMcBride wrote:
Marc A. Pelletier wrote:
But there is also a great heap of anecdotal data showing that having to provide an email address increases the barrier to entry for users signing up. So, there's a tradeoff.
Eh, I think the anecdotal data (such as Facebook's and Google's hundreds of millions account registrations) suggests that e-mail confirmation is not a huge barrier to entry for legitimate users.
I think both Facebook and Google have enough staff resources to deal with spam, and they could even let bots create fake accounts as long as they don't harass other users, just to make the account counter increase. We can't afford that. CAPTCHA solutions (not necessarily text-based ones) should try to free our users from the day-to-day fight.
Wikimedia is different. You shouldn't even need an account to edit, much less an e-mail address. But this is a philosophical and principle-based (principled, if you will!) decision, not really a user experience or technical decision, in my opinion.
I think calling this issue a sacred cow is a bit overblown, but requiring an e-mail address would be a violation of our shared values. We strive to be as open and independent as possible and requiring an e-mail address is antithetical to that. If anything, we could provide e-mail address aliases (e.g., mzmcbride@en.wikipedia.org) for our users as a side benefit.
What about case-sensitivity of user names vs email addresses then?
Ricordisamoa wrote:
I think both Facebook and Google have enough staff resources to deal with spam, and they could even let bots create fake accounts as long as they don't harass other users, just to let the accounts counter increase. We can't afford that.
I'm not sure what you mean by can't afford that. What specific behaviors are we trying to prevent? Account registration alone isn't really a problem on MediaWiki wikis, just as it isn't a problem on Facebook or Google. The system scales. But if the accounts are registering and then spamming (creating new pages, making bad edits to existing pages, etc.), that's a real problem that we should try to solve as efficiently and cleanly as possible. Volunteer time is definitely precious.
What about case-sensitivity of user names vs email addresses then?
This is tangential, but... we should fix usernames to be case-insensitive. And we should support login via e-mail address. And we should (properly) support a display name field, in my opinion. Hopefully, in time. :-)
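The case-insensitivity fix amounts to keying accounts on a canonical form of the name while keeping the display form. A sketch (the structures here are hypothetical, not MediaWiki's actual schema):

```python
# Store a canonical (casefolded) key alongside the display name,
# so lookups and uniqueness checks ignore capitalization.
users = {}

def register(display_name):
    key = display_name.casefold()
    if key in users:
        raise ValueError("name already taken (case-insensitively)")
    users[key] = {"display": display_name}

def find(name):
    """Look up an account regardless of the capitalization typed."""
    return users.get(name.casefold())

register("MZMcBride")
print(find("mzmcbride")["display"])  # MZMcBride
```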
In addition to better heuristics, as Robert suggested, we could also focus on tasks such as https://phabricator.wikimedia.org/T20110, maybe. Using AbuseFilter to trigger CAPTCHAs seems like it would either be a really great or a really terrible idea. At least making this functionality available as an option to potentially try seems worthwhile.
MZMcBride
On 18/03/2015 04:30, MZMcBride wrote:
I'm not sure what you mean by can't afford that. What specific behaviors are we trying to prevent? Account registration alone isn't really a problem on MediaWiki wikis, just as it isn't a problem on Facebook or Google. The system scales. But if the accounts are registering and then spamming (creating new pages, making bad edits to existing pages, etc.), that's a real problem that we should try to solve as efficiently and cleanly as possible. Volunteer time is definitely precious.
If a bot creates 10,000 Facebook profiles and fills them with bogus content, that's fine for them. More users, more ads, more money. But if it creates 10,000 Wikimedia accounts with bogus user pages, it isn't fine for us. Less trust between Wikimedians.
In addition to better heuristics, as Robert suggested, we could also focus on tasks such as https://phabricator.wikimedia.org/T20110, maybe. Using AbuseFilter to trigger CAPTCHAs seems like it would either be a really great or a really terrible idea. At least making this functionality available as an option to potentially try seems worthwhile.
Definitely.
MZMcBride
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 09/11/14 17:19, Marc A. Pelletier wrote:
On 11/09/2014 10:20 AM, Brian Wolff wrote:
Does anyone have any attack scenario that is remotely plausible which requiring a verified email would prevent?
Spambots (of which there are a multitude, and which hammer any mediawiki site constantly) have gotten pretty good at bypassing captchas, but have yet to respond properly to email loops (and that's a more complicated obstacle than it first appears; throwaway accounts are cheap, but any process that requires a delay - however small - means that a spambot must now maintain state and interact rather than fire-and-forget).
We have so far talked about spambots, but what about *vandals*?
We have a whole class of users interested in damaging or manipulating our projects. Some of them just want to create problems, while others have an agenda (e.g. SEO). A number of them know how to program (even though they would probably not create a neural network to OCR our captcha!)
Removing the captcha also lowers the bar for an account-creator bot, making one very easy to write.
Given that a hundred dormant Wikipedia accounts are valuable, will $wgAccountCreationThrottle be enough to deter them? Is changing the IP every 6 accounts hard enough?
(Actually, you would also need to avoid raising sysop suspicions with the names you generate, but given the weird names people are already using...)
For purposes of account creation, I think of spambots and vandalbots as being in the same category.
If our CAPTCHAs are deterring a significant percentage of humans while allowing a significant percentage of bots through, we should change something.
Pine
On Nov 9, 2014 10:40 AM, "David Gerard" dgerard@gmail.com wrote:
On 9 November 2014 09:27, aude aude.wiki@gmail.com wrote:
On Sun, Nov 9, 2014 at 8:51 AM, Pine W wiki.pine@gmail.com wrote:
I'm curious, Risker: if you don't mind my asking, what about being required to supply a throwaway email address would have discouraged you from opening a Wikimedia account?
I understand that it can be a bit scary to have to provide this information (what about a throwaway?), but it is a tradeoff.
I realise it'd be a pile of extra work, and I'm not sure how to present it clearly - but what about a choice between solving the captcha or supplying a working email address?
- d.
That sounds pretty reasonable. I think it would be pretty user friendly and possibly allow us to collect data on which of the two is more used by spam bots.
--Martijn
On 9 November 2014 02:51, Pine W wiki.pine@gmail.com wrote:
We're talking about a test, not a broad rollout (:
I'm curious, Risker: if you don't mind my asking, what about being required to supply a throwaway email address would have discouraged you from opening a Wikimedia account?
Pine
I'd done enough wandering around the project to know that:
(a) There were then and still are today a non-negligible number of really creepy people who edit Wikipedia; given that the vast majority of registered editors have email enabled, it seemed obvious to me at the time that if I entered my email address, I'd probably get emails from other editors. (Looking at today's "create an account", it's still not obvious to me whether "email this user" is enabled by default or must be turned on.)
(b) There was pretty well no data security at the time - people gained access based on the quality of their coding, privacy expectations weren't articulated, and there was no serious auditing taking place to identify leaks or inappropriate access and dissemination; this isn't a criticism of those who were working at the time (as paid or volunteer developers/operators), simply a reflection of limited resources and the necessity of limiting priorities.
(c) There wasn't then, and isn't even today, any promise on the part of Wikipedia that they won't spam my mailbox. There are a few places today that I get massive spam from, but my account with them will be automatically disabled if I unsubscribe from their mailings. For all I knew then, Wikipedia could easily be one of those places.
There aren't sufficient advantages to having a registered account to overcome the waste of time that comes with having a "verifiable" email address - and once again it's a rather western solution to a perceived problem.
On the other hand, I will agree with every one here about the quality of the CAPTCHAs. Having just played around with the 'create an account' page, I was only able to make out 3 of 10 CAPTCHAs; the rest were so blurry that at least one letter was illegible.
If account creation bots are able to bypass the CAPTCHAs we're not losing anything by disabling the CAPTCHA. And as someone who picks up on a fair number of spam accounts and account creations, I don't think they're the enormous problem that some make them out to be. In fact, they're probably a bigger problem on very small, lightly edited wikis compared to any of the big ones.
Risker/Anne
On 09/11/14 06:21, Pine W wrote:
Discussing an option with the community to test replacing registration CAPTCHAs with an email requirement makes sense to me. I would support a small, carefully designed test. If someone is motivated to create a Wikimedia account and they don't want to register an email, they can be given the option to have someone help them to set up an account via IRC, Facebook, or other communications methods.
It would be nice to have a link that leads the user to a #wikipedia-accountcreation channel for getting help creating an account. There are few ways to get help for an inexperienced user who fails at the create-account form. It is orthogonal, however, to whether we use a captcha or not.
I'm pretty sure most users technical enough to use IRC are able to solve captchas well.
Max Semenik wrote:
I'm pretty sure most users technical enough to use IRC are able to solve captchas well.
That the backend is IRC-based doesn't mean they would use an IRC frontend. We routinely point to web IRC, and plenty of noobs have proven able to reach there (sometimes even thinking we are $Company since, after all, who else could you be talking to after following a Contact link on a [[$Company]] page at wikipedia.org?)
On 07/11/14 02:52, Jon Harald Søby wrote:
The main concern is obviously that it is really hard to read, but there are also some other issues, namely that all the fields in the user registration form (except for the username) are wiped if you enter the CAPTCHA incorrectly. So when you make a mistake, not only do you have to re-type a whole new CAPTCHA (where you may make another mistake), you also have to re-type the password twice *and* your e-mail address. This takes a long time, especially if you're not a fast typer (which was the case for the first group), or if you are on a tablet or phone (which was the case for some in the second group).
Only the password fields are cleared (in addition to the captcha). It is debatable whether clearing them is the right thing or not; there must be some papers talking about that. But I think we could go with keeping them filled with the user's password.
Another idea I like is to place the captcha on a different page (as a second step), where we could offer several options (captchas, puzzles, IRC chat, email…) to confirm users, then gather their success rates.
I like both of these ideas.
On the general topic, I think either a captcha or verifying an email makes a small barrier to building a bot, but it's significant enough that it keeps the amateur bots out. I'd be very interested in seeing an experiment run to see what the exact impact is though.
Google had a great blog post on this subject where they made recaptcha easier to solve, and instead,
"The updated system uses advanced risk analysis techniques, actively considering the user's entire engagement with the CAPTCHA--before, during and after they interact with it. That means that today the distorted letters serve less as a test of humanity and more as a medium of engagement to elicit a broad range of cues that characterize humans and bots. " [1]
So spending time on a new engine that allows for environmental feedback from the system solving the captcha, and that lets us tune lots of things besides whether the "user" sent back the right string of letters, would I think be well worth our time.
[1] - http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-just-got-easier-b...
Hello,
I don't know if we already had this in this discussion round, but maybe we can learn from Google's new reCAPTCHA, "No CAPTCHA"[1][2]
[1] http://www.theverge.com/2014/12/3/7325925/google-is-killing-captcha-as-we-kn... [2] https://developers.google.com/recaptcha/
Freundliche Grüße / Kind regards Florian Schmidt
-----Original Message----- Subject: Re: [Wikitech-l] Our CAPTCHA is very unfriendly Date: Mon, 10 Nov 2014 17:23:46 +0100 From: Chris Steipp csteipp@wikimedia.org To: Wikimedia developers wikitech-l@lists.wikimedia.org
I like these thoughts:
- https://lists.wikimedia.org/pipermail/wikitech-l/2014-November/079340.html "Literally an anti-captcha. Letting bots in and keeping humans out."
- https://lists.wikimedia.org/pipermail/wikitech-l/2014-November/079346.html "Why not disable the ConfirmEdit extension for a week and see what happens?"
Should we go to a wiki and see if we can gain local consensus on such a move? Which wiki would be better for a start, a bigger one or a smaller one?
-- svetlana
mw.org? Find one or two other devs and you've got consensus :)
-Chad
The main reason our captcha is easy for bots to bypass isn't that it's easy to read (it's not); it's that it works the same way as 90% of other captchas on the internet. So if you're a spam-bot writer, all you have to do is download one of the dozens of generic captcha-breaking programs on the internet and point it at Wikipedia. I imagine about half of them would be able to break our captcha out of the box. We could deter the majority of spam-bots just by having a unique type of captcha. And it doesn't have to be one that is difficult for humans to solve, just one that would require a halfway significant investment for a programmer to solve. In other words, I don't think that making our captcha easier necessarily means getting more spambots. We just have to jump off the existing captcha design bandwagon. Here are some ideas:
1. Show an animated GIF of a scrolling marquee that reveals a word
2. Show an animated GIF that sequentially unblurs the letters in the captcha (from an initially totally blurred state) and then reblurs them in a cycle
3. Show three words in different shades of grey and ask the user to type the darkest word (too trivial?)
Surely we can come up with a creative idea that is:
* Easy for humans to solve
* Can't be solved by out-of-the-box captcha breakers
* Isn't trivial for programmers to solve
Kaldari
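[Editorial sketch: idea 3 above is easy to prototype. A minimal sketch of the generate/verify logic, leaving out the actual image rendering; the word list and grey range are made up for illustration:]

```python
import random

# Toy word list; a real deployment would want localized words.
WORDS = ["river", "stone", "cloud", "maple", "ember", "frost", "pearl"]

def make_darkest_word_captcha(rng=random):
    """Pick three distinct words and give each a distinct grey level
    (0 = black ... 255 = white). The expected answer is the darkest word.
    Rendering the (word, grey) pairs into an image is left out here."""
    words = rng.sample(WORDS, 3)
    greys = rng.sample(range(40, 220), 3)   # distinct, so one clear darkest
    challenge = list(zip(words, greys))
    answer = min(challenge, key=lambda pair: pair[1])[0]
    return challenge, answer

def check(challenge, answer, response):
    """Case- and whitespace-insensitive comparison against the answer."""
    return response.strip().lower() == answer
```

Because the answer is one of the displayed words, an out-of-the-box OCR breaker that reads all three words still has to understand the "darkest" instruction to pick the right one.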
On 3 December 2014 at 23:08, Ryan Kaldari rkaldari@wikimedia.org wrote:
Surely we can come up with a creative idea that is:
- Easy for humans to solve
- Can't be solved by out-of-the-box captcha breakers
- Isn't trivial for programmers to solve
* isn't an abomination for accessibility
- d.
I like how my message to try abandoning captcha entirely came up with a myriad of complaints how we can be smart, enable new captcha which is unique, etc.
Let's measure the impact.
Could someone kindly please do some metrics on these cases after a user opened an edit box:
- edit saved without a captcha
- edit attempted to save, faced with captcha, didn't bother entering it and gave up
- edit attempted to save, faced with captcha, attempted to save again without entering it, gave up
- edit attempted to save, faced with captcha, attempted it once, succeeded
- edit attempted to save, faced with captcha, attempted it once, failed, gave up
- edit attempted to save, faced with captcha, attempted it twice, got it right on second try, saved the edit
- ... (x3, x4, x5)
As I heard there were some "A/B" tests and such, and I imagine they could be capable of getting this information.
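[Editorial sketch: for illustration, those buckets could be tallied from an event log along these lines. The event schema is invented here, not what any real instrumentation records:]

```python
from collections import Counter

# Hypothetical event log: one record per edit attempt after the user
# opened an edit box -- (captcha_shown, captcha_tries, saved).
events = [
    (False, 0, True),   # saved without a captcha
    (True, 0, False),   # faced a captcha, didn't bother entering it
    (True, 1, True),    # solved on the first try, edit saved
    (True, 2, True),    # got it right on the second try, edit saved
    (True, 1, False),   # attempted it once, failed, gave up
]

def bucket(captcha_shown, tries, saved):
    """Map one attempt onto the funnel categories listed above."""
    if not captcha_shown:
        return "no-captcha save"
    if tries == 0:
        return "gave up without trying"
    plural = "try" if tries == 1 else "tries"
    outcome = "solved" if saved else "gave up"
    return f"{outcome} after {tries} {plural}"

funnel = Counter(bucket(*e) for e in events)
```

Comparing the "solved" against the "gave up" buckets would give the human failure rate svetlana is asking about.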
svetlana wrote:
I like how my message to try abandoning captcha entirely came up with a myriad of complaints how we can be smart, enable new captcha which is unique, etc.
Let's measure the impact.
We disabled the CAPTCHA entirely on test.wikipedia.org a few weeks ago. The wiki seems to be about the same. It probably makes sense to continue slowly disabling the CAPTCHA on wikis until users start to shout. Perhaps we'll disable the CAPTCHA on the rest of the phase 0 wikis next?
MZMcBride
Hi,
It would be nice. What are these wikis? Can we ask them for permission on their village pumps?
svetlana wrote:
On Thu, 4 Dec 2014, at 15:02, MZMcBride wrote:
We disabled the CAPTCHA entirely on test.wikipedia.org a few weeks ago. The wiki seems to be about the same. It probably makes sense to continue slowly disabling the CAPTCHA on wikis until users start to shout. Perhaps we'll disable the CAPTCHA on the rest of the phase 0 wikis next?
It would be nice. What are these wikis? Can we ask them for permission on their village pumps?
Err, group 0, not phase 0!
https://noc.wikimedia.org/conf/highlight.php?file=group0.dblist is the list. I think we can convince the various affected communities. :-)
MZMcBride
Patch up for review: https://gerrit.wikimedia.org/r/#/c/177494/
-Chad
Do we have metrics for the effect the change has had on testwiki, before we roll this out further? Number of accounts registered, number of those who are later blocked for spam, number of edits reverted as spam, etc?
With SUL, disabling captcha for account creation on one wiki effectively means we have disabled it on all wikis (create account on testwiki, visit enwiki, you're autocreated, edit!).
I definitely support experimenting, I just want to make sure we're collecting and watching the right numbers while we experiment.
Well that was a fun experiment for an hour. Turns out captchas do actually stop a non-zero amount of spam on non-test wikis.
Mediawiki.org logs tell the story pretty clearly.
This has been rolled back.
-Chad
Chad wrote:
Well that was a fun experiment for an hour. Turns out captchas do actually stop a non-zero amount of spam on non-test wikis.
Mediawiki.org logs tell the story pretty clearly.
This has been rolled back.
:-(
https://www.mediawiki.org/wiki/Special:Log/delete
I spent a bit of time poking around. The spam seems to primarily be related to page creation. A slightly smarter heuristic (such as requiring that your edit count be > 0 before you can create a page) might mitigate this. Disallowing edits that contain "<a href" might also help.
The more I think about this, the more I wonder whether we should change the CAPTCHA model so that instead of applying CAPTCHAs in a blanket manner to types of actions (page creation, account creation, etc.), we could instead only force users to solve a CAPTCHA when certain abuse filters are triggered. Adding this functionality to the AbuseFilter extension is tracked at https://phabricator.wikimedia.org/T20110.
By shifting from the current rigid PHP configuration model to a looser and more flexible AbuseFilter model, we could hopefully ensure that anti-abuse measures (warning about an action, disallowing an action, or requiring a CAPTCHA before allowing an action) are more narrowly tailored to address specific problematic behavior. Even triggering AbuseFilter warnings that simply add an extra click/form submission for specific patterns of problematic behavior might trip up many of these spambots.
Robert, let me know if you want access on mediawiki.org to look at the deleted edits, though they're quite boring.
MZMcBride
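[Editorial sketch: a rough illustration of that AbuseFilter-style model. The action layout and both rules are invented for this sketch, not actual AbuseFilter syntax:]

```python
# Each rule is a predicate over the pending action plus a consequence
# ("warn", "captcha", or "disallow"), in the spirit of AbuseFilter.
RULES = [
    # new accounts creating pages must solve a captcha first
    (lambda a: a["type"] == "createpage" and a["editcount"] == 0, "captcha"),
    # raw HTML anchors are a strong spambot signal: disallow outright
    (lambda a: "<a href" in a["text"].lower(), "disallow"),
]

def consequences(action):
    """Return the consequence of every rule the action trips, in order."""
    return [c for predicate, c in RULES if predicate(action)]

spam = {"type": "createpage", "editcount": 0,
        "text": 'Buy now! <a href="http://example.com">cheap pills</a>'}
assert consequences(spam) == ["captcha", "disallow"]
```

The point is that ordinary edits trip no rules at all, so legitimate users never see a captcha unless their action matches a known abuse pattern.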
On Thu, Dec 4, 2014 at 8:08 PM, MZMcBride z@mzmcbride.com wrote: <snip>
Robert, let me know if you want access on mediawiki.org to look at the deleted edits, though they're quite boring.
It wouldn't hurt to take a look. Though I suspect getting feedback from people who look at these things often would be more useful, at least until some sort of a large-scale study could be mounted.
I suspect that a lot of the spam is the obvious stuff, such as external links to junk sites and repetitive promotional postings, though perhaps there are also less obvious types of spam?
I suspect we could weed out a lot of spammy link behavior by designing an external link classifier that used knowledge of what external links are frequently included and what external links are frequently removed to generate automatic good / suspect / bad ratings for new external links (or domains). Good links (e.g. NYTimes, CNN) might be automatically allowed for all users, suspect links (e.g. unknown or rarely used domains) might be automatically allowed for established users and challenged with captchas or other tools for new users / IPs, and bad links (i.e. those repeatedly spammed and removed) could be automatically detected and blocked.
-Robert Rohde
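[Editorial sketch: the good/suspect/bad rating Robert describes could be as simple as thresholding historical add/remove counts per domain. All thresholds below are placeholders to be tuned:]

```python
def classify_domain(added, removed, min_obs=20, bad_ratio=0.5):
    """Rate a domain from historical counts of link additions that stuck
    (added) vs. additions that were later reverted or removed (removed).
    min_obs and bad_ratio are invented thresholds for illustration."""
    total = added + removed
    if total < min_obs:
        return "suspect"   # too little history: challenge new users / IPs
    if removed / total >= bad_ratio:
        return "bad"       # mostly reverted: block automatically
    return "good"          # mostly kept: allow for all users

assert classify_domain(added=900, removed=14) == "good"
assert classify_domain(added=3, removed=2) == "suspect"
assert classify_domain(added=5, removed=40) == "bad"
```

New or rarely-seen domains land in "suspect" by construction, which is exactly the case where a captcha challenge for non-established users seems justified.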
Robert Rohde wrote:
I suspect we could weed out a lot of spammy link behavior by designing an external link classifier that used knowledge of what external links are frequently included and what external links are frequently removed to generate automatic good / suspect / bad ratings for new external links (or domains). Good links (e.g. NYTimes, CNN) might be automatically allowed for all users, suspect links (e.g. unknown or rarely used domains) might be automatically allowed for established users and challenged with captchas or other tools for new users / IPs, and bad links (i.e. those repeatedly spammed and removed) could be automatically detected and blocked.
I really like this idea. I filed https://phabricator.wikimedia.org/T78113 to make sure it doesn't get lost.
MZMcBride
On Fri, Dec 5, 2014 at 11:08 AM, MZMcBride z@mzmcbride.com wrote:
The more I think about this, the more I wonder whether we should change the CAPTCHA model so that instead of applying CAPTCHAs in a blanket manner to types of actions (page creation, account creation, etc.), we could instead only force users to solve a CAPTCHA when certain abuse filters are triggered. Adding this functionality to the AbuseFilter extension is tracked at https://phabricator.wikimedia.org/T20110.
By shifting from the current rigid PHP configuration model to a looser and more flexible AbuseFilter model, we could hopefully ensure that anti-abuse measures (warning about an action, disallowing an action, or requiring a CAPTCHA before allowing an action) are more narrowly tailored to address specific problematic behavior. Even triggering AbuseFilter warnings that simply add an extra click/form submission for specific patterns of problematic behavior might trip up many of these spambots.
Big +1 for this.
Then, if a wiki community really wants to use CAPTCHA on all account creations or link additions, they can add that as AF rules.
I am quite sure that on English Wikipedia the AF admins would enjoy defining precise rules and monitor them for effectiveness using the AF log.
20 minutes of testing is really not enough. I didn't find a significant increase in spam: if I see correctly, some more inactive accounts were created, and 2 userpages which had not been previously created were added by two IPs. Let's collect information on https://www.mediawiki.org/wiki/Extension:ConfirmEdit/FancyCaptcha_experiment... We can tweak the abuse filters and repeat the experiment another time.
Nemo
MZ: you mean removing just for account creation right? There is also a CAPTCHA delivered on external link addition for some editors–I think IPs and users not autoconfirmed. This is probably a lot more important for combating spam.
(Sorry for top posting. On my phone.) On Wed, Dec 3, 2014 at 8:03 PM MZMcBride z@mzmcbride.com wrote:
svetlana wrote:
I like how my message suggesting we try abandoning the captcha entirely came up against a myriad of complaints about how we can be smart, enable a new captcha which is unique, etc.
Let's measure the impact.
We disabled the CAPTCHA entirely on test.wikipedia.org a few weeks ago. The wiki seems to be about the same. It probably makes sense to continue slowly disabling the CAPTCHA on wikis until users start to shout. Perhaps we'll disable the CAPTCHA on the rest of the phase 0 wikis next?
MZMcBride
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Steven Walling wrote:
MZ: you mean removing just for account creation right? There is also a CAPTCHA delivered on external link addition for some editors–I think IPs and users not autoconfirmed. This is probably a lot more important for combating spam.
For testwiki, we actually set $wmgEnableCaptcha to false, which disabled both ConfirmEdit and FancyCaptcha entirely. My suspicion is that we need "enhanced spam protection" only on larger wikis and on a few smaller wikis that happen to be the target of spammers for whatever reason. Even on sites that are frequent spam targets, smarter heuristics, as Robert suggests, and existing tools such as AbuseFilter may be sufficient.
(Sorry for top posting. On my phone.)
No worries. I'm glad to see that you're still active here. :-)
MZMcBride
On Wed Dec 03 2014 at 8:57:38 PM MZMcBride z@mzmcbride.com wrote:
Steven Walling wrote:
MZ: you mean removing just for account creation right? There is also a CAPTCHA delivered on external link addition for some editors–I think IPs and users not autoconfirmed. This is probably a lot more important for combating spam.
For testwiki, we actually set $wmgEnableCaptcha to false, which disabled both ConfirmEdit and FancyCaptcha entirely. My suspicion is that we need "enhanced spam protection" only on larger wikis and on a few smaller wikis that happen to be the target of spammers for whatever reason. Even on sites that are frequent spam targets, smarter heuristics, as Robert suggests, and existing tools such as AbuseFilter may be sufficient.
Yeah I agree it doesn't matter on testwiki to turn it off 100%. I'd suggest we not do that for larger wikis though.
There are about a million IP edits a month on English Wikipedia alone, last time we checked.[1] If we increased anonymous bot spam by even only 1/10th of the total number of edits before we managed to put IP blocks in place, that's still 100k edits worth of spam. And we might not want to use something like Special:Nuke on an IP, since it may be shared with legit edits. However, for account creation, there were only ~21,000 registrations last month and it's automatically rate limited by IP even without CAPTCHA, so experimenting in that area is much safer.
1. https://meta.wikimedia.org/wiki/Research:Anonymous_editor_acquisition/Volume...
Steven Walling wrote:
There are about a million IP edits a month on English Wikipedia alone, last time we checked.[1] If we increased anonymous bot spam by even only 1/10th of the total number of edits before we managed to put IP blocks in place, that's still 100k edits worth of spam.
That's where AGF kicks in. "Anyone can edit" is sacred; I believe we should not touch it at the cost of deterring legitimate contributors. It's not that hard to put up a prominent "Please help us, we're drowning!" note on the wiki and increase the number of people who review the edits tenfold.
Until they fix the captcha or allow global filters to become truly global, there will always be a risk of spambots. Speaking as someone who has dealt with these on a regular basis for years: making the captcha system friendlier to users also means making it "friendlier" to spambot masters. The only alternative is to make captchas trickier or harder. As I have mentioned a few times on IRC, every third account created on Wikimedia is a spambot, and that's just accounts, not counting the thousands of spam edits by IPs daily.
On Fri, Dec 5, 2014 at 7:48 AM, Comet styles cometstyles@gmail.com wrote:
Until they fix the captcha or allow global filters to become truly global, there will always be a risk of spambots. Speaking as someone who has dealt with these on a regular basis for years: making the captcha system friendlier to users also means making it "friendlier" to spambot masters. The only alternative is to make captchas trickier or harder. As I have mentioned a few times on IRC, every third account created on Wikimedia is a spambot, and that's just accounts, not counting the thousands of spam edits by IPs daily.
Are there bugs raised for global filters?
Also, I suspect that relying on abuse filters for Wikibase sites is going to be a bad idea; the filters would need to parse the JSON. Wikidata is also a very large wiki by content, and important to spammers because it can directly change the content of several wikis, yet it has quite a small dedicated maintenance team, so it may be quickly overwhelmed if included in testing by devs. That also applies to the test.wikidata environment, except for the utility of spamming it, though spammers need a test site too.
On Thu, Dec 4, 2014 at 4:48 PM, Comet styles cometstyles@gmail.com wrote:
Until they fix the captcha or allow global filters to become truly global, there will always be a risk of spambots. Speaking as someone who has dealt with these on a regular basis for years: making the captcha system friendlier to users also means making it "friendlier" to spambot masters. The only alternative is to make captchas trickier or harder. As I have mentioned a few times on IRC, every third account created on Wikimedia is a spambot, and that's just accounts, not counting the thousands of spam edits by IPs daily.
Can you (or someone else) comment on what the common spambots are actually doing?
I'm generally wondering about whether there are opportunities to use captcha and/or other tools in different ways or on different behavior patterns that might capture most of the bots but affect far fewer legitimate editors.
-Robert Rohde
On Wed, Dec 3, 2014 at 9:27 PM, svetlana svetlana@fastmail.com.au wrote:
I like how my message suggesting we try abandoning the captcha entirely came up against a myriad of complaints about how we can be smart, enable a new captcha which is unique, etc.
Let's measure the impact.
Could someone kindly please do some metrics on these cases after a user opened an edit box:
- edit saved without a captcha
- edit attempted to save, faced with captcha, didn't bother entering it and gave up
- edit attempted to save, faced with captcha, attempted to save again without entering it, gave up
- edit attempted to save, faced with captcha, attempted it once, succeeded
- edit attempted to save, faced with captcha, attempted it once, failed, gave up
- edit attempted to save, faced with captcha, attempted it twice, got it right on second try, saved the edit
- ... (x3, x4, x5)
As I heard there were some "A/B" tests and such, and I imagine they could be capable of getting this information.
-- svetlana
See for example: https://phabricator.wikimedia.org/T43522
----- Original Message -----
From: "Ryan Kaldari" rkaldari@wikimedia.org
spambots. We just have to jump out of the existing captcha design band-wagon. Here are some ideas:
Surely we can come up with a creative idea that is:
- Easy for humans to solve
- Can't be solved by out-of-the-box captcha breakers
- Isn't trivial for programmers to solve
I would like to suggest a slight variant on something google's doing now, that I don't think they got quite right.
Pick half a dozen animals of which there are many species variants: say, dog, cat, snake, bird, etc.
Show the user one such animal, and a 3x3 grid with 4 animals from that species and 5 from the others; 64x64px ought to be plenty.
Ask them to click on all the animals of the same species, assuming we can word that in a way that doesn't fail for people who don't know what species are, without losing the advantage of using pictures by *saying* "cat".
The key here is to make the evaluation criteria sufficiently clear; the google implementation seems to require too much introspection into which characteristics of the pictured object they want you to match.
Once you've captcha'd them, you could, I would expect, put a cookie on the browser good for some amount of time, hashed against, say, the browser ID string or something so it's not portable?
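The grid challenge described above could be generated roughly as follows. Image handling is omitted entirely; the "images" are stand-in species labels, and the species list is invented for the sketch:

```python
# Toy generator for the 3x3 animal-grid CAPTCHA: 4 cells match the target
# species, 5 are drawn from the others; the user must click exactly the 4.
import random

SPECIES = ["dog", "cat", "snake", "bird", "fish", "horse"]

def make_challenge(rng: random.Random):
    """Return (target species, 9 grid cells, set of correct cell indices)."""
    target = rng.choice(SPECIES)
    others = [s for s in SPECIES if s != target]
    cells = [target] * 4 + [rng.choice(others) for _ in range(5)]
    rng.shuffle(cells)
    answer = {i for i, s in enumerate(cells) if s == target}
    return target, cells, answer

def check(answer: set, clicked: set) -> bool:
    """Pass only if the user clicked all and only the matching cells."""
    return clicked == answer
```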
Cheers, -- jra
It's a cool idea. Also not usable by those who are visually impaired, as best I can tell.
I'm going to be honest, I think svetlana may be on to something.
Risker/Anne
On 3 December 2014 at 18:17, Jay Ashworth jra@baylink.com wrote:
----- Original Message -----
From: "Ryan Kaldari" rkaldari@wikimedia.org
spambots. We just have to jump out of the existing captcha design band-wagon. Here are some ideas:
Surely we can come up with a creative idea that is:
- Easy for humans to solve
- Can't be solved by out-of-the-box captcha breakers
- Isn't trivial for programmers to solve
I would like to suggest a slight variant on something google's doing now, that I don't think they got quite right.
Pick half a dozen animals of which there are many species variants: say, dog, cat, snake, bird, etc.
Show the user one such animal, and a 3x3 grid with 4 animals from that species and 5 from the others; 64x64px ought to be plenty.
Ask them to click on all the animals of the same species, assuming we can word that in a way that doesn't fail for people who don't know what species are, without losing the advantage of using pictures by *saying* "cat".
The key here is to make the evaluation criteria sufficiently clear; the google implementation seems to require too much introspection into which characteristics of the pictured object they want you to match.
Once you've captcha'd them, you could, I would expect, put a cookie on the browser good for some amount of time, hashed against, say, the browser ID string or something so it's not portable?
Cheers,
-- jra
Jay R. Ashworth Baylink jra@baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://www.bcp38.info 2000 Land Rover DII St Petersburg FL USA BCP38: Ask For It By Name! +1 727 647 1274
On 04/12/14 10:08, Ryan Kaldari wrote:
The main reason our captcha is easy for bots to bypass isn't because it's easy to read (it's not);
Actually, it's extremely easy to read. As I said in the commit message on I05b5bb6, I was able to break 66% of FancyCaptcha images with the open source OCR package Tesseract, with no preprocessing and minimal configuration.
The rest of your post stands, though.
-- Tim Starling
We have many smart people, and undoubtedly we could design a better captcha.
However, no matter how smart the mousetrap, as long as you leave it strewn around the doors and hallways, well-meaning people are going to trip over it.
I would support removing the captcha from generic entry points, like the account registration page, where we know many harmless people are encountering it.
However, captchas might be useful if used in conjunction with simple behavioral analysis, such as rate limiters. For example, if an IP is creating a lot of accounts or editing at a high rate of speed, those are bad signs. Adding the same external link to multiple pages is often a very bad sign. However, adding a link to the NYTimes or CNN or an academic journal is probably fine. With that in mind, I would also eliminate the external link captcha in most cases where a link has only been added once and try to be more intelligent about which sites trigger it otherwise.
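The narrower external-link rule suggested above might look something like this sketch. The trusted-domain list is a made-up example, and real logic would also look across pages rather than within one edit:

```python
# Only challenge a link addition when several external links are added at
# once, or when a domain isn't on a trusted allowlist.
from urllib.parse import urlparse

TRUSTED_DOMAINS = {"nytimes.com", "cnn.com", "doi.org"}  # illustrative only

def needs_link_captcha(added_urls: list) -> bool:
    """Return True if this edit's link additions should trigger a CAPTCHA."""
    domains = [urlparse(u).netloc.lower().removeprefix("www.")
               for u in added_urls]
    if len(domains) > 1:
        return True  # adding many links in one edit is a spam signal
    return any(d not in TRUSTED_DOMAINS for d in domains)
```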
Basically, I'd advocate a strategy of adding a few heuristics to try and figure out who the mice are before putting the mousetraps in front of them. Of course, the biggest rats will still break the captcha and get through, but that is already true. Though reducing the prevalence of the captcha may increase the volume of spam by some small measure, I think it is more important that we stop erecting so many hurdles to new editors.
-Robert Rohde
On Wed, Dec 3, 2014 at 3:08 PM, Ryan Kaldari rkaldari@wikimedia.org wrote:
The main reason our captcha is easy for bots to bypass isn't because it's easy to read (it's not); it's because it works the same way as 90% of other captchas on the internet. So if you're a spam-bot writer, all you have to do is download one of the dozens of generic captcha-breaking programs on the internet and point it at Wikipedia. I imagine about half of them would be able to break our captcha out of the box. We could deter the majority of spam-bots just by having a unique type of captcha. And it doesn't have to be one that is difficult for humans to solve, just one that would require a halfway significant investment for a programmer to solve. In other words, I don't think that making our captcha easier necessarily means getting more spambots. We just have to jump off the existing captcha design bandwagon. Here are some ideas:
1. Show an animated GIF of a scrolling marquee that reveals a word
2. Show an animated GIF that sequentially unblurs the letters in the captcha (from an initially totally blurred state) and then reblurs them in a cycle
3. Show three words in different shades of grey and ask the user to type the darkest word (too trivial?)
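Idea 3 above could be prototyped as below. Rendering is omitted; grey levels are plain 0-255 values (0 = black), and the word list is arbitrary:

```python
# Toy "darkest word" CAPTCHA: serve three words in distinct grey levels
# and accept only the word rendered darkest.
import random

WORDS = ["river", "stone", "cloud"]  # arbitrary example words

def make_grey_captcha(rng: random.Random):
    """Return ((word, grey) pairs to render, and the expected answer)."""
    greys = rng.sample(range(40, 220), 3)   # three distinct grey levels
    pairs = list(zip(WORDS, greys))
    rng.shuffle(pairs)
    answer = min(pairs, key=lambda p: p[1])[0]  # lowest value = darkest
    return pairs, answer
```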
Surely we can come up with a creative idea that is:
- Easy for humans to solve
- Can't be solved by out-of-the-box captcha breakers
- Isn't trivial for programmers to solve
Kaldari
On Wed, Dec 3, 2014 at 1:53 PM, Chad innocentkiller@gmail.com wrote:
On Wed Dec 03 2014 at 1:27:33 PM svetlana svetlana@fastmail.com.au wrote:
I like these thoughts:
November/079340.html "Literally an anti-captcha. Letting bots in and keeping humans out."
November/079346.html "Why not disable the ConfirmEdit extension for a week and see what happens?"
Should we go to a wiki and see if we can gain local consensus on such a move? Which wiki would be better for a start, a bigger one or a smaller one?
mw.org? Find one or two other devs and you've got consensus :)
-Chad
On 2014-12-03 8:35 PM, Robert Rohde wrote:
However, captchas might be useful if used in conjunction with simple behavioral analysis, such as rate limiters. For example, if an IP is creating a lot of accounts or editing at a high rate of speed, those are bad signs.
Don't we already do rate limiting by IP for account creation? In fact I seem to recall we have a page where people have to ask for temporary whitelisting of IPs like those used at a hackathon's Wi-Fi point where large numbers of users legitimately sign up.
I'm pretty sure the users making large amounts of malicious accounts use a bunch of proxies so they don't have to worry about rate limits.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
On Wed, Dec 3, 2014 at 8:47 PM, Daniel Friesen daniel@nadir-seen-fire.com wrote:
On 2014-12-03 8:35 PM, Robert Rohde wrote:
However, captchas might be useful if used in conjunction with simple behavioral analysis, such as rate limiters. For example, if an IP is creating a lot of accounts or editing at a high rate of speed, those are bad signs.
Don't we already do rate limiting by IP for account creation? In fact I seem to recall we have a page where people have to ask for temporary whitelisting of IPs like those used at a hackathon's Wi-Fi point where large numbers of users legitimately sign up.
I'm pretty sure the users making large amounts of malicious accounts use a bunch of proxies so they don't have to worry about rate limits.
Yes, we do have some rate limiting, though I couldn't tell you what the settings are presently. In general, we could have both a soft-limit that triggers captchas and a hard limit that results in a full stop. Depending on the settings, a two tiered system could even be more friendly for hackathons and teaching groups.
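A minimal sketch of that two-tiered limit, using a sliding window per IP. The thresholds and window length are invented for illustration, not the actual $wgRateLimits settings:

```python
# Soft limit triggers a CAPTCHA; hard limit blocks outright.
from collections import defaultdict, deque

SOFT_LIMIT = 3    # registrations per window before a CAPTCHA is required
HARD_LIMIT = 10   # registrations per window before a full stop
WINDOW = 3600     # seconds

events = defaultdict(deque)

def register_attempt(ip: str, now: float) -> str:
    """Record one registration attempt and return 'ok', 'captcha' or 'blocked'."""
    q = events[ip]
    while q and now - q[0] > WINDOW:
        q.popleft()               # drop attempts outside the window
    q.append(now)
    if len(q) > HARD_LIMIT:
        return "blocked"
    if len(q) > SOFT_LIMIT:
        return "captcha"
    return "ok"
```

A hackathon's shared IP would start hitting "captcha" long before it is blocked outright, which is gentler than a single hard cutoff.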
The broader point is that I would encourage people to consider ways to improve and expand the uses of similar basic behavioral analysis, rather than simply throwing a captcha at everyone.
-Robert Rohde
Hi Robert,
With accounts it is no problem: we can just make them enter a captcha when registering and assume they're human from then on.
However, where an IP contributor enters a captcha, we can't assume it's a human, because IPs are often shared. There is lots of past discussion on distinguishing between IP users and authenticating them in some way, but those approaches all hit various low ceilings (enter private browsing mode, etc.), and the conclusion was to continue treating an IP as a group of people, while always allowing for the possibility that one of them is a spambot. Therefore I can't see how to implement your suggestion without piping all logged-out contributors into the "mice" pile. This is unacceptable.
Il 10/11/2014 17:23, Chris Steipp ha scritto:
On the general topic, I think either a captcha or verifying an email makes a small barrier to building a bot, but it's significant enough that it keeps the amateur bots out. I'd be very interested in seeing an experiment run to see what the exact impact is though.
Google had a great blog post on this subject where they made recaptcha easier to solve, and instead,
"The updated system uses advanced risk analysis techniques, actively considering the user's entire engagement with the CAPTCHA--before, during and after they interact with it. That means that today the distorted letters serve less as a test of humanity and more as a medium of engagement to elicit a broad range of cues that characterize humans and bots. " [1]
So spending time on a new engine that allows for environmental feedback from the system solving the captcha, and that lets us tune lots of things besides whether the "user" sent back the right string of letters, would, I think, be well worth our time.
[1] - http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-just-got-easier-b...
Il 04/12/2014 05:35, Robert Rohde ha scritto:
We have many smart people, and undoubtedly we could design a better captcha.
However, no matter how smart the mousetrap, as long as you leave it strewn around the doors and hallways, well-meaning people are going to trip over it.
I would support removing the captcha from generic entry points, like the account registration page, where we know many harmless people are encountering it.
However, captchas might be useful if used in conjunction with simple behavioral analysis, such as rate limiters. For example, if an IP is creating a lot of accounts or editing at a high rate of speed, those are bad signs. Adding the same external link to multiple pages is often a very bad sign. However, adding a link to the NYTimes or CNN or an academic journal is probably fine. With that in mind, I would also eliminate the external link captcha in most cases where a link has only been added once and try to be more intelligent about which sites trigger it otherwise.
Basically, I'd advocate a strategy of adding a few heuristics to try and figure out who the mice are before putting the mousetraps in front of them. Of course, the biggest rats will still break the captcha and get through, but that is already true. Though reducing the prevalence of the captcha may increase the volume of spam by some small measure, I think it is more important that we stop erecting so many hurdles to new editors.
-Robert Rohde
Il 05/12/2014 06:28, Robert Rohde ha scritto:
I suspect that a lot of the spam are the obvious things such as external links to junk sites and repetitive promotional postings, though perhaps there are also less obvious types of spam?
I suspect we could weed out a lot of spammy link behavior by designing an external link classifier that used knowledge of what external links are frequently included and what external links are frequently removed to generate automatic good / suspect / bad ratings for new external links (or domains). Good links (e.g. NYTimes, CNN) might be automatically allowed for all users, suspect links (e.g. unknown or rarely used domains) might be automatically allowed for established users and challenged with captchas or other tools for new users / IPs, and bad links (i.e. those repeatedly spammed and removed) could be automatically detected and blocked.
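A first cut at the classifier proposed above could rate a domain purely by how often links to it are kept versus reverted. The thresholds and the minimum-history cutoff are invented for this sketch:

```python
# Rate a domain good / suspect / bad from its add/remove history.
def classify_domain(times_added: int, times_removed: int) -> str:
    """Return 'good', 'suspect', or 'bad' for a link domain."""
    if times_added < 10:
        return "suspect"               # too little history to judge
    removal_rate = times_removed / times_added
    if removal_rate > 0.8:
        return "bad"                   # repeatedly spammed and removed: block
    if removal_rate < 0.2:
        return "good"                  # routinely kept: allow for all users
    return "suspect"                   # challenge new users / IPs
```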
-Robert Rohde
What about applying ClueBot NG's vandalism detection algorithm (https://en.wikipedia.org/wiki/User:ClueBot_NG#Vandalism_Detection_Algorithm) to spam? At this point I think machine learning is the only way a real CAPTCHA can keep up with evil bots, and a text-based system (such as T34695, https://phabricator.wikimedia.org/T34695) would only be used for tuning, just as reCAPTCHA does.