Captcha filter list


George Herbert

1 Jan 2014

11:05 a.m.

Just got a report and screenshot that a new user got this string for their captcha on en.wikipedia "nigerblew"

http://snag.gy/JpSUR.jpg

Though several people are pointing out that Niger is a country, I think it's reasonable to try and avoid things close to the two-g version of that word; nobody's denigrated by avoiding possibly offensive if misinterpreted words. I recall there's a filter list?...

-- -george william herbert george.herbert@gmail.com


Steven Walling

1 Jan

12:58 p.m.

On Tue, Dec 31, 2013 at 7:05 PM, George Herbert george.herbert@gmail.com wrote:

...

Just got a report and screenshot that a new user got this string for their captcha on en.wikipedia "nigerblew"

http://snag.gy/JpSUR.jpg

Though several people are pointing out that Niger is a country, I think it's reasonable to try and avoid things close to the two-g version of that word; nobody's denigrated by avoiding possibly offensive if misinterpreted words. I recall there's a filter list?...

I don't know if there's a filter list, but this has happened before. I've seen many of our past and current CAPTCHAs when we were testing the signup/login redesign. I recall seeing both "headshits" and "obamadick".

The CAPTCHA is two randomly generated English words. Personally, I think it might just be good to regenerate the CAPTCHAs entirely from time to time. AFAIK it's not that hard and our CAPTCHAs are weak anyway.

Steven

Benjamin Lees

1:27 p.m.

There's a blacklist that has been included with FancyCaptcha for a few months, although I don't know whether it's the same as the one the WMF uses.

See https://bugzilla.wikimedia.org/show_bug.cgi?id=21025 and the associated patches.

Tyler Romeo

1:30 p.m.

It's a CAPTCHA, not an article or piece of actual content. If people are actually getting offended by randomly generated CAPTCHAs I think they need to find something more worthwhile to complain about.

-- Tyler Romeo

On Jan 1, 2014 12:27 AM, "Benjamin Lees" emufarmers@gmail.com wrote:

> There's a blacklist that has been included with FancyCaptcha for a few
> months, although I don't know whether it's the same as the one the WMF
> uses.
>
> See https://bugzilla.wikimedia.org/show_bug.cgi?id=21025 and the
> associated patches.
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

George Herbert

1:34 p.m.

Tyler, websites everywhere blacklist offensive words (and with some regularity, look and sound-alikes) from the random captcha generator...

I don't personally care, you don't, but if we offend people needlessly it's an oops. We need some elements of the site to meet Lowest Common Denominator rather than most enlightened participant.

That all said, it is possible the screenshot was a fake; the brand new user posted two things on-wiki; one a faked source for a BBC actress article, and two the screenshot for the captcha.

On-wiki conclusion was we were trolled. Not sure if that means the screenshot was a photoshop job or a real one, whether that was part of the troll or not. But the actual edit was a good solid troll.

On Tue, Dec 31, 2013 at 9:30 PM, Tyler Romeo tylerromeo@gmail.com wrote:

...

It's a CAPTCHA, not an article or piece of actual content. If people are actually getting offended by randomly generated CAPTCHAs I think they need to find something more worthwhile to complain about.

-- Tyler Romeo

On Jan 1, 2014 12:27 AM, "Benjamin Lees" emufarmers@gmail.com wrote:

...
There's a blacklist that has been included with FancyCaptcha for a few months, although I don't know whether it's the same as the one the WMF uses.

See https://bugzilla.wikimedia.org/show_bug.cgi?id=21025 and the associated patches.

-- -george william herbert george.herbert@gmail.com

Marc A. Pelletier

1:42 p.m.

On 01/01/2014 12:34 AM, George Herbert wrote:

...

Tyler, websites everywhere blacklist offensive words (and with some regularity, look and sound-alikes) from the random captcha generator...

Yes, and then we end up with the Scunthorpe problem instead.

I agree it's a little bit silly, and also a losing proposition; even trying to filter for /actual/ cuss words is hard enough (because the list of words/fragments someone *might* find offensive is boundless); if we try to also block misspellings, lookalikes or cognates we might as well block /^[a-z]*$/.

-- Marc
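
Marc's closing pattern can be checked directly. The following is a minimal Python sketch, assuming a made-up sample of candidate strings: because /^[a-z]*$/ matches any string made only of lowercase letters, using it as a blocklist rejects every such candidate.

```python
import re

# The pattern from the message above: any run of lowercase ASCII letters.
BLOCK_EVERYTHING = re.compile(r"^[a-z]*$")

# Hypothetical candidate strings, chosen only for illustration.
candidates = ["nigerblew", "scunthorpe", "applecart", "cumin"]


def is_blocked(text: str) -> bool:
    """Return True if the blocklist pattern rejects this candidate."""
    return BLOCK_EVERYTHING.match(text) is not None


# Keep only the candidates that the pattern does not reject.
surviving = [c for c in candidates if not is_blocked(c)]
print(surviving)
```

No all-lowercase candidate survives, which is the point of the remark: a blocklist broad enough to catch every lookalike leaves nothing to serve.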

Benjamin Lees

2:41 p.m.

On Wed, Jan 1, 2014 at 12:42 AM, Marc A. Pelletier marc@uberbox.org wrote:

...

Yes, and then we end up with the Scunthorpe problem instead.

The Scunthorpe problem is not actually a problem here, because we're just limiting the CAPTCHAs we serve to users, not filtering their input.
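
The distinction can be sketched in Python. This is an illustration only, not FancyCaptcha's actual code: the word list, the blocked fragments, and the names `is_acceptable` and `generate_captcha` are all assumptions. The filter runs on generated candidates, so a false positive on an innocent word just means another candidate is drawn; no user input is ever rejected.

```python
import random

# Hypothetical word list and blocklist, for illustration only.
WORDS = ["apple", "river", "cocktail", "stone", "cumin", "blew"]
BLOCKED_FRAGMENTS = ["cock", "cum"]


def is_acceptable(candidate: str) -> bool:
    """Reject any candidate containing a blocked fragment (substring match)."""
    return not any(fragment in candidate for fragment in BLOCKED_FRAGMENTS)


def generate_captcha(rng: random.Random, max_tries: int = 100) -> str:
    """Join two random words, redrawing until the result passes the filter."""
    for _ in range(max_tries):
        candidate = rng.choice(WORDS) + rng.choice(WORDS)
        if is_acceptable(candidate):
            return candidate
    raise RuntimeError("no acceptable captcha found; blocklist may be too broad")


rng = random.Random(0)
captchas = [generate_captcha(rng) for _ in range(5)]
print(captchas)
```

Here "cocktail" and "cumin" are over-blocked in the Scunthorpe sense, but the only cost is a smaller pool of possible CAPTCHAs.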

Tim Landscheidt

3 p.m.

"Marc A. Pelletier" marc@uberbox.org wrote:

...

...
Tyler, websites everywhere blacklist offensive words (and with some regularity, look and sound-alikes) from the random captcha generator...

...

Yes, and then we end up with the Scunthorpe problem instead.

...

I agree it's a little bit silly, and also a losing proposition; even trying to filter for /actual/ cuss words is hard enough (because the list of words/fragments someone *might* find offensive is boundless); if we try to also block misspellings, lookalikes or cognates we might as well block /^[a-z]*$/.

Not only that, the selection of blacklisted words may be offensive itself. For example, https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FConfirmEdit/master/b... protects the Christian god with two entries, while Allah is up for ridicule. And it might even require Jews to type in the tetragrammaton and thus effectively ban them from a site.

IMHO if someone is offended by a captcha, they should click on reload.

Tim

Benjamin Lees

3:31 p.m.

On Wed, Jan 1, 2014 at 2:00 AM, Tim Landscheidt tim@tim-landscheidt.de wrote:

...

Not only that, the selection of blacklisted words may be offensive itself.

That doesn't seem like a problem, since the list isn't visible in the user interface. Users will not be complaining that they aren't receiving "niggardly", "cocky", and "cumin" in CAPTCHAs.

For example,

...

https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FConfirmEdit/master/b... protects the Christian god with two entries, while Allah is up for ridicule. And it might even require Jews to type in the tetragrammaton and thus effectively ban them from a site.

I checked out the registration form for a white supremacist forum, and they just use reCAPTCHA. No doubt they'll be developing a CAPTJCA or CAPTMCA soon enough.

There are likely a number of strings that should be added to the default blacklist. Patches welcome!

Nathan Larson

4:20 p.m.

On Wed, Jan 1, 2014 at 2:31 AM, Benjamin Lees emufarmers@gmail.com wrote:

...

I checked out the registration form for a white supremacist forum, and they just use reCAPTCHA. No doubt they'll be developing a CAPTJCA or CAPTMCA soon enough.

There are likely a number of strings that should be added to the default blacklist. Patches welcome!

Yes, sites that want to screen out users with certain political, religious, etc. sensibilities could, with some small tweaks to the code, create a ConfirmEdit whitelist that causes *all* CAPTCHAs to contain words calculated to offend those groups. Actually, with QuestyCaptcha, this is already possible. One could include in $wgCaptchaQuestions answers that require the user to type in strings denigrating certain groups or deities; swearing allegiance to certain causes or entities; or stating that one has certain stigmatized attractions, preferences or desires or engages in certain stigmatized behaviors. One could also use it to screen for a certain level of intelligence, knowledge, etc. in a given field by asking questions only certain people would be able to answer. The possibilities are endless once one's main priority becomes to repel or screen out, rather than attract, most potential users.

Nathan Larson

4 Jan

5:42 p.m.

On Wed, Jan 1, 2014 at 2:31 AM, Benjamin Lees emufarmers@gmail.com wrote:

...

On Wed, Jan 1, 2014 at 2:00 AM, Tim Landscheidt <tim@tim-landscheidt.de> wrote:

I checked out the registration form for a white supremacist forum, and they just use reCAPTCHA. No doubt they'll be developing a CAPTJCA or CAPTMCA soon enough.

Conversely, one could use an empathy CAPTCHA to try to screen out that type of crowd (although it might merely screen out the politically incorrect). http://www.wired.com/threatlevel/2012/10/empathy-captcha/


wikitech-l@lists.wikimedia.org

10 comments

7 participants


participants (7)

Benjamin Lees
George Herbert
Marc A. Pelletier
Nathan Larson
Steven Walling
Tim Landscheidt
Tyler Romeo