Minh Nguyen wrote:
Speaking of anglocentrism, I wonder if it'd be possible for the current captcha software to generate captchas in other languages. I've seen some generated captchas at the Vietnamese Wikipedia that would definitely confuse Vietnamese speakers (can't remember the words exactly), because of things like r's and n's smooshed up right next to each other. The user might have to /guess/, because the English words really don't follow Vietnamese spelling rules.
An advantage to localizing the captchas would be that it might reduce the impact of spambots at non-English projects. As far as I know, there isn't yet a captcha-defeating bot that understands Vietnamese or Basque or Quechua.
By the way, I'm only proposing localizing for most languages that use the Latin alphabet, because requiring users to respond to a captcha in Thai or Arabic would exclude a lot of legitimate interwiki users, and users of other scripts tend to have the means of entering Latin-based characters. Also, for languages that use diacritical marks, we could generate the words without the marks and modify [[MediaWiki:Captcha-createaccount]] to ask the user to enter the word without diacritical marks of any kind.
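For what it's worth, the diacritic-stripping step suggested above could be sketched roughly as follows. This is only an illustration in Python, not how the captcha software does anything today: the function name strip_diacritics and the EXTRA_MAP table are made up for the example, and the table is deliberately incomplete. It decomposes each letter with Unicode NFD normalization, drops the combining marks, and then handles a few letters (such as Vietnamese "đ") that have no decomposition.

```python
import unicodedata

# Hypothetical table for letters that NFD does not decompose.
# Illustrative only; a real list would need review per language.
EXTRA_MAP = str.maketrans({
    "đ": "d", "Đ": "D",
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
})


def strip_diacritics(word: str) -> str:
    """Return `word` with diacritical marks removed."""
    # Split each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # combining() is non-zero for combining marks, so keep only the rest.
    base_only = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Map the remaining special letters by hand.
    return base_only.translate(EXTRA_MAP)


if __name__ == "__main__":
    for w in ["tiếng", "đường", "Việt"]:
        print(w, "->", strip_diacritics(w))
```

The same function could be applied both when preparing the word list and when checking the user's answer, so that someone who types the marks anyway is not rejected.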
It would indeed be possible. What is needed is a word list of about a thousand short words for each language that needs its own captchas, since the captcha software uses these to build its challenge strings. It _might_ be possible to start with a set of common English words and use Wiktionary to choose the nearest equivalents in each language.
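To make the word-list idea concrete, here is a minimal sketch in Python. It assumes a challenge is simply a couple of list words joined together, which may not match what the captcha generator really does, and the names WORD_LISTS and make_challenge, the language codes, and the sample words are all placeholders for illustration.

```python
import secrets

# Hypothetical per-language word lists; real ones would hold
# about a thousand short words each.
WORD_LISTS = {
    "en": ["river", "stone", "cloud", "paper"],
    "vi": ["song", "nha", "cay", "sach"],
}


def make_challenge(words, count=2):
    """Join `count` randomly chosen words into one challenge string."""
    if not words:
        raise ValueError("word list is empty")
    # secrets.choice avoids a predictable pseudo-random sequence.
    return "".join(secrets.choice(words) for _ in range(count))


def challenge_for(lang, fallback="en"):
    """Pick the list for `lang`, or the fallback list if none exists."""
    words = WORD_LISTS.get(lang) or WORD_LISTS[fallback]
    return make_challenge(words)


if __name__ == "__main__":
    print(challenge_for("vi"))
    print(challenge_for("qu"))  # no list yet, so this uses the fallback
```

With this shape, adding a language is only a matter of supplying another word list, and projects without one keep getting the existing English challenges.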
However, I also think it would be a good idea to have captchas in non-Latin scripts as well: presumably many Arabic or Thai readers have the same problems recognizing Latin characters as readers of Latin scripts would have with Arabic or Thai characters. We could always offer a Latin-script alternative as a fallback.
-- Neil