I've just opened a new account on the Danish Wikipedia.
I was asked to read the following Captcha :
http://da.wikipedia.org/w/index.php?title=Speciel:Captcha/image&wpCaptch...
I provided the following answer : "shipsneeds" and it was accepted. But I must confess that reading the second "e" and the "d" was difficult for me.
I was anguished, because I feared that if I gave a wrong answer, I would not be given a second chance for some time.
I don't see why both "e"s should have a different look. Because they looked different, I was puzzled for a long time before eventually choosing to type a second "e". The vertical bar of the "d" being extremely short in comparison with that of the "h" and that of the "p", I was wondering if the last character could not be a square looking "o" or a manual script "a".
I had all these difficulties although I am among the advanced readers and speakers of English. I know that "ee" is an often found character combination in English. I could also recognize the words "ships" and "needs".
What about non-English speakers ?
Should we not have Japanese-based, Malayalam-based (there is a lot of talk nowadays in having Wikipedia growing in India), etc. captchas ?
For the time being, while the captcha is English-based, how about adding a button with "let me try another captcha" for people experiencing a captcha that is very difficult to read ?
I hate it when I'm presented with a Chinese captcha while browsing Chinese websites without an IME available.
Besides, I personally don't like Chinese captchas, because Chinese characters usually require more keystrokes.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I have just checked how they do it at Baidu Baike (the well-known Chinese online encyclopedia) :
http://baike.baidu.com/page/userlogin.html#reg
Their captcha is a set of 4 characters : either Arabic numerals or Latin capital letters. It looks easier than our current MediaWiki captcha, and they have a "You can't see?" button providing another try.
Teofilo wrote:
Should we not have Japanese-based, Malayalam-based (there is a lot of talk nowadays in having Wikipedia growing in India), etc. captchas ?
This is the subject of bug 5309, "Localize captcha images": https://bugzilla.wikimedia.org/show_bug.cgi?id=5309
For the time being, while the captcha is English-based, how about adding a button with "let me try another captcha" for people experiencing a captcha that is very difficult to read ?
This is the subject of bug 14230, "Add a button to request a new fancy captcha (code)": https://bugzilla.wikimedia.org/show_bug.cgi?id=14230
Generally it's a good idea to search Bugzilla before mailing this list. More often than not, Bugzilla will contain the relevant problem and a discussion of it.
While I sympathize with non-English speakers, I must confess that it's been quite a while since I filled out a CAPTCHA on a Wikimedia wiki. Surely most users use unified login, requiring a CAPTCHA to only be filled out once for all Wikimedia wikis?
MZMcBride
On 5 February 2011 05:19, MZMcBride z@mzmcbride.com wrote:
This is the subject of bug 5309, "Localize captcha images": https://bugzilla.wikimedia.org/show_bug.cgi?id=5309 This is the subject of bug 14230, "Add a button to request a new fancy captcha (code)": https://bugzilla.wikimedia.org/show_bug.cgi?id=14230 Generally it's a good idea to search Bugzilla before mailing this list. More often than not, Bugzilla will contain the relevant problem and a discussion of it.
This comes across as dismissive. Saying "we have old bugs filed that no-one is working on" is not a reason to dismiss discussion of a real problem. Tim has noted how badly our captcha solutions suck.
(It's a real pity reCaptcha is third-party and proprietary.)
- d.
2011/2/5 David Gerard dgerard@gmail.com
(It's a real pity reCaptcha is third-party and proprietary.)
Well, we it.source folks are writing up an announcement about this (it will be posted to wikisource-l), but a brief mention of the good news is in order here.
We have a simple script that extracts word images, corresponding to doubtful OCR interpretations, from any djvu file with a text layer; scripts to upload the fixed words back into the djvu text layer are simple too.
We posted a first note on John Vandenberg's en.source user talk page, and a wikicaptcha is now something possible. See John's talk page here: http://en.wikisource.org/wiki/User_talk:John_Vandenberg#reCAPTCHA_for_source
Alex brollo
Isn't this a nice gsoc 2011 project? Best Diederik
On Saturday 05 February 2011 10:51 PM, Diederik van Liere wrote:
Isn't this a nice gsoc 2011 project? Best Diederik
There is at least one successful captcha PHP script for the Malayalam language (http://sourceforge.net/projects/mlcaptcha/ , http://mlcaptcha.blogspot.com/2010/02/blog-post_24.html ). I don't know whether it can work with MediaWiki.
praveenp wrote:
There is at least one successful captcha PHP script for the Malayalam language (http://sourceforge.net/projects/mlcaptcha/ , http://mlcaptcha.blogspot.com/2010/02/blog-post_24.html ). I don't know whether it can work with MediaWiki.
It could be added, although I find that particular captcha easier for bots than for humans. And an attacker can easily play with the parameters to weaken it even more.
On 05/02/11 22:36, Platonides wrote:
praveenp wrote:
There is at least one successful captcha PHP script for the Malayalam language (http://sourceforge.net/projects/mlcaptcha/ , http://mlcaptcha.blogspot.com/2010/02/blog-post_24.html ). I don't know whether it can work with MediaWiki.
It could be added, although I find that particular captcha easier for bots than for humans. And an attacker can easily play with the parameters to weaken it even more.
Alternatively, you could add a Malayalam wordlist to the current Wikipedia captcha, which would have the same effect, and has (I hope) a better visual obfuscation method than this, designed specifically to resist some of the most recent bot decoding methods.
Perhaps we could have a place to add these wordlists on the meta-wiki or on translatewiki, to allow people without transmit rights to create them? All that is needed is about 2,000 short words for each language, which can be paired to create around 4,000,000 (2,000 x 2,000) possible challenge words, which will in turn be used to create an endless stream of captcha images, no two of which should ever be alike.
The wordlists themselves need not be secret: they are only needed to create easily-typed strings that are sufficiently large in number to provide a moderate challenge to brute force guessing.
-- Neil
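The arithmetic in Neil's message can be checked with a small sketch. This is not the actual MediaWiki captcha code: the function names and the placeholder wordlist below are made up for illustration, and the only assumption taken from the thread is that a challenge is two wordlist entries joined together (as in "shipsneeds").

```python
import secrets


def challenge_space(wordlist):
    """Count the ordered two-word challenges a wordlist can produce."""
    return len(wordlist) ** 2


def make_challenge(wordlist):
    """Join two independently chosen words, e.g. 'ships' + 'needs'."""
    return secrets.choice(wordlist) + secrets.choice(wordlist)


# Placeholder list standing in for 2,000 real short words of one language.
words = [f"w{i:04d}" for i in range(2000)]

print(challenge_space(words))  # 4000000
print(make_challenge(["ships", "needs"]))
```

The wordlist only fixes the size of the guessing space; the visual distortion of the rendered image is a separate step that this sketch does not cover.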
Probably not. All you need is a word list for each language. As long as you have a font that supports the characters, there aren't any major issues that would prevent you from generating non-English captchas. Bug 5309 points out a couple minor issues with the captcha script. You may have to generate word lists for some languages, but I doubt that would take all summer. It's just a matter of someone sitting down and doing it.
David Gerard wrote:
On 5 February 2011 05:19, MZMcBride z@mzmcbride.com wrote:
This is the subject of bug 5309, "Localize captcha images": https://bugzilla.wikimedia.org/show_bug.cgi?id=5309 This is the subject of bug 14230, "Add a button to request a new fancy captcha (code)": https://bugzilla.wikimedia.org/show_bug.cgi?id=14230 Generally it's a good idea to search Bugzilla before mailing this list. More often than not, Bugzilla will contain the relevant problem and a discussion of it.
This comes across as dismissive. Saying "we have old bugs filed that no-one is working on" is not a reason to dismiss discussion of a real problem. Tim has noted how badly our captcha solutions suck.
My intention wasn't to come across as dismissive. On the other hand, if people begin new conversations without having read the old conversations, it sets back progress dramatically. The opening post didn't make any mention of the old bugs or their progress, so I was trying to point out that these issues were already known and there were already forums in which they could and should be discussed.
David Gerard (also) wrote:
(It's a real pity reCaptcha is third-party and proprietary.)
I think it's a real pity that CAPTCHAs are needed at all. They're a pain-in-the-ass and their effectiveness against coordinated or sophisticated attacks is dubious at best.
Diederik van Liere wrote:
Isn't this a nice gsoc 2011 project?
Ideas for Google Summer of Code 2011 should go here: http://www.mediawiki.org/wiki/Summer_of_Code_2011/Project_ideas
Platonides wrote:
I agree. Some captchas are quite bad.
Occasionally the CAPTCHA will create offensive combinations, which are a bit worse than double Es and the like. ;-) This is the subject of bugs 10408, 16166, and 21025.
Alex wrote (referring to a GSOC project involving better CAPTCHA support):
Probably not. All you need is a word list for each language. As long as you have a font that supports the characters, there aren't any major issues that would prevent you from generating non-English captchas. Bug 5309 points out a couple minor issues with the captcha script. You may have to generate word lists for some languages, but I doubt that would take all summer. It's just a matter of someone sitting down and doing it.
There's certainly a right balance to be struck between projects that are far too large and complex (and thus never get finished) and projects that are too small and get finished within a day or two. Personally, I'd much rather have a bunch of small projects get finished (and re-worked as necessary) than have one large project get started, but never finished (LiquidThreads, interwiki transclusion, etc.).
If you looked at all of the CAPTCHA-related bugs as a group (including possibly removing the Python dependency), there's more than enough to be at least considered for Summer of Code 2011.
MZMcBride
MZMcBride wrote:
My intention wasn't to come across as dismissive. On the other hand, if people begin new conversations without having read the old conversations, it sets back progress dramatically. The opening post didn't make any mention of the old bugs or their progress, so I was trying to point out that these issues were already known and there were already forums in which they could and should be discussed.
Such misunderstandings happen; that's the internet sometimes.
I think it's a real pity that CAPTCHAs are needed at all. They're a pain-in-the-ass and their effectiveness against coordinated or sophisticated attacks is dubious at best.
And then you have projects like ptwiki, which permanently makes IPs pass captchas because of a bot attack that took place three years ago [1]. After running this way for three years, it probably needs community consensus to change the status quo now.
1- http://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Esplanada/Arquivo/2008/Janeiro#A...
If you looked at all of the CAPTCHA-related bugs as a group (including possibly removing the Python dependency), there's more than enough to be at least considered for Summer of Code 2011.
We should create a captcha tracking bug.
2011/2/5 MZMcBride z@mzmcbride.com:
While I sympathize with non-English speakers, I must confess that it's been quite a while since I filled out a CAPTCHA on a Wikimedia wiki. Surely most users use unified login, requiring a CAPTCHA to only be filled out once for all Wikimedia wikis?
I was thinking that this first step into entering a wiki (as a registered user) might be the most difficult thing people will ever be required to perform in their life as a Wikimedia user. It is like we require our users to have an IQ above 130, while very simple on-wiki tasks such as correcting typing mistakes don't require more than an average IQ.
Being technically able to type the local script somehow is a prerequisite for participation in the wiki. Therefore it should be okay to have the captcha in local script. It won't impede those users familiar with the wiki's local language. But it will potentially impede foreign users. Therefore it would be useful to provide a drop-down menu that allows you to choose the script of the captcha. That way every user can choose the script that fits best.
Instead of captchas like "shipsneeds" we of course need words in the local language. It shouldn't be hard to do some statistical analysis of existing articles on the wiki and to collect a sample of common words of limited length that can be combined to form local captchas. (I guess the above-mentioned script drop-down should be a script/language combination drop-down then.)
Marcus Buck User:Slomox
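The statistical analysis Marcus describes could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name, the length limits and the `top_n` default are hypothetical, the article texts are assumed to be available as plain strings, and the simple regex tokenizer would need replacing for scripts that rely on combining marks (Malayalam among them), since it splits words at those marks.

```python
import re
from collections import Counter


def common_short_words(texts, min_len=3, max_len=6, top_n=2000):
    """Return the most frequent words whose length lies within the limits."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if min_len <= len(word) <= max_len:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]


sample = ["Ships need ships", "the ships of the encyclopedia"]
print(common_short_words(sample, top_n=2))  # ['ships', 'the']
```

A real list would also need a manual or blacklist-based pass to drop offensive words before it is used for captchas.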
2011/2/5 Marcus Buck wiki@marcusbuck.org
Instead of captchas like "shipsneeds" we of course need words in the local language. It shouldn't be hard to do some statistical analysis of existing articles on the wiki and to collect a sample of common words of limited length that can be combined to form local captchas. (I guess the above-mentioned script drop-down should be a script/language combination drop-down then.)
Just to let you know that Aubrey has just presented the it.source idea for a wikicaptcha on wikisource-l
:-)
Obviously, if a wikicaptcha tool is built and runs, we can do anything .... and while interpreting words (in any language) any user will contribute to source transcriptions in a very valuable way.
Alex brollo
On 5 February 2011 15:12, Alex Brollo alex.brollo@gmail.com wrote:
Just to let you know that Aubrey has just presented the it.source idea for a wikicaptcha on wikisource-l :-)
This is excellent!
What would it take to get this into place? What's the captcha load on WMF sites? Would e.g. the toolserver melt under the load? Perhaps on one project at a time?
- d.
2011/2/5 David Gerard dgerard@gmail.com
This is excellent!
What would it take to get this into place? What's the captcha load on WMF sites? Would e.g. the toolserver melt under the load? Perhaps on one project at a time?
Please bear in mind that only a test script has been run, just to show that a python script can load a djvu text layer, select doubtful words, select the images of those words and save them to files. Now it's a matter for excellent developers: how to select djvu files, where to upload them, how to build the database of words/images, how to build a user interface to show images and to get user input.... Our it.source test documents that the first step can be done. It's so rewarding to give that script the number of a djvu page, nothing more, then to see tiff images popping out into the folder... :-)
Alex
In article AANLkTikWLU5Y8C2UokYRN=v1-zwhb1ktHNXi4xtbmXja@mail.gmail.com, David Gerard dgerard@gmail.com wrote:
On 5 February 2011 15:12, Alex Brollo alex.brollo@gmail.com wrote:
Just to let you know that Aubrey has just presented the it.source idea for a wikicaptcha on wikisource-l
What would it take to get this into place? What's the captcha load on WMF sites? Would e.g. the toolserver melt under the load? Perhaps on one project at a time?
I don't think this should be hosted on the Toolserver; as CAPTCHAs are a core part of the site, they should not rely on the TS to work.
- river.
2011/2/5 River Tarnell r.tarnell@ieee.org
I don't think this should be hosted on the Toolserver; as CAPTCHAs are a core part of the site, they should not rely on the TS to work.
- river.
IMHO, it could be an opportunity to rethink the role of Commons as a central library. I imagine something like this:
1. as soon as a djvu file with a text layer is uploaded, the complete set of page text layers is extracted, saving word coordinates too;
2. these text layers could be browsed by a script, extracting all words marked as doubtful (usually with a ^ character), and also words that don't match a good dictionary;
3. a dynamic recaptcha database is updated and word images are submitted to wiki contributors, both as a formal captcha for edits by logged-out users and as a volunteer job to help wikisource projects; the answers will fix the text files;
4. a tool should be built to upload "pure text" from these text files into any wikisource project;
5. finally, the refined text could be re-uploaded into the djvu file, converting it into a "djvu file with a wiki text layer".
Alex
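The word-selection step in Alex's outline (picking words marked as doubtful or missing from a dictionary) might be sketched as follows. This is not the it.source script, which is not shown in the thread. The `Word` record, the bounding-box layout and the `doubtful_words` function are hypothetical, and no real djvu library is used: the words are assumed to have already been extracted from the text layer together with their coordinates.

```python
from typing import NamedTuple, Tuple


class Word(NamedTuple):
    text: str                        # OCR text, possibly containing the ^ marker
    box: Tuple[int, int, int, int]   # hypothetical (xmin, ymin, xmax, ymax) on the page


def doubtful_words(words, dictionary, marker="^"):
    """Keep words the OCR flagged with the marker or that the dictionary lacks."""
    picked = []
    for word in words:
        bare = word.text.replace(marker, "").lower()
        if marker in word.text or bare not in dictionary:
            picked.append(word)
    return picked


page = [
    Word("ships", (0, 0, 40, 12)),
    Word("ne^ds", (45, 0, 85, 12)),
    Word("xqzt", (90, 0, 120, 12)),
]
for word in doubtful_words(page, {"ships", "needs"}):
    print(word.text, word.box)
```

The boxes of the selected words are what a later step would use to crop the word images out of the page scan.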
Teofilo wrote:
I've just opened a new account on the Danish Wikipedia.
I was asked to read the following Captcha :
http://da.wikipedia.org/w/index.php?title=Speciel:Captcha/image&wpCaptch...
Captcha URLs can only be viewed once; you'd need to upload the image somewhere to share it.
I provided the following answer : "shipsneeds" and it was accepted. But I must confess that reading the second "e" and the "d" was difficult for me.
I was anguished, because I feared that if I gave a wrong answer, I would not be given a second chance for some time.
That's not the case.
I don't see why both "e"s should have a different look. Because they looked different, I was puzzled for a long time before eventually choosing to type a second "e". The vertical bar of the "d" being extremely short in comparison with that of the "h" and that of the "p", I was wondering if the last character could not be a square looking "o" or a manual script "a".
I agree. Some captchas are quite bad.
I had all these difficulties although I am among the advanced readers and speakers of English. I know that "ee" is an often found character combination in English. I could also recognize the words "ships" and "needs".
What about non-English speakers ?
Should we not have Japanese-based, Malayalam-based (there is a lot of talk nowadays in having Wikipedia growing in India), etc. captchas ?
Note it'd be trivial to generate a set of captchas in a different language. You'd just need the appropriate (secret) dictionary.
For the time being, while the captcha is English-based, how about adding a button with "let me try another captcha" for people experiencing a captcha that is very difficult to read ?
Well, it should be done (bug 14230) :)