On 10/7/07, Simetrical <Simetrical+wikilist@gmail.com> wrote:
On 10/7/07, Platonides <Platonides@gmail.com> wrote:
It has been discussed here before that the captchas are too hard to pass, but without any samples. Today I found one of these captchas. I read "ghooktrust", but MediaWiki didn't agree. The first letter could be a 5, but we don't use numbers, so I finally noticed it might be an s.
Well, the captcha always consists of two words concatenated together, I believe. "Shook" is a rather obscure word, however. Perhaps the dictionary could be made less comprehensive, although that brings us back to non-English speakers, who won't be helped at all.
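To make the two-word point concrete: if a captcha is known to be two dictionary words run together, an ambiguous glyph can often be resolved by checking which reading splits cleanly into two words. This is a minimal sketch under stated assumptions: `WORDLIST` is a hypothetical five-word stand-in for the real dictionary, and the `CONFUSABLE` table of look-alike glyphs is made up for this example. The same trick works just as well for a program as for a person.

```python
from itertools import product

# Hypothetical stand-in for the captcha dictionary (the real list is not shown).
WORDLIST = {"shook", "trust", "hook", "ghost", "rust"}

# Hypothetical table of glyphs that are easy to confuse in a distorted image.
CONFUSABLE = {"g": "g5s", "5": "5gs", "s": "s5g"}


def candidate_readings(reading):
    """Yield every string obtained by swapping confusable glyphs."""
    options = [CONFUSABLE.get(ch, ch) for ch in reading]
    for combo in product(*options):
        yield "".join(combo)


def two_word_splits(text, words):
    """Return every way to split text into exactly two dictionary words."""
    return [(text[:i], text[i:]) for i in range(1, len(text))
            if text[:i] in words and text[i:] in words]


def resolve(reading, words=WORDLIST):
    """Collect all two-word interpretations of an ambiguous reading."""
    results = []
    for cand in candidate_readings(reading):
        results.extend(two_word_splits(cand, words))
    return results


print(resolve("ghooktrust"))  # [('shook', 'trust')]
```

Only the reading that starts with "s" survives the dictionary check, which is exactly how the misread first letter gets corrected.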
Looking at it, yes, it could be either. But if you refresh, it gives you a different captcha, right?
We should switch to random characters. Using dictionary words, even a 'secret' dictionary, substantially reduces the entropy of the captchas. Yes, the dictionary makes the captcha easier for humans, but it's an even bigger help to computers, which can fit much more accurate state-transition models in memory.
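A quick back-of-the-envelope comparison shows the size of the entropy gap. This sketch assumes every choice is uniform and independent, and the parameters are hypothetical: a 10,000-word dictionary versus 10 random lowercase letters, roughly the same on-screen length as two five-letter words.

```python
import math


def dictionary_bits(vocab_size, words_per_captcha=2):
    """Entropy in bits of a captcha made of uniformly chosen dictionary words."""
    return words_per_captcha * math.log2(vocab_size)


def random_chars_bits(alphabet_size, length):
    """Entropy in bits of a captcha made of uniformly chosen independent characters."""
    return length * math.log2(alphabet_size)


# Hypothetical parameters, chosen only for illustration.
print(f"two dictionary words: {dictionary_bits(10_000):.1f} bits")  # 26.6 bits
print(f"10 random letters:    {random_chars_bits(26, 10):.1f} bits")  # 47.0 bits
```

Every bit of difference halves the search space an attacker's language model has to cover, which is why the dictionary helps a solver more than it helps a human.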
The goal of the captcha should be to maximize the gap between humans and computers, not to be maximally hard.
Right now our captcha is weak by conventional wisdom: the characters are too easily segmented. A tuned copy of the Tesseract 2.0 OCR engine, without any statistical modeling, can recognize about 25% of the letters in most of the Wikimedia captchas. That's still pretty far from cracking it, but I bet someone skilled at captcha cracking wouldn't have too hard a time.
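"Too easily segmented" means a solver can cut the image into one-character pieces before doing any recognition. One simple, common approach is vertical projection: split wherever a column contains no ink. This is a minimal sketch on a made-up ASCII bitmap ('#' for ink, '.' for background); it is not the Wikimedia generator's output and not how Tesseract segments internally.

```python
def segment_columns(bitmap):
    """Split a binary glyph image into character spans by vertical projection.

    bitmap is a list of equal-length strings where '#' is ink and '.' is
    background. Returns (start, end) column ranges, end exclusive, for each
    run of columns that contain at least one ink pixel.
    """
    width = len(bitmap[0])
    # Column projection: does any row have ink in this column?
    inked = [any(row[x] == "#" for row in bitmap) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x  # a new glyph begins
        elif not has_ink and start is not None:
            spans.append((start, x))  # a blank column closes the glyph
            start = None
    if start is not None:
        spans.append((start, width))  # glyph touching the right edge
    return spans


# Made-up three-glyph image with blank columns between the characters.
IMAGE = [
    "##..#.#..###",
    "#...###..#.#",
    "##..#.#..###",
]
print(segment_columns(IMAGE))  # [(0, 2), (4, 7), (9, 12)]
```

Once each span is isolated, the OCR engine only has to classify single characters, which is a much easier problem than reading the whole word.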
The captcha generator is a really simple Python script that is easy and fun to modify. I made a copy here that distorts the text less but packs the characters closer together and adds a wiggly connecting line, which is popular these days. The result is easier to read, which makes using mostly random characters acceptable, and it completely defeats Tesseract... but I can't prove that it isn't massively less secure against some other attack, so I haven't proposed that we use it. :(
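A connecting line works against projection-based segmentation because it leaves no blank column to split on. This sketch shows only that effect, on the same kind of made-up ASCII bitmap as a naive segmenter would see. The `add_wiggly_line` helper and its `amplitude`/`period` parameters are hypothetical, not taken from the real generator, and surviving this toy segmenter says nothing about resistance to other attacks.

```python
import math


def segment_columns(bitmap):
    """Vertical-projection segmentation: split at columns with no ink."""
    inked = [any(row[x] == "#" for row in bitmap) for x in range(len(bitmap[0]))]
    spans, start = [], None
    for x, has_ink in enumerate(inked + [False]):  # sentinel closes the last run
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))
            start = None
    return spans


def add_wiggly_line(bitmap, amplitude=1.0, period=6.0):
    """Return a copy of bitmap with a sine-shaped stroke drawn across every column."""
    height, width = len(bitmap), len(bitmap[0])
    rows = [list(r) for r in bitmap]
    mid = (height - 1) / 2
    for x in range(width):
        y = round(mid + amplitude * math.sin(2 * math.pi * x / period))
        y = min(max(y, 0), height - 1)  # clamp the stroke to the image
        rows[y][x] = "#"
    return ["".join(r) for r in rows]


# Made-up three-glyph image with blank columns between the characters.
IMAGE = [
    "##..#.#..###",
    "#...###..#.#",
    "##..#.#..###",
]
print(segment_columns(IMAGE))                   # [(0, 2), (4, 7), (9, 12)]
print(segment_columns(add_wiggly_line(IMAGE)))  # [(0, 12)]
```

With the line drawn, the whole word comes back as a single span, so a segment-then-classify pipeline has nothing to work with; packing the characters closer has a similar effect by removing the gaps directly.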