Dear all,
I hope your holiday season is going well. I'm writing to let you know that I have released on GitHub a first (0.1) version of wikicaptcha[1], a reCAPTCHA-like program for Wiki*. The idea was born from an initial observation by Alex brollo[2], which we have discussed at length both on the WMI mailing list and on Wikisource-l[3]. For starters, the code is a rewrite of Alex's scripts and nothing more. Here is what the program does for now: 1) gets a DjVu file, 2) extracts the text layer, 3) identifies unrecognized words and 4) produces a TIFF image of them. I would like to write a "proof of concept" of the whole process, from getting OCR-ed DjVu files from Commons to producing challenges, serving them, collecting answers and then using those answers in some useful way. That said, as you can see, there's still a long way to go.
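To make step 3 a bit more concrete, here is a minimal sketch of how unrecognized words might be picked out of an already extracted text layer. It is not the code in the repository: the function name `unrecognized_words`, the tiny `KNOWN_WORDS` set and the "not in a word list" test are all assumptions made for illustration. The text layer itself is assumed to have been extracted beforehand, for example with DjVuLibre's `djvutxt`.

```python
import string

# Hypothetical stand-in for a real dictionary or word list.
KNOWN_WORDS = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def unrecognized_words(text, known_words=KNOWN_WORDS):
    """Return tokens of an OCR text layer that are not in known_words.

    Tokens are split on whitespace, stripped of surrounding punctuation and
    compared case-insensitively. Order of first appearance is preserved and
    duplicates are dropped.
    """
    seen = set()
    result = []
    for token in text.split():
        word = token.strip(string.punctuation)
        key = word.lower()
        # Skip empty tokens, known words and words already reported.
        if not key or key in known_words or key in seen:
            continue
        seen.add(key)
        result.append(word)
    return result


if __name__ == "__main__":
    print(unrecognized_words("The qu1ck brown f0x jumps over the lazy dog."))
```

A real version would need a proper dictionary for the language of the book, and would also need the position of each word on the page in order to cut out the image for step 4.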
Obviously in the long run we could use this system as a backup for the current one, which has demonstrated some limitations[4], but I'm sure there are many aspects of the problem which go beyond my knowledge (I'm a physicist not a computer scientist, you know) so any help and/or advice is welcome.
Thanks for your time.
Cristian
[1] https://github.com/CristianCantoro/wikicaptcha
[2] http://it.wikisource.org/wiki/Utente:Alex_brollo
[3] http://lists.wikimedia.org/pipermail/wikisource-l/2011-February/000939.html
[4] http://lists.wikimedia.org/pipermail/wikitech-l/2011-November/056078.html
Hello Cristian, kudos to Alex and you for getting the ball rolling on this project. I recommend that you get an account in our repository [1].
Integrating this into the ConfirmEdit extension shouldn't be hard. It's the extra features that make this tricky. This system is interesting for gathering translations, but it doesn't work for verifying that the answer is right. How would you verify that? The approach that comes to my mind is to show both the current captcha and a second, optional captcha, with a note explaining that filling in the second captcha helps Wikisource and that the answer will be logged with the user's username/IP. All answers would then be dumped into a log restricted to captcha-reviewers.
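As a rough illustration of the logging idea, the sketch below keeps one record per optional answer and groups the records by challenge so that a reviewer could compare them. Everything here is hypothetical (`AnswerRecord`, `AnswerLog`, the field names), none of it is ConfirmEdit code, and the rule that an answer is only logged when the regular captcha was passed is an added assumption. Restricting access to captcha-reviewers is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerRecord:
    """One answer to the optional captcha (hypothetical record layout)."""
    challenge_id: str
    answer: str
    user: str  # username or IP address


class AnswerLog:
    """In-memory stand-in for the reviewer-only answer log."""

    def __init__(self):
        self._records = []

    def submit(self, passed_required, challenge_id, answer, user):
        """Log the optional answer; return True if it was recorded.

        Assumption: answers are kept only when the regular captcha was
        passed and the optional field was actually filled in.
        """
        answer = answer.strip()
        if not passed_required or not answer:
            return False
        self._records.append(AnswerRecord(challenge_id, answer, user))
        return True

    def by_challenge(self):
        """Group logged answers by challenge for side-by-side review."""
        grouped = defaultdict(list)
        for record in self._records:
            grouped[record.challenge_id].append(record)
        return dict(grouped)


if __name__ == "__main__":
    log = AnswerLog()
    log.submit(True, "word-17", "hello", "Alice")
    log.submit(True, "word-17", "hallo", "192.0.2.1")
    for challenge, records in log.by_challenge().items():
        print(challenge, [r.answer for r in records])
```

Grouping by challenge is just one possible layout; it makes it easy for a reviewer to see where several users agree on the same word.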
1- http://www.mediawiki.org/wiki/Commit_access#Requesting_commit_access
wikitech-l@lists.wikimedia.org