[WikiEN-l] Captcha word quiz

Ilmari Karonen nospam at vyznev.net
Wed Mar 22 13:17:08 UTC 2006


James D. Forrester wrote:
> Steve Bennett wrote:
>>On 3/21/06, geni <geniice at gmail.com> wrote:
> 
>>>Initially you could just use plain speech and rely on the fact that
>>>no one else uses them so no one is going to bother making a bot.
>>>More advanced approaches could involve spotting male and female voices
>>>against background noise.
> 
> That could work, though automated (unattended) text-to-speech synthesis
> has not significantly improved in my lifetime. (Read: it's terrible.
> Yes, even IBM's whizzo-prang stuff. Horrendously disjointed and
> considerably more difficult to use as a "test" of something's humanity
> than would be desired, I feel.)

Yes, but the nice thing is that we don't really need speech synthesis 
for this, since recordings of spoken words are plentiful and easily 
available.  In fact, we already have a perfect source: the Spoken 
Wikipedia project.

Cutting the recordings up to build a decent library of a few thousand 
words (with a dozen or more recordings of each) does take some work, but 
could be semiautomated (given that we have access to the written 
version) and easily done by volunteer contributors.
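
As a rough illustration of the cutting step, here is a minimal Python sketch. It assumes that some earlier step has already matched each word of the written article to a start and end time in the recording; that alignment data, and the names cut_words, alignment and sample_rate, are hypothetical and only here for illustration. Audio is represented as a plain list of samples.

```python
from collections import defaultdict

def cut_words(samples, alignment, sample_rate):
    """Slice one recording into per-word clips.

    alignment is an iterable of (word, start_sec, end_sec) tuples, assumed
    to come from matching the recording against the article text.
    """
    library = defaultdict(list)
    for word, start, end in alignment:
        lo = int(start * sample_rate)
        hi = int(end * sample_rate)
        clip = samples[lo:hi]
        if clip:  # skip empty or out-of-range segments
            library[word.lower()].append(clip)
    return library

# Toy data: 16 "samples" at 4 samples per second stand in for real audio.
samples = list(range(16))
alignment = [("The", 0.0, 1.0), ("cat", 1.0, 2.5), ("the", 2.5, 3.0)]
library = cut_words(samples, alignment, sample_rate=4)
```

Running this over many recordings and merging the results would gradually give several clips per word, with volunteers checking the cuts by ear.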

Background babble can be generated automatically by mixing random 
segments from the same recordings.  Other noises can also be included: 
anyone with a laptop and a microphone can easily contribute practically 
limitless amounts of digitally recorded noise.
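
The babble mixing could be sketched roughly as follows, again in Python with audio as plain lists of float samples. The function names make_babble and mix, the number of voices and the noise_gain value are all assumptions made for the example, not tuned settings.

```python
import random

def make_babble(recordings, length, voices=3, rng=random):
    """Sum `voices` randomly chosen segments into one babble track."""
    candidates = [r for r in recordings if len(r) >= length]
    if not candidates:
        raise ValueError("no recording is long enough")
    babble = [0.0] * length
    for _ in range(voices):
        source = rng.choice(candidates)
        start = rng.randrange(len(source) - length + 1)
        for i in range(length):
            # Divide by the voice count so the sum stays in range.
            babble[i] += source[start + i] / voices
    return babble

def mix(word_clip, babble, noise_gain=0.5):
    """Overlay the babble on a word clip at reduced volume."""
    return [w + noise_gain * b for w, b in zip(word_clip, babble)]

# Toy data: constant-valued "recordings" stand in for real speech.
rng = random.Random(0)
recordings = [[1.0] * 10, [1.0] * 20, [1.0] * 3]
babble = make_babble(recordings, length=5, voices=3, rng=rng)
challenge = mix([0.0] * 5, babble, noise_gain=0.5)
```

Because the segments are picked at random each time, the same word clip yields a different challenge on every request.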

-- 
Ilmari Karonen


