[WikiEN-l] Captcha word quiz
Ilmari Karonen
nospam at vyznev.net
Wed Mar 22 13:17:08 UTC 2006
James D. Forrester wrote:
> Steve Bennett wrote:
>>On 3/21/06, geni <geniice at gmail.com> wrote:
>
>>>Initially you could just use plain speech and rely on the fact that
>>>no one else uses them, so no one is going to bother making a bot.
>>>More advanced approaches could involve spotting male and female voices
>>>against background noise.
>
> That could work, though automated (unattended) text-to-speech synthesis
> has not significantly improved in my lifetime. (Read: it's terrible.
> Yes, even IBM's whizzo-prang stuff. Horrendously disjointed and
> considerably more difficult to use as a "test" of something's humanity
> than would be desired, I feel.)
Yes, but the nice thing is that we don't really need speech synthesis
for this, since recordings of spoken words are plentiful and easily
available. In fact, we already have a perfect source: the Spoken
Wikipedia project.
Cutting the recordings up to build a decent library of a few thousand
words (with a dozen or more recordings of each) does take some work, but
could be semiautomated (given that we have access to the written
version) and easily done by volunteer contributors.
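
As a rough sketch of the semi-automated part: assume some alignment step
(an alignment tool or a volunteer marking boundaries by hand) has already
produced a list of (word, start, end) times for a recording. That list,
the function name build_word_library and its parameters are all
hypothetical, not an existing tool. Slicing the audio into a per-word
library could then look something like this:

```python
from collections import defaultdict


def build_word_library(samples, alignment, sample_rate, min_len=3):
    """Cut one recording into per-word clips.

    samples     -- sequence of audio samples for the whole recording
    alignment   -- iterable of (word, start_seconds, end_seconds); assumed
                   to come from an alignment tool or manual marking
    sample_rate -- samples per second
    min_len     -- skip words shorter than this many letters
    """
    library = defaultdict(list)
    for word, start_s, end_s in alignment:
        # Normalise case and strip surrounding punctuation so that
        # "Encyclopedia," and "encyclopedia" share one entry.
        key = word.lower().strip(".,;:!?\"'()")
        if len(key) < min_len:
            continue
        lo = round(start_s * sample_rate)
        hi = round(end_s * sample_rate)
        clip = list(samples[lo:hi])
        if clip:
            library[key].append(clip)
    return dict(library)
```

Running this over many recordings and merging the results would give
several clips per word, which is what the "dozen or more recordings of
each" needs.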
Background babble can be generated automatically by mixing random
segments from the same recordings. Other noises can also be included:
anyone with a laptop and a microphone can easily contribute practically
limitless amounts of digitally recorded noise.
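
To make the babble idea concrete, here is a minimal sketch. It assumes
the recordings are already decoded into lists of float samples in the
range -1.0 to 1.0 at a common sample rate; the names make_babble and
overlay, and the default of four voices, are made up for illustration.

```python
import random


def make_babble(recordings, length, voices=4, rng=None):
    """Mix `voices` randomly chosen segments into one babble track.

    recordings -- list of sample sequences (floats in [-1.0, 1.0])
    length     -- number of samples in the resulting track
    rng        -- optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    usable = [r for r in recordings if len(r) >= length]
    if not usable:
        raise ValueError("no recording is long enough")
    mix = [0.0] * length
    for _ in range(voices):
        src = rng.choice(usable)
        start = rng.randrange(len(src) - length + 1)
        for i in range(length):
            mix[i] += src[start + i]
    # Averaging keeps the result within the original sample range.
    return [s / voices for s in mix]


def overlay(word, babble, babble_gain=0.5):
    """Lay a word clip over attenuated babble, clipping to [-1.0, 1.0]."""
    n = min(len(word), len(babble))
    return [max(-1.0, min(1.0, word[i] + babble_gain * babble[i]))
            for i in range(n)]
```

The babble_gain value controls how hard the word is to pick out; a
sensible setting would have to be found by testing on real listeners.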
--
Ilmari Karonen