On 18 Mar 2006 at 14:42, Philip Welch wikipedia@philwelch.net wrote:
You have images turned off in your browser? Are you using Netscape 4.7 or something?
There are many important features waiting for our developers to implement. Supporting 1990's era web browsing is not one of them. I suggest you use a modern browser to browse Wikimedia sites in the future.
Wikipedia is the last site I'd expect to be telling people "Get a better browser, loser!" What are you going to do next, add user-agent sniffing to turn people away if they're not using the right browser, resolution, or operating system?
Anyway, Firefox and the Mozilla/SeaMonkey Suite, to name a couple of fully modern browsers, offer the configuration option to disable images.
On 3/19/06, Daniel R. Tobias dan@tobias.name wrote:
Anyway, Firefox and the Mozilla/SeaMonkey Suite, to name a couple of fully modern browsers, offer the configuration option to disable images.
You seem to be arguing something strange here. It sounds like you're saying that the tools are available if people try hard enough, to get themselves temporarily into a situation where they are unable to answer a CAPTCHA, which they only need to do once.
If you're arguing something else, I apologise.
Steve
On Mar 19, 2006, at 12:44 PM, Daniel R. Tobias wrote:
On 18 Mar 2006 at 14:42, Philip Welch wikipedia@philwelch.net wrote:
You have images turned off in your browser? Are you using Netscape 4.7 or something?
There are many important features waiting for our developers to implement. Supporting 1990's era web browsing is not one of them. I suggest you use a modern browser to browse Wikimedia sites in the future.
Wikipedia is the last site I'd expect to be telling people "Get a better browser, loser!"
That's because we don't use AJAX or Flash or anything extravagant like that, and if we did, we'd have other ways to use the site. We ask so little.
Anyway, Firefox and the Mozilla/SeaMonkey Suite, to name a couple of fully modern browsers, offer the configuration option to disable images.
Do they offer the option to turn them back on again? Even selectively, so you can right-click on an image and choose to load it? If so, captchas should be no problem. In fact, captchas should be no problem even in Netscape 4.7 because as I recall, Netscape always had that feature.
On 3/20/06, Philip Welch wikipedia@philwelch.net wrote:
Do they offer the option to turn them back on again? Even selectively, so you can right-click on an image and choose to load it? If so, captchas should be no problem. In fact, captchas should be no problem even in Netscape 4.7 because as I recall, Netscape always had that feature.
Pretty sure Netscape 3 had it too. Those were the days. :)
Steve
Do we have to have captchas at all?
No one seems to be addressing the original poster's comment. Why not have a simple IQ test type question? Something that even the most idiotic human could answer but a machine could not?
Theresa
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Do we have to have captchas at all?
No one seems to be addressing the original poster's comment. Why not have a simple IQ test type question? Something that even the most idiotic human could answer but a machine could not?
Theresa
You've just described a captcha.
-- geni
On 3/20/06, geni geniice@gmail.com wrote:
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Do we have to have captchas at all?
No one seems to be addressing the original poster's comment. Why not have a simple IQ test type question? Something that even the most idiotic human could answer but a machine could not?
Theresa
You've just described a captcha.
Sorry, I thought a captcha was a wiggly word _image_. What I am describing could easily be text.
Theresa
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Sorry, I thought a captcha was a wiggly word _image_. What I am describing could easily be text.
Invent one then ;) Bear in mind that if it's multiple choice, then the robot could just have a few goes.
Actually it would be nice if someone would explain why wikibooks (if I understood correctly) now needs captchas.
Steve
On 3/20/06, Steve Bennett stevage@gmail.com wrote:
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Sorry, I thought a captcha was a wiggly word _image_. What I am describing could easily be text.
Invent one then ;) Bear in mind that if it's multiple choice, then the robot could just have a few goes.
A colour that's the opposite of black
the number of days in a week
A pet animal that goes woof woof.
Actually it would be nice if someone would explain why wikibooks (if I understood correctly) now needs captchas.
Linkspam.
Theresa
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
On 3/20/06, Steve Bennett stevage@gmail.com wrote:
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Sorry, I thought a captcha was a wiggly word _image_. What I am describing could easily be text.
Invent one then ;) Bear in mind that if it's multiple choice, then the robot could just have a few goes.
A colour that's the opposite of black
the number of days in a week
A pet animal that goes woof woof.
That has the problem of being anglocentric.
You are going to need at least 1000 of these and you are going to need them in a lot of languages.
-- geni
Speaking of anglocentrism, I wonder if it'd be possible for the current captcha software to generate captchas in other languages. I've seen some generated captchas at the Vietnamese Wikipedia that would definitely confuse Vietnamese-speakers (can't remember the words exactly), because of things like r's and n's smooshed up right next to each other and stuff. The user might have to /guess/ because the English words really don't follow Vietnamese spelling rules.
An advantage to localizing the captchas would be that it might reduce the impact of spambots at non-English projects. As far as I know, there isn't yet a captcha-defeating bot that understands Vietnamese or Basque or Quechua.
By the way, I'm only proposing localizing for most languages that use the Latin alphabet, because requiring users to respond to a captcha in Thai or Arabic would exclude a lot of legitimate interwiki users. And users of other scripts tend to have the means of entering in Latin-based characters. Also, for languages that use diacritical marks, we could generate the words without the marks and modify [[MediaWiki:Captcha-createaccount]], asking the user to enter in the word without diacritical marks of any kind.
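The "enter the word without diacritical marks" idea can be made forgiving on the checking side too. A minimal sketch, assuming the answer check is free to normalise both strings before comparing; the function names here are made up for illustration and are not part of the captcha software:

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Return `word` with combining marks removed, e.g. 'Việt' -> 'Viet'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def answers_match(expected: str, typed: str) -> bool:
    """Compare the challenge word and the typed answer, ignoring marks and case."""
    return strip_diacritics(expected).casefold() == strip_diacritics(typed).casefold()
```

Letters that are not a base letter plus a mark, such as the Vietnamese "đ", have no decomposition and would need an explicit mapping on top of this.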
geni wrote:
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
On 3/20/06, Steve Bennett stevage@gmail.com wrote:
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Sorry, I thought a captcha was a wiggly word _image_. What I am describing could easily be text.
Invent one then ;) Bear in mind that if it's multiple choice, then the robot could just have a few goes.
A colour that's the opposite of black
the number of days in a week
A pet animal that goes woof woof.
That has the problem of being anglocentric.
You are going to need at least 1000 of these and you are going to need them in a lot of languages.
-- geni
Minh Nguyen wrote:
Speaking of anglocentrism, I wonder if it'd be possible for the current captcha software to generate captchas in other languages. I've seen some generated captchas at the Vietnamese Wikipedia that would definitely confuse Vietnamese-speakers (can't remember the words exactly), because of things like r's and n's smooshed up right next to each other and stuff. The user might have to /guess/ because the English words really don't follow Vietnamese spelling rules.
An advantage to localizing the captchas would be that it might reduce the impact of spambots at non-English projects. As far as I know, there isn't yet a captcha-defeating bot that understands Vietnamese or Basque or Quechua.
By the way, I'm only proposing localizing for most languages that use the Latin alphabet, because requiring users to respond to a captcha in Thai or Arabic would exclude a lot of legitimate interwiki users. And users of other scripts tend to have the means of entering in Latin-based characters. Also, for languages that use diacritical marks, we could generate the words without the marks and modify [[MediaWiki:Captcha-createaccount]], asking the user to enter in the word without diacritical marks of any kind.
Indeed it would be possible: what would be needed would be a word list of about a thousand short words, for each language that needed its own captchas, since the captcha software uses these to build its challenge strings. It _might_ be possible to start with a set of common words in English, and to use Wiktionary to choose the nearest equivalents in each language.
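As a rough illustration of the word-list approach, here is a sketch that builds a challenge string from a per-language list. The assumption that a challenge is two words joined together, and the tiny lists themselves, are mine for illustration only; a real list would hold about a thousand short words per language.

```python
import secrets

# Hypothetical, illustrative lists; the Vietnamese entries are written
# without diacritical marks, as proposed earlier in the thread.
WORD_LISTS = {
    "en": ["cat", "tree", "milk", "road"],
    "vi": ["nha", "cay", "sua", "duong"],
}


def make_challenge(words, count=2):
    """Join `count` randomly chosen words into one challenge string."""
    if not words:
        raise ValueError("word list is empty")
    return "".join(secrets.choice(words) for _ in range(count))


challenge = make_challenge(WORD_LISTS["vi"])
```

The resulting string would then be rendered as a distorted image by whatever the existing captcha code does.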
However, I also think that it would be a good idea to have captchas in non-Latin scripts as well: presumably many Arabic or Thai readers have the same problems recognizing Latin characters that readers of Latin scripts would have with Arabic or Thai characters. We could always offer a Latin-script alternative as a fallback.
-- Neil
Neil Harris wrote:
Indeed it would be possible: what would be needed would be a word list of about a thousand short words, for each language that needed its own captchas, since the captcha software uses these to build its challenge strings. It _might_ be possible to start with a set of common words in English, and to use Wiktionary to choose the nearest equivalents in each language.
However, I also think that it would be a good idea to have captchas in non-Latin scripts as well: presumably many Arabic or Thai readers have the same problems recognizing Latin characters that readers of Latin scripts would have with Arabic or Thai characters. We could always offer a Latin-script alternative as a fallback.
-- Neil
I've filed Bug 5309 http://bugzilla.wikimedia.org/5309 on this.
Theresa Knott wrote:
On 3/20/06, Steve Bennett stevage@gmail.com wrote:
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Sorry, I thought a captcha was a wiggly word _image_. What I am describing could easily be text.
Invent one then ;) Bear in mind that if it's multiple choice, then the robot could just have a few goes.
A colour that's the opposite of black
the number of days in a week
A pet animal that goes woof woof.
Unfortunately, Googling and word-counting is a rather powerful way of cheating at these sorts of simple general-knowledge questions:
Searching for "animal that goes woof woof", removing trivial words from the page fragments Google returns on its search page, and then counting the most common remaining words gives the following:
43: woof 9: animal 8: dog 7: goes 6: joke
for "colour that's the opposite of black" we get:
21: colour 10: black 6: opposite 6: white 5: you
If you then try the words which are not in the question, the highest rated few words tend to contain the answer to the question. Even if we try at random from the top-rated words, there's a good chance of success, which is all a bot needs.
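The word-counting step is only a few lines of code. A minimal sketch, assuming the bot already has the result snippets as plain strings; the snippets and the stop-word list below are made up, and no real search API is called:

```python
import re
from collections import Counter

# A small, hypothetical list of "trivial" words to ignore.
STOP_WORDS = {"a", "an", "the", "that", "of", "is", "it", "and", "to", "in", "you", "what"}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def candidate_answers(question, snippets, top=3):
    """Rank words from the snippets that are not already in the question."""
    question_words = set(tokenize(question))
    counts = Counter(
        word
        for snippet in snippets
        for word in tokenize(snippet)
        if word not in STOP_WORDS and word not in question_words
    )
    return [word for word, _ in counts.most_common(top)]


# Made-up snippets standing in for a search results page.
snippets = [
    "What animal goes woof woof? A dog!",
    "The dog goes woof, the cat goes meow.",
    "Joke: which pet goes woof woof woof? My dog.",
]
guesses = candidate_answers("A pet animal that goes woof woof", snippets)
```

With these snippets the top-ranked guess is "dog"; a bot would simply try the guesses in order.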
-- Neil
On 3/20/06, Neil Harris neil@tonal.clara.co.uk wrote:
Theresa Knott wrote:
On 3/20/06, Steve Bennett stevage@gmail.com wrote:
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Sorry, I thought a captcha was a wiggly word _image_. What I am describing could easily be text.
Invent one then ;) Bear in mind that if it's multiple choice, then the robot could just have a few goes.
A colour that's the opposite of black
the number of days in a week
A pet animal that goes woof woof.
Unfortunately, Googling and word-counting is a rather powerful way of cheating at these sorts of simple general-knowledge questions:
Searching for "animal that goes woof woof", removing trivial words from the page fragments Google returns on its search page, and then counting the most common remaining words gives the following:
43: woof 9: animal 8: dog 7: goes 6: joke
for "colour that's the opposite of black" we get:
21: colour 10: black 6: opposite 6: white 5: you
If you then try the words which are not in the question, the highest rated few words tend to contain the answer to the question. Even if we try at random from the top-rated words, there's a good chance of success, which is all a bot needs.
OK so simple quiz questions are out, and so are multiple choice questions. What about this:
Take the second letter of ADAM, the first letter of OGRE and the last of RAINING.
Is it possible for a bot to get around that?
Theresa
Theresa Knott wrote:
Take the second letter of ADAM, the first letter of OGRE and the last of RAINING.
Is it possible for a bot to get around that?
In a word, yes. Unless you want to try coming up with new and interesting variations faster than the bots can be taught to parse them.
On 3/20/06, Ilmari Karonen nospam@vyznev.net wrote:
Theresa Knott wrote:
Take the second letter of ADAM, the first letter of OGRE and the last of RAINING.
Is it possible for a bot to get around that?
In a word, yes. Unless you want to try coming up with new and interesting variations faster than the bots can be taught to parse them.
And I bet if you give that captcha to enough humans, enough will fail for things to get very tense very quickly. "What do you mean take the second letter? What do I do with it? I just want to help with wikibooks!!!"
Mind you, graphical captchas can get beaten too, unless they're done very carefully.
Perhaps if linkspam is the problem, we should just implement severe restrictions on including URLs? It's hard to imagine many situations where a new editor with almost no edits is adding a really useful URL that complies with [[WP:External links]]. And is there a reason Wikipedia doesn't use Google's suggested "nofollow" tag?
Steve
Steve Bennett wrote:
On 3/20/06, Ilmari Karonen nospam@vyznev.net wrote:
Theresa Knott wrote:
Take the second letter of ADAM, the first letter of OGRE and the last of RAINING.
Is it possible for a bot to get around that?
In a word, yes. Unless you want to try coming up with new and interesting variations faster than the bots can be taught to parse them.
And I bet if you give that captcha to enough humans, enough will fail for things to get very tense very quickly. "What do you mean take the second letter? What do I do with it? I just want to help with wikibooks!!!"
Mind you, graphical captchas can get beaten too, unless they're done very carefully.
Perhaps if linkspam is the problem, we should just implement severe restrictions on including URLs? It's hard to imagine many situations where a new editor with almost no edits is adding a really useful URL that complies with [[WP:External links]]. And is there a reason Wikipedia doesn't use Google's suggested "nofollow" tag?
Steve
We already have a spam filter. And the nofollow proposal was rejected by the community.
John
On 3/20/06, John Lee johnleemk@gawab.com wrote:
We already have a spam filter. And the nofollow proposal was rejected by the community.
For en.wikipedia only, I thought?
-Matt
Matt Brown wrote:
On 3/20/06, John Lee johnleemk@gawab.com wrote:
We already have a spam filter. And the nofollow proposal was rejected by the community.
For en.wikipedia only, I thought?
-Matt
Well, this *is* WikiEn-l... (Shouldn't the Wikibookians have their own list?)
John
On 3/20/06, John Lee johnleemk@gawab.com wrote:
Matt Brown wrote:
On 3/20/06, John Lee johnleemk@gawab.com wrote:
We already have a spam filter. And the nofollow proposal was rejected by the community.
So I see: http://en.wikipedia.org/wiki/Wikipedia:Nofollow
61% voted for not using nofollow, 39% voted for using it, in Feb 2005.
Steve
ASCII passwords?
_______ ________________ | /\ | | / \ | | /___\ | | / \ | |________ / \ |
Theresa
Theresa Knott wrote:
ASCII passwords?
| /\ | | / \ | | /___\ | | / \ | |________ / \ |
Theresa
Alas, they won't work for the blind either.
Oh, and there's one more problem with textual captchas: you will also need them to be available in every language, which implies a non-trivial translation effort.
At least the current visual captchas can be used by anyone familiar with the Latin alphabet (although they're currently harder for non-English speakers).
I still think the best idea is a web form where the user is required to make their request for access in natural language using complete sentences, possibly prompted by a set of simple questions. Real requests should be easy for human beings to distinguish from chatbot-like computer-generated requests, unless they are generated using canned human-generated sentence structures, which will be easy to filter out using a Bayesian scheme. Engaging in an arms race against this will of course be possible, but will require significant amounts of human effort from the spammers, which will slow them down.
Sample questions might be: "Please tell us why you want to edit."
"Please tell us what you think of Wikipedia."
"How did you find this website?"
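For what a "Bayesian scheme" might look like here, this is a toy two-class naive Bayes filter with add-one smoothing. It is only a sketch of the general technique: the class name, labels, and training sentences are all invented for illustration, and a real filter would need far more training data.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Two-class word-count classifier with add-one smoothing."""

    LABELS = ("human", "bot")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, label, text):
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def log_score(self, label, text):
        counts = self.word_counts[label]
        vocab = len(set(self.word_counts["human"]) | set(self.word_counts["bot"]))
        total = sum(counts.values())
        # Log prior plus the smoothed log likelihood of each word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokens(text):
            score += math.log((counts[word] + 1) / (total + vocab))
        return score

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.log_score(label, text))


# Made-up training requests; both classes must be trained before classifying.
nb = NaiveBayes()
nb.train("human", "I want to fix typos in the article about my home town")
nb.train("human", "I found the site through a search and want to add photos")
nb.train("bot", "buy cheap pills online best price")
nb.train("bot", "best casino online cheap offers click here")
```

Requests that score as "bot" could be held for a human to look at rather than rejected outright.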
-- Neil
Can everyone just stop a second from the semi-masturbatory tech-talk about the ultimate CAPTCHA that no bot will get around and that all humans will be able to recognize, without using any browser newer than 1992? Please :P
We can all agree that the VAST majority of users will be able to view an image CAPTCHA, and solve it. Like, 99.5% at least. So the question is, what do we do with the 0.5%? I want to reiterate my earlier point: why not just provide an email address, either to something like OTRS or the helpdesk, or to a special email solely dedicated for this purpose, where people will be able to set up an account for them?
Doesn't this sound a lot more sensible than wasting developer time developing an ultimate text-string that will, at best, make us look silly? Also, it would take up huge amounts of contributor time just figuring out the "questions" or whatever.
"If you have problem seeing this image, or signing up in general, please contact someone@wikimedia.org with your account information, and we will assist you in creating an account"
That's all it takes. That's IT!
--Oskar
PS. Sorry if I sounded slightly rantish, but jeez, the suggestions you people have......
On 3/20/06, Oskar Sigvardsson oskarsigvardsson@gmail.com wrote:
"If you have problem seeing this image, or signing up in general, please contact someone@wikimedia.org with your account information, and we will assist you in creating an account"
That's pretty much the current text. Except no one has gotten around to actually adding the email address. Which is what started this whole thread.
Steve
So can someone set it up and add it?
On 3/20/06, Steve Bennett stevage@gmail.com wrote:
On 3/20/06, Oskar Sigvardsson oskarsigvardsson@gmail.com wrote:
"If you have problem seeing this image, or signing up in general, please contact someone@wikimedia.org with your account information, and we will assist you in creating an account"
That's pretty much the current text. Except no one has gotten around to actually adding the email address. Which is what started this whole thread.
Steve _______________________________________________ WikiEN-l mailing list WikiEN-l@Wikipedia.org To unsubscribe from this mailing list, visit: http://mail.wikipedia.org/mailman/listinfo/wikien-l
I'd suggest getting in touch with an admin at Wikibooks and asking them to change "contact the site administrators" to "[[Wikimedia:Contact us|contact the Wikimedia Foundation]]". Or something like that.
Oskar Sigvardsson wrote:
So can someone set it up and add it?
On 3/20/06, Steve Bennett stevage@gmail.com wrote:
On 3/20/06, Oskar Sigvardsson oskarsigvardsson@gmail.com wrote:
"If you have problem seeing this image, or signing up in general, please contact someone@wikimedia.org with your account information, and we will assist you in creating an account"
That's pretty much the current text. Except no one has gotten around to actually adding the email address. Which is what started this whole thread.
Steve
Theresa Knott wrote:
On 3/20/06, Neil Harris neil@tonal.clara.co.uk wrote:
Theresa Knott wrote:
On 3/20/06, Steve Bennett stevage@gmail.com wrote:
On 3/20/06, Theresa Knott theresaknott@gmail.com wrote:
Sorry, I thought a captcha was a wiggly word _image_. What I am describing could easily be text.
Invent one then ;) Bear in mind that if it's multiple choice, then the robot could just have a few goes.
A colour that's the opposite of black
the number of days in a week
A pet animal that goes woof woof.
Unfortunately, Googling and word-counting is a rather powerful way of cheating at these sorts of simple general-knowledge questions:
Searching for "animal that goes woof woof", removing trivial words from the page fragments Google returns on its search page, and then counting the most common remaining words gives the following:
43: woof 9: animal 8: dog 7: goes 6: joke
for "colour that's the opposite of black" we get:
21: colour 10: black 6: opposite 6: white 5: you
If you then try the words which are not in the question, the highest rated few words tend to contain the answer to the question. Even if we try at random from the top-rated words, there's a good chance of success, which is all a bot needs.
OK so simple quiz questions are out, and so are multiple choice questions. What about this:
Take the second letter of ADAM, the first letter of OGRE and the last of RAINING.
Is it possible for a bot to get around that?
Theresa
On its own, probably no. If you start using thousands of questions of the same form, though, someone will write a simple program to do them, and it then becomes trivial.
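To show how simple such a program would be, here is a sketch that answers questions of exactly this form. It assumes the target words are always in capitals and the positions are spelled out as words; the pattern would need extending for other phrasings.

```python
import re

# Position words mapped to string indices.
POSITIONS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4, "last": -1}

# Matches e.g. "second letter of ADAM" and also "last of RAINING".
PATTERN = re.compile(r"\b(first|second|third|fourth|fifth|last)(?: letter)? of ([A-Z]+)\b")


def solve(question):
    """Pick the requested letter from each capitalised word and join them."""
    return "".join(word[POSITIONS[pos]] for pos, word in PATTERN.findall(question))


print(solve("Take the second letter of ADAM, the first letter of OGRE and the last of RAINING."))
# prints "DOG"
```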
I tried to think of a good text-only captcha scheme some time ago, but came up short.
The ideal text captcha is:
- endlessly variable (there must be at least millions of potential challenges, to defend against replay attacks)
- easy for people to answer without any specialist knowledge
- easy to answer for people without advanced skills in the target language
- not generated by a simple algorithm which can be reverse-engineered (as with the above)
- not Googlable
- easy to assess the answer using a computer program (which typically means it's a simple word or phrase)
It's hard to generate large numbers of questions which are easy for people to answer, but hard for machines. For a start, questions about obscure topics are a test of general knowledge, not humanity and many people will fail them. Questions with contorted syntax will be difficult for non-native speakers.
Questions of the form "what is the capital of X" are easily dealt with by simple lookup, or Googling. In general, any database of facts you can find in public to pose questions can also be found by a spammer.
Algorithmically-generated questions are vulnerable to reverse engineering: questions which require the reader to perform simple symbol manipulations or answer auto-generated logic puzzles are easily performed by computer. Devious riddles like the Riddle of the Sphinx will stump most readers if they do not already know the answer, and all the common ones are Googlable anyway. Questions with ambiguous answers are hard to mark correctly using a computer.
Even assuming a carefully compiled list of, say, 1000 suitable questions that avoid all these pitfalls could be composed by hand (perhaps by a group effort), a spammer would only need to build a list of them once, and they could then be answered perfectly by simple lookup.
The nice thing about visual captchas is that the operations used to create them are effectively one-way: for example, stirring the pixels in the current Wikipedia captchas is easy to do, but hard to invert programmatically, yet the human eye can still decode it. What's needed is a similar operation for text that uses the power of the human cognitive system in the same way that visual captchas use the power of the human visual system.
For example, one good class of anti-bot precautions uses Javascript, and works on the principle that most bot authors cannot be bothered to include a Javascript interpreter in their bot, but that every modern browser is capable of interpreting Javascript without the user needing to do anything special.
Similarly, you can slow down spammers by creating a computation burden by requiring the far end to generate hash collisions, something that can't be done without a powerful computer at the other end working away for some time. In fact, it's probably easier to create questions that machines can answer, but people can't.
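A sketch of the computation-burden idea, read here as Hashcash-style partial hash collisions (finding an input whose hash starts with a given number of zero bits). That reading, the choice of SHA-256, and the challenge format are assumptions for illustration:

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def solve(challenge: str, bits: int) -> int:
    """Client side: try nonces until the hash has `bits` leading zero bits."""
    for nonce in count():
        if verify(challenge, nonce, bits):
            return nonce


# "signup-token-123" is a made-up challenge the server would issue per request.
nonce = solve("signup-token-123", 12)
```

Each extra bit roughly doubles the expected work for the client, while checking stays a single hash for the server.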
What's needed is something that exercises a uniquely human skill that only involves understanding. Perhaps story understanding? Or reasoning about hidden emotions or mental states, both things that people have evolved to do very well? (Note that many real people with autistic spectrum disorder won't be able to answer these questions, though).
-- Neil
On 3/20/06, Neil Harris neil@tonal.clara.co.uk wrote:
I tried to think of a good text-only captcha scheme some time ago, but came up short.
The ideal text captcha is:
- endlessly variable (there must be at least millions of potential
challenges, to defend against replay attacks)
- easy for people to answer without any specialist knowledge
- easy to answer for people without advanced skills in the target language
- not generated by a simple algorithm which can be reverse-engineered
(as with the above)
- not Googlable
- easy to assess the answer using a computer program (which typically
means it's a simple word or phrase)
Even this isn't good enough. Spammers have already figured out how to defeat every visual captcha out there, and there's no reason to believe it won't also apply to audio and text captchas. It's very simple:
1) Advertise free porn on USENET and other locations.
2) In order to get the pictures, users need to answer a captcha. Serve up the captcha for a site you're trying to register for.
3) User answers the captcha. User gets the porn, spammer gets the new account, everyone's happy.
-- Mark [[User:Carnildo]]
...continuing masturbatory tech talk.... What kind of audio captchas have people tried? I would think that asking people to identify a piece of music would be a pretty good captcha, although it could be difficult not to make it Eurocentric. For instance, if you had a bunch of songs as universally recognisable as, say, Happy Birthday, and then had a bunch of different recordings of those, it would be extraordinarily difficult for a computer to recognise the piece, since it could be in any key, in any tuning, with slight rhythmic differences, but would still be recognisable to anyone who knew the piece. Yes, you would have to figure out what pieces to use and get the recordings, but it would still be cool. Of course, Carnildo/Mark's workaround would still be there.
Makemi
On 3/20/06, Mark Wagner carnildo@gmail.com wrote:
On 3/20/06, Neil Harris neil@tonal.clara.co.uk wrote:
I tried to think of a good text-only captcha scheme some time ago, but came up short.
The ideal text captcha is:
- endlessly variable (there must be at least millions of potential
challenges, to defend against replay attacks)
- easy for people to answer without any specialist knowledge
- easy to answer for people without advanced skills in the target
language
- not generated by a simple algorithm which can be reverse-engineered
(as with the above)
- not Googlable
- easy to assess the answer using a computer program (which typically
means it's a simple word or phrase)
Even this isn't good enough. Spammers have already figured out how to defeat every visual captcha out there, and there's no reason to believe it won't also apply to audio and text captchas. It's very simple:
1) Advertise free porn on USENET and other locations.
2) In order to get the pictures, users need to answer a captcha. Serve up the captcha for a site you're trying to register for.
3) User answers the captcha. User gets the porn, spammer gets the new account, everyone's happy.
-- Mark [[User:Carnildo]]
On 3/21/06, Mak makwik@gmail.com wrote:
...continuing masturbatory tech talk.... What kind of audio captchas have people tried? I would think that asking people to identify a piece of music would be a pretty good captcha, although it could be difficult not to make it Eurocentric. For instance, if you had a bunch of songs as universally recognisable as, say, Happy Birthday, and then had a bunch of different recordings of those, it would be extraordinarily difficult for a computer to recognise the piece, since it could be in any key, in any tuning, with slight rhythmic differences, but would still be recognisable to anyone who knew the piece. Yes, you would have to figure out what pieces to use and get the recordings, but it would still be cool. Of course, Carnildo/Mark's workaround would still be there.
Makemi
Forget eurocentric, I suspect you would hit anglocentric. The number of tunes known to everyone is so small that the computer could just enter the same answer every time and get an account soon enough. Initially you could just use plain speech and rely on the fact that no one else uses them, so no one is going to bother making a bot. More advanced approaches could involve spotting male and female voices against background noise.
-- geni
On 3/21/06, geni geniice@gmail.com wrote:
Forget eurocentric, I suspect you would hit anglocentric. The number of tunes known to everyone is so small that the computer could just enter the same answer every time and get an account soon enough. Initially you could just use plain speech and rely on the fact that no one else uses them, so no one is going to bother making a bot. More advanced approaches could involve spotting male and female voices against background noise.
Beyond "happy birthday" I'm at a loss to think of *any* tune known to everyone. Maybe the Ode to Joy. Maybe.
Steve
Beginning of Beethoven's third? They could at least say who that was by. Makemi
On 3/21/06, Steve Bennett stevage@gmail.com wrote:
On 3/21/06, geni geniice@gmail.com wrote:
Forget eurocentric, I suspect you would hit anglocentric. The number of tunes known to everyone is so small that the computer could just enter the same answer every time and get an account soon enough. Initially you could just use plain speech and rely on the fact that no one else uses them, so no one is going to bother making a bot. More advanced approaches could involve spotting male and female voices against background noise.
Beyond "happy birthday" I'm at a loss to think of *any* tune known to everyone. Maybe the Ode to Joy. Maybe.
Steve
Steve Bennett wrote:
On 3/21/06, geni geniice@gmail.com wrote:
Forget eurocentric, I suspect you would hit anglocentric. The number of tunes known to everyone is so small that the computer could just enter the same answer every time and get an account soon enough.
Beyond "happy birthday" I'm at a loss to think of *any* tune known to everyone. Maybe the Ode to Joy. Maybe.
Possibly, but selecting the European (Pan-)National Anthem as an example is a bit of a sop to the "eurocentric" charge. :-)
Initially you could just use plain speech and rely on the fact that no one else uses them, so no one is going to bother making a bot. More advanced approaches could involve spotting male and female voices against background noise.
That could work, though automated (unattended) text-to-speech synthesis has not significantly improved in my lifetime. (Read: it's terrible. Yes, even IBM's whizzo-prang stuff. Horrendously disjointed and considerably more difficult to use as a "test" of something's humanity than would be desired, I feel.)
Yours sincerely,
-- James D. Forrester
Wikimedia : [[W:en:User:Jdforrester|James F.]]
E-Mail : james@jdforrester.org
IM (MSN) : jamesdforrester@hotmail.com
On 3/21/06, James D. Forrester james@jdforrester.org wrote:
Possibly, but selecting the European (Pan-)National Anthem as an example is a bit of a sop to the "eurocentric" charge. :-)
Not such a crime when you're talking about a language-specific wiki project, perhaps. It's also very popular in Japan!
Anyway, we have established a nice algorithm for the bot to try:
1. Download audio captcha
2. Guess "Happy Birthday"
3. Guess "Ode to Joy"
4. Guess "Beethoven"
5. Give up.
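To put a rough number on the fixed-guess bot, here is a minimal sketch. It assumes the captcha picks one tune uniformly at random from a known pool and that every attempt is independent; the function name and parameters are made up for illustration and do not describe any real captcha.

```python
def success_within(pool_size, attempts, guesses_per_attempt=1):
    """Chance that a bot repeating the same fixed guesses gets through.

    Assumption: each captcha draws one tune uniformly from `pool_size`
    tunes, and the bot submits the same `guesses_per_attempt` distinct
    tune names on every attempt.
    """
    # Per-attempt hit probability, capped at 1 if the bot covers the pool.
    p = min(1.0, guesses_per_attempt / pool_size)
    # Complement of failing every one of the independent attempts.
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    # A pool of 50 tunes, one fixed guess, 100 sign-up attempts.
    print(round(success_within(50, 100), 3))
```

With a pool that small the bot gets an account most of the time after a hundred tries, which is the point made above.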
Steve
James D. Forrester wrote:
Steve Bennett wrote:
On 3/21/06, geni geniice@gmail.com wrote:
Initially you could just use plain speech and rely on the fact that no one else uses them, so no one is going to bother making a bot. More advanced approaches could involve spotting male and female voices against background noise.
That could work, though automated (unattended) text-to-speech synthesis has not significantly improved in my lifetime. (Read: it's terrible. Yes, even IBM's whizzo-prang stuff. Horrendously disjointed and considerably more difficult to use as a "test" of something's humanity than would be desired, I feel.)
Yes, but the nice thing is that we don't really need speech synthesis for this, since recordings of spoken words are plentiful and easily available. In fact, we already have a perfect source: the Spoken Wikipedia project.
Cutting the recordings up to build a decent library of a few thousand words (with a dozen or more recordings of each) does take some work, but could be semiautomated (given that we have access to the written version) and easily done by volunteer contributors.
Background babble can be generated automatically by mixing random segments from the same recordings. Other noises can also be included: anyone with a laptop and a microphone can easily contribute practically limitless amounts of digitally recorded noise.
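A rough sketch of the scheme just described, assuming the recordings have already been cut into per-word clips. Clips are plain lists of float samples here to keep it self-contained; `library`, `noise_pool`, `build_challenge` and `mix_babble` are hypothetical names, not part of any existing tool.

```python
import random


def mix_babble(speech, noise_pool, rng, gain=0.3):
    """Overlay random segments from noise_pool onto speech, sample by sample."""
    out = list(speech)
    pos = 0
    while pos < len(out):
        clip = rng.choice(noise_pool)
        # Start at a random offset so the same clip sounds different each time.
        segment = clip[rng.randrange(len(clip)):]
        for i, sample in enumerate(segment):
            if pos + i >= len(out):
                break
            out[pos + i] += gain * sample
        pos += len(segment)
    return out


def build_challenge(library, noise_pool, n_words=4, rng=None):
    """Pick n_words distinct words, one random recording of each, add babble.

    library maps word -> list of recordings (each a non-empty list of floats).
    Returns (expected_words, mixed_samples).
    """
    rng = rng or random.Random()
    words = rng.sample(sorted(library), n_words)
    speech = []
    for word in words:
        speech.extend(rng.choice(library[word]))
    return words, mix_babble(speech, noise_pool, rng)
```

The returned word list is what the user would be asked to type back. A real version would read and write actual audio files and would need to worry about clip boundaries and volume levels, none of which is handled here.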
Ilmari Karonen wrote:
James D. Forrester wrote:
Steve Bennett wrote:
On 3/21/06, geni geniice@gmail.com wrote:
Initially you could just use plain speech and rely on the fact that no one else uses them, so no one is going to bother making a bot. More advanced approaches could involve spotting male and female voices against background noise.
That could work, though automated (unattended) text-to-speech synthesis has not significantly improved in my lifetime. (Read: it's terrible. Yes, even IBM's whizzo-prang stuff. Horrendously disjointed and considerably more difficult to use as a "test" of something's humanity than would be desired, I feel.)
Yes, but the nice thing is that we don't really need speech synthesis for this, since recordings of spoken words are plentiful and easily available. In fact, we already have a perfect source: the Spoken Wikipedia project.
Cutting the recordings up to build a decent library of a few thousand words (with a dozen or more recordings of each) does take some work, but could be semiautomated (given that we have access to the written version) and easily done by volunteer contributors.
Background babble can be generated automatically by mixing random segments from the same recordings. Other noises can also be included: anyone with a laptop and a microphone can easily contribute practically limitless amounts of digitally recorded noise.
Aided by the fact that most modern laptops have microphones built in.
Mak wrote:
...continuing masturbatory tech talk.... What kind of audio captchas have people tried? I would think that asking people to identify a piece of music would be a pretty good captcha, although it could be difficult not to make it Eurocentric. For instance, if you had a bunch of songs as universally recognisable as, say, "Happy Birthday", and then had a bunch of different recordings of those, it would be extraordinarily difficult for a computer to recognise the piece, since it could be in any key, in any tuning, with slight rhythmic differences, but would still be recognisable to anyone who knew the piece. Yes, you would have to figure out what pieces to use and get the recordings, but it would still be cool. Of course, Carnildo/Mark's workaround would still be there.
Makemi
"Happy Birthday to You" wouldn't be an option: it's still in copyright. See http://www.snopes.com/music/songs/birthday.asp
Also, tune recognition is pretty much a done deal, thanks to the [[Parsons code]] and similar techniques. There are a number of commercial music recognition products available.
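For anyone who hasn't met it, a minimal sketch of the Parsons code: the first note is written as "*", and each later note as U, D or R depending on whether it goes up, goes down or repeats the previous pitch. Representing pitches as MIDI note numbers is an assumption of this example, and extracting the notes from a recording is the hard part, which this does not attempt.

```python
def parsons_code(pitches):
    """Return the Parsons code for a sequence of pitches.

    Only the direction of each step is kept, so the result does not
    depend on key, tuning or rhythm.
    """
    if not pitches:
        return ""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)


# Opening of the Ode to Joy theme: E E F G G F E D C C D E E D D
ODE_TO_JOY = [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62]

if __name__ == "__main__":
    print(parsons_code(ODE_TO_JOY))                   # *RUURDDDDRUURDR
    print(parsons_code([p + 5 for p in ODE_TO_JOY]))  # same code, transposed
```

Since only the contour survives, playing the tune in a different key or with slightly different rhythm gives the same string.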
-- Neil
Theresa Knott stated for the record:
Do we have to have CAPTCHAs at all?
No one seems to be addressing the original poster's comment. Why not have a simple IQ test type question? Something that even the most idiotic human could answer but a machine could not?
It's always bothered me[0] that if I were ever in a WWII movie and got separated from my squad, I would have a great deal of trouble proving that I was an American -- I have no idea who won the World Series.
[0] Well, okay, not really.
-- 
Sean Barrett     | Modern art is what happens when painters stop
sean@epoptic.org | looking at girls and persuade themselves
                 | that they have a better idea. --John Ciardi