Hi,
I just found an AT&T demo site for text-to-speech synthesis:
http://www.research.att.com/projects/tts/demo.html
which appears to generate *way* better speech than I got from other (local installation) demos.
They want to sell it, of course, but I thought we should ask them about cooperating. This would be the ultimate demonstration of their software (better than a million people typing "this is a test"), and it could enable us to provide access to the "visually impaired" without the need for a local text-to-speech browser, and with better speech quality. Or it could be a "convenience link", like "read this article to me, I'm too lazy to move my eyes" (or: Wikipedia for mp3 players? ;-)
As Wikipedia has a good and innovative image, and since we wouldn't buy their product anyway, I guess AT&T would be interested in such a thing. Question is, would we?
Magnus
Magnus Manske wrote:
As Wikipedia has a good and innovative image, and since we wouldn't buy their product anyway, I guess AT&T would be interested in such a thing. Question is, would we?
Sure! I would love to see that.
Even the quality of the German part is impressive, though there is still much room for improvement :)
Mathias
Wow, the tests are IMPRESSIVE. I don't think they would have any problem with us linking to them.
Thanks for the link, anyway.
Pedro.
Text-to-speech
_______________________________________________ Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
Magnus Manske wrote:
I just found an AT&T demo site for text-to-speech synthesis:
http://www.research.att.com/projects/tts/demo.html
which appears to generate *way* better speech than I got from other (local installation) demos.
Well, personally, I find it rather disappointing, and I have heard better ones even several years ago. However, it would certainly be pretty cool to have Wikipedia articles as sound files.
This might even combine well with the validation feature. We could ask them to synthesise an article whenever it reaches a certain level of validation. That way, even though Wikipedia can be edited by anyone, people would not be able to abuse the system to use the software for their own purposes.
Timwi
Peter Shaw wrote:
On Saturday 14 August 2004 00:19, Timwi wrote:
Well, personally, I find it rather disappointing, and I have heard better ones even several years ago. However, it would certainly be pretty cool to have Wikipedia articles as sound files.
Where?
The "better one" I am talking about was an old DOS application called "SBTalker", and it came with the Sound Blaster Pro 4. Needless to say, it no longer works under Windows.
On Friday 13 August 2004 08:24, Magnus Manske wrote:
I just found an AT&T demo site for text-to-speech synthesis:
http://www.research.att.com/projects/tts/demo.html
which appears to generate *way* better speech than I got from other (local installation) demos.
The reason it's so good is that they don't generate the speech from scratch but instead concatenate pieces of recorded speech. This also means that the program is rather huge, and that listening to text at a speed different from the recording speed will not sound very good. See "Concatenative synthesis" on http://en.wikipedia.org/wiki/Speech_synthesis for more details.
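To make the idea concrete, here is a toy sketch in Python. It is not how the AT&T system works internally; the unit names, the tiny sample lists and both function names are made up for illustration. It only shows the principle: output is built by joining stored recordings, and a naive speed change has to drop or repeat recorded samples, which on real audio also shifts the pitch.

```python
# Hypothetical inventory: each speech unit maps to pre-recorded samples.
# A real system stores a very large number of such units, hence its size.
UNITS = {
    "hel": [0.1, 0.2],
    "lo": [0.3],
}


def synthesize(units, inventory):
    """Concatenate the recorded samples for each unit, in order."""
    out = []
    for unit in units:
        out.extend(inventory[unit])
    return out


def change_speed(samples, factor):
    """Naive speed change by picking samples at a different stride.

    factor 2.0 keeps roughly every second sample, factor 0.5 repeats samples.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n = max(1, int(len(samples) / factor))
    last = len(samples) - 1
    return [samples[min(int(i * factor), last)] for i in range(n)]
```

For example, `synthesize(["hel", "lo"], UNITS)` joins the two recordings into one sample list, and `change_speed(...)` on the result shows how much of the original recording is thrown away or duplicated.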
Peter Shaw
On Mon, Aug 16, 2004 at 05:59:23PM +0000, Peter Shaw wrote:
The reason it's so good is that they don't generate the speech from scratch but instead concatenate pieces of recorded speech. This also means that the program is rather huge, and that listening to text at a speed different from the recording speed will not sound very good. See "Concatenative synthesis" on http://en.wikipedia.org/wiki/Speech_synthesis for more details.
Interesting, I did not realize they used such a technique and was wondering why the pronunciations sounded more natural than, e.g., the Festival engine.
What I would like to see is some sort of streaming text-to-speech server system which can use the Speex codec (http://www.speex.org) as output. Seeking through the text could be done with approximate calculations based on the length of the text and the configured voice speed. Client-side interaction (voice configuration, streaming quality, text selections, pointer indication feedback, bookmarks, history, etc.) would ideally be taken care of via some sort of unified API, e.g. using Speech Dispatcher (http://www.freebsoft.org/speechd).
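A rough sketch of the seek estimate, in Python. The function names and the assumption of a constant speaking rate in characters per second are mine, purely for illustration; a real engine would know its actual timings and would do better than this.

```python
def estimate_duration(text: str, chars_per_second: float) -> float:
    """Approximate playback time in seconds at the configured voice speed."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return len(text) / chars_per_second


def seek_offset(text: str, target_seconds: float, chars_per_second: float) -> int:
    """Approximate character offset from which to resume synthesis."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    offset = int(target_seconds * chars_per_second)
    # Clamp to the bounds of the text.
    offset = max(0, min(offset, len(text)))
    # Back up to the start of the current word so no word is cut in half.
    while 0 < offset < len(text) and not text[offset - 1].isspace():
        offset -= 1
    return offset
```

The server would then start synthesizing from `text[seek_offset(...):]` when the client jumps to a given time.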
More speech synthesis links: http://debianlinux.net/multimedia.html#speech
Jama Poulsen
wikipedia-l@lists.wikimedia.org