On Mon, Aug 16, 2004 at 05:59:23PM +0000, Peter Shaw wrote:
> The reason it's so good is that they don't generate the speech from scratch but instead concatenate recorded speech segments. This also means that the program is rather large, and that listening to text at a speed different from the recording will not sound very good. See "Concatenative synthesis" on http://en.wikipedia.org/wiki/Speech_synthesis for more details.
Interesting, I did not realize they used such a technique and had been wondering why the pronunciations sounded more natural than those of, e.g., the Festival engine.
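For the curious, here is a minimal sketch of the concatenation idea: look up a pre-recorded clip per unit and append the audio frames into one output file. Real engines work on diphone or other unit inventories and smooth the signal at the joins; the unit file names below are made up for illustration.

    import wave

    def concatenate_units(unit_paths, out_path):
        """Join pre-recorded WAV clips (assumed to share one format) into one file."""
        with wave.open(out_path, "wb") as out:
            for i, path in enumerate(unit_paths):
                with wave.open(path, "rb") as clip:
                    if i == 0:
                        # Copy sample rate, channels, etc. from the first clip.
                        out.setparams(clip.getparams())
                    out.writeframes(clip.readframes(clip.getnframes()))

    # e.g. concatenate_units(["units/hel.wav", "units/lo.wav"], "hello.wav")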
What I would like to see is some sort of streaming text-to-speech server which can use the Speex codec (http://www.speex.org) for output. Seeking through the text could be done with approximate calculations based on the length of the text and the configured voice speed. Client-side interaction (voice configuration, streaming quality, text selection, pointer indication feedback, bookmarks, history, etc.) would ideally be taken care of via some sort of unified API, e.g. using Speech Dispatcher (http://www.freebsoft.org/speechd).
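As a rough illustration of those "approximate calculations", something like the following could map a requested seek time back to a character offset, given the configured words-per-minute rate. The characters-per-word constant and the function names are assumptions for this sketch, not part of any existing API.

    AVG_CHARS_PER_WORD = 6  # rough average, including the trailing space

    def estimated_duration_seconds(text, words_per_minute=160):
        """Approximate how long the synthesized audio for `text` will last."""
        words = len(text) / AVG_CHARS_PER_WORD
        return words / words_per_minute * 60.0

    def estimated_offset(text, seek_seconds, words_per_minute=160):
        """Map a seek position in seconds to an approximate character offset."""
        total = estimated_duration_seconds(text, words_per_minute)
        if total == 0:
            return 0
        fraction = min(max(seek_seconds / total, 0.0), 1.0)
        return int(fraction * len(text))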
More speech synthesis links: http://debianlinux.net/multimedia.html#speech
Jama Poulsen