On Mon, Aug 16, 2004 at 05:59:23PM +0000, Peter Shaw wrote:
The reason why it's so good is that they don't generate the speech from
scratch but instead concatenate recorded speech pieces. This also means that
the program is rather huge, and that listening to text at a speed different
from recording will not sound very good. See "Concatenative synthesis" on
for more details.
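
To make the idea concrete, here is a toy sketch of concatenation in Python. It is not how any particular product works: the unit names, the tiny sample lists, and both function names are made up for illustration. It also shows one naive way of speeding playback up (dropping samples), which shortens the audio but raises the pitch when played at the original sample rate.

```python
# Hypothetical inventory of "recorded" units; each is a short list of samples.
# A real system would need a very large inventory, hence the program size.
UNITS = {
    "he": [1, 2, 3],
    "llo": [4, 5],
    "wor": [6, 7, 8],
    "ld": [9],
}


def synthesize(unit_names, inventory):
    """Join the recorded samples of each named unit, in order."""
    samples = []
    for name in unit_names:
        samples.extend(inventory[name])
    return samples


def naive_speedup(samples, factor):
    """Keep every factor-th sample (assumed, crude speed change)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return samples[::factor]


hello = synthesize(["he", "llo"], UNITS)
faster = naive_speedup(hello, 2)
```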
Interesting, I did not realize they used such a technique and was
wondering why the pronunciation sounded more natural than with, e.g.,
the Festival engine.
What I would like to see is some sort of streaming text-to-speech
server system which can use the Speex codec (http://www.speex.org)
for its output. Seeking through the text could be done based on
approximate calculations from the length of the text and the configured
voice speed. Client-side interaction (voice configuration, streaming
quality, text selections, pointer indication feedback, bookmarks,
history, etc.) would ideally be taken care of via some sort of unified
API, e.g. using Speech Dispatcher (http://www.freebsoft.org/speechd).
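
A rough sketch of the kind of approximate calculation I mean, in Python. The function name and the words-per-minute parameter are my own assumptions, not part of Speex or Speech Dispatcher; it simply counts the words before a text position and converts that to a time offset at the configured speaking rate.

```python
def estimate_offset_seconds(text, char_index, words_per_minute):
    """Estimate how far into the audio the text at char_index is spoken.

    Hypothetical helper: assumes a constant speaking rate and ignores
    pauses, punctuation, and word length, so the result is approximate.
    """
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    # Count whitespace-separated words that come before the position.
    words_before = len(text[:char_index].split())
    return words_before * 60.0 / words_per_minute


text = "one two three four five six"
offset = estimate_offset_seconds(text, text.index("four"), 180)
```

A client could use such an estimate to jump to roughly the right place in the stream when the user selects a bookmark or a text position.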
More speech synthesis links: http://debianlinux.net/multimedia.html#speech