On Mon, Aug 16, 2004 at 05:59:23PM +0000, Peter Shaw wrote:
> The reason it's so good is that they don't generate the speech from scratch but instead concatenate recorded speech segments. This also means that the program is rather large, and that listening to text at a speed different from the recording will not sound very good. See "Concatenative synthesis" on http://en.wikipedia.org/wiki/Speech_synthesis for more details.
Interesting, I did not realize they used such a technique and had been wondering why the pronunciations sounded more natural than those of, e.g., the Festival engine.
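For the curious, here is a minimal sketch of the concatenation idea: look up a pre-recorded clip per unit and append the audio frames into one output file. Real engines work on diphone or other unit inventories and smooth the signal at the joins; the unit file names below are made up for illustration.

    import wave

    def concatenate_units(unit_paths, out_path):
        """Join pre-recorded WAV clips (assumed to share one format) into one file."""
        with wave.open(out_path, "wb") as out:
            for i, path in enumerate(unit_paths):
                with wave.open(path, "rb") as clip:
                    if i == 0:
                        # Copy sample rate, channels, etc. from the first clip.
                        out.setparams(clip.getparams())
                    out.writeframes(clip.readframes(clip.getnframes()))

    # e.g. concatenate_units(["units/hel.wav", "units/lo.wav"], "hello.wav")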
What I would like to see is some sort of streaming text-to-speech server which can use the Speex codec (http://www.speex.org) for output. Seeking through the text could be done with approximate calculations based on the length of the text and the configured voice speed. Client-side interaction (voice configuration, streaming quality, text selection, pointer indication feedback, bookmarks, history, etc.) would ideally be taken care of via some sort of unified API, e.g. using Speech Dispatcher (http://www.freebsoft.org/speechd).
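As a rough illustration of those "approximate calculations", something like the following could map a requested seek time back to a character offset, given the configured words-per-minute rate. The characters-per-word constant and the function names are assumptions for this sketch, not part of any existing API.

    AVG_CHARS_PER_WORD = 6  # rough average, including the trailing space

    def estimated_duration_seconds(text, words_per_minute=160):
        """Approximate how long the synthesized audio for `text` will last."""
        words = len(text) / AVG_CHARS_PER_WORD
        return words / words_per_minute * 60.0

    def estimated_offset(text, seek_seconds, words_per_minute=160):
        """Map a seek position in seconds to an approximate character offset."""
        total = estimated_duration_seconds(text, words_per_minute)
        if total == 0:
            return 0
        fraction = min(max(seek_seconds / total, 0.0), 1.0)
        return int(fraction * len(text))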
More speech synthesis links: http://debianlinux.net/multimedia.html#speech
Jama Poulsen