The Wavelet Digest, Volume 4, Issue 8


Preprint: Wavelet technology in a text-to-speech system
 
Author: mei@trlvm.vnet.ibm.com
Posted: Tue Dec 03, 2002 1:30 pm

A paper on wavelet technology used in a text-to-speech (TTS) system,
followed by a description of the TTS system and purchasing information.

IBM Tokyo Research Laboratory, Research Report RT0110, June 1995

ProTALKER: a Japanese text-to-speech system for personal computers
(Report is in English.)

Takashi SAITO, Masaharu SAKAMOTO, Yasuhide HASHIMOTO,
Mei KOBAYASHI, Masafumi NISHIMURA, Kazuhiro SUZUKI

IBM Japan, Ltd., Tokyo Research Laboratory,
1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242 Japan.

Abstract:
This paper describes the development of ProTALKER, a Japanese
text-to-speech (TTS) system for personal computers. The quality of
synthesized speech is one of the most important features of any TTS
system. Synthesis methods which are based on manipulation of the speech
signal spectrum (e.g. linear predictive coding synthesis and formant
synthesis) produce comprehensible but unnatural sounding output. The
lack of naturalness commonly associated with these methods results from the
use of oversimplified speech models, small synthesis unit inventories,
and poor handling of text parsing for prosody control. We developed four
new technologies to overcome these difficulties and improve the quality
of output from TTS systems: accurate pitch mark determination by wavelet
analysis, speech waveform generation using a modified time domain pitch
synchronous overlap-add method, speech synthesis unit selection using
a context dependent clustering method, and efficient prosody control
using a 3-phrase parser.
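
To make the waveform generation step more concrete, here is a minimal sketch of plain time domain pitch synchronous overlap-add in Python. It is not the modified method of the report, whose details are not given in this abstract. The function names `hann_window` and `overlap_add`, the assumption of evenly spaced pitch marks with a known constant period, and the Hann window are all illustrative choices. The pitch marks are taken as given; the report obtains them by wavelet analysis, which is not shown here.

```python
import math


def hann_window(length):
    """Periodic Hann window; two copies shifted by length/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length)
            for i in range(length)]


def overlap_add(signal, pitch_marks, period, pitch_factor):
    """Re-space two-period windowed segments to change the pitch.

    Assumptions (hypothetical, for illustration only):
      - pitch_marks are sample indices spaced exactly `period` apart
      - pitch_factor > 1 raises the pitch, < 1 lowers it
      - duration is NOT preserved; a full system would also repeat
        or drop segments to keep the original length
    """
    new_period = max(1, round(period / pitch_factor))
    window = hann_window(2 * period)
    out_len = new_period * (len(pitch_marks) - 1) + 2 * period
    output = [0.0] * out_len

    for k, mark in enumerate(pitch_marks):
        start_in = mark - period      # segment is centred on the mark
        start_out = k * new_period    # segments are placed at the new spacing
        for i in range(2 * period):
            src = start_in + i
            if 0 <= src < len(signal):
                output[start_out + i] += window[i] * signal[src]
    return output


# Usage: a sine wave with a 20-sample period and marks every 20 samples.
period = 20
signal = [math.sin(2.0 * math.pi * j / period) for j in range(200)]
marks = list(range(20, 181, 20))
raised = overlap_add(signal, marks, period, 2.0)  # segments moved closer
```

With `pitch_factor` set to 1.0 the overlapping windows sum to one, so the interior of the output reproduces the input. This is why the accuracy of the pitch marks matters: segments cut at the wrong positions no longer add up coherently.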

For further info about the report, pls. contact:

Mei Kobayashi (LAB-s73)
IBM Tokyo Research Laboratory
1623-14 Shimotsuruma, Yamato-shi, Kanagawa-ken 242 Japan
tel: 81+462-73-4934, FAX: 81+462-73-7428
e-mail: MEI@TRLVM.VNET.IBM.COM
(during Aug 6-24 only, pls. e-mail: kobayash@math.berkeley.edu)


ProTALKER, version 1.0, a product of IBM Japan, Ltd., reads aloud
Japanese kana-kanji text. It can be installed on any PC with audio
capability and Japanese MS-Windows 3.1, such as the IBM ThinkPad. Users
may choose between 4 levels of speech quality; 2.6MB to 18MB are needed
for dictionary storage, with higher quality speech output requiring
more disk space for a better TTS dictionary. Options include selection
of the speaker's gender and facial image, and control of volume,
speaking rate, and intonation. A cartoon face appears in a smaller
window with two columns of buttons, which look and function like those
of a tape recorder (start, stop, rewind, and fast forward), plus a
help button, which is labelled with a question mark. ProTALKER can read
a text file of any length; however, only 32MB at a time can be
placed in the application buffer. Special options of ProTALKER include
converting the text file to a separate waveform file after reading,
appending specialized or technical words to the TTS dictionary, and
invoking a forum utility feature to read aloud the contents of a
user-specified forum file, such as e-mail or net news.

list price: 9,800 yen
for purchasing info., pls. contact:
Nakamura (0462)-73-5932 or Takao (0462)-73-2512
IBM Japan, Ltd., Yamato Development Laboratory,
1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242 Japan.