Speech Technology Group: History


You can speed up your browsing using these mini-links:

  1. Brief research interests
  2. Activities in text-to-speech
  3. Activities in speech recognition
  4. Activities in hardware design
  5. Activities in aids for the disabled
  6. Thesis in development and date of expected presentation
  7. References


The Speech Technology Group (Grupo de Tecnología del Habla, GTH) is part of the Department of Electronic Engineering (Departamento de Ingeniería Electrónica, DIE) at the Higher Technical School of Telecommunication Engineering (Escuela Técnica Superior de Ingenieros de Telecomunicación, ETSIT) of the Technical University of Madrid (Universidad Politécnica de Madrid, UPM).

The group is devoted to research and development in various areas of speech science and technology, especially speech synthesis and recognition, and to applications of the technology in the office environment, over the telephone, and in aids for the disabled.



Activities in text-to-speech

Brief history:

Activities in text-to-speech started in 1980, led by Prof. E. Muñoz. Earlier, in 1978, a prototype time-domain synthesizer with a limited vocabulary was applied to a Spanish talking calculator, and an article was published in EUROMICRO 1978 [MAR 78]. In 1981 a first, non-real-time text-to-speech system was shown [SAN 81].

Cooperation with Speech Plus Inc. (previously Telesensory Speech Systems) started in 1980 in order to implement the Prose 2000 text-to-speech converter for Spanish. In 1983 a first version of the system was demonstrated with acceptable results in intelligibility tests (comprehension close to that of natural speech) [OLA 84]. In 1986 Speech Plus Inc. introduced a first commercial version into the market. The group also contributed to the development of the first version of Spanish DECtalk (Digital Equipment Corporation) in 1984.

Since 1983 our objectives have been to improve the prosody of the synthesizer and the tools for research on text-to-speech. We have created specific segment duration rules and have worked on improving intonation, mainly through breath-group parsing and the automatic generation of natural pitch contours [PAR 87], [MOR 89].

Since 1989 we have collaborated with Telefónica I+D to develop a new Spanish text-to-speech synthesizer based on diphones [SIL 90]. We have also contributed to the architecture implementation of the text-to-speech synthesizer being produced in the Esprit project n. 2104 "POLYGLOT". Since 1991 we have participated in COST 233, "Prosodics of the synthetic speech".

In 1992 we developed our own text-to-speech system based on an in-house PC board and software. The board also supports other programs for speech analysis and recognition and is commercially available today. Several improved versions have appeared since then.

Recent Work:
Through the TIDE project VAESS TP 1174 (1994-1995) we have started generating new voices to add to the synthesizer. We are also working on improved methods for prosody generation and on different synthesis methods (formants, waveform concatenation).



Activities in speech recognition

Brief history:

In parallel with the activities in text-to-speech, a speech recognition project started in 1978 with two applications in mind: a) a speech training aid for the deaf [PAR 80], [PAR 82], [PAR 83], [AGU 86] and b) a speaker-independent recognizer of isolated Spanish digits.

In 1983 a first prototype of a speaker-independent Spanish digit recognizer was shown (Ph.D. thesis of A. Golderos, with several national and international papers published [GOL 83]).

During 1983-1984 J.M. Pardo spent 13 months at the MIT RLE Speech Communication Group working on speech recognition [PAR 86].

In 1985 speech recognition received new emphasis with the start of a project on speaker-independent isolated-word recognition with a 1000-word vocabulary. From 1985 to 1988 we cooperated with SRI International on this project. We developed a system that recognizes 1000 isolated words in Spanish independently of the speaker [PAR 89, PAR 91].

In 1986, with Spain joining the EC, we joined the Esprit project 291-860 "Linguistic Analysis of the European Languages" (1985-1989), where we worked on a lexical access system from isolated phoneme strings for different languages, including Spanish, using statistical and heuristic language knowledge [BOV 87].

We have also developed new systems for speaker-independent isolated-digit recognition with more than 99% accuracy [FER 91].

From 1989 to 1992 we worked on the Esprit project n. 2104 POLYGLOT, where we developed a large-vocabulary isolated-word recognizer and a continuous speech recognizer.

Recent Work:
We have developed a prototype of a large-vocabulary isolated-word speech recognizer for Spanish (8,000 words). We are improving it on both the acoustic side and the linguistic side. This system works in real time and uses a board developed in our department by S. Aguilera.
We have also worked on telephone services using speaker-independent speech recognition and synthesis. A speaker-independent digit and command word recognition system for the telephone is available.
We are now working on continuous speech recognition and language models applied to speech understanding.



Activities in hardware design

In 1985 we designed prototype text-to-speech hardware [SAN 85]. In 1986 we worked on hardware implementations of speech recognition systems, together with SRI International and the LSI group of the EE Department-UPM [SAN 89]. Within our group we have developed a DSP board that hosts a series of programs: PCVOX, a tool to acquire, analyze and visualize speech; ISOTON, a tool that uses speech analysis and display in the training of deaf people; and TEL-ECO, a text-to-speech converter.



Activities in aids for the disabled

Since 1976, led by Prof. E. Muñoz, we have worked on applications of speech technology to aids for the disabled, in particular the visually and hearing impaired.

Our first activity was the development of a Spanish talking calculator for the blind (1978). In 1979 we started work on speech rehabilitation for the deaf based on analog speech processing.

We have since developed an integrated system (VISHA) that covers several aspects of speech training [BOR 85, AGU 86, AGU 86 b, MAT 90, BER 92].





References

[AGU 86] S. AGUILERA, A. BORRAJO, J.M. PARDO, E. MUÑOZ "Speech Analysis Based Devices for Diagnosis and Education on Speech and Hearing Impaired People" Proc. International Conference on Acoustics, Speech and Signal Processing, ICASSP 86, 641-644 (1986)

[AGU 86 b] S. AGUILERA, J.M. PARDO, A. SANTOS, E. MUÑOZ "Speech Based Aids for the Blind: Madrid Experience" Communication System for the Blind, Rainer F. V. Witte (ed) Verlag der Deutschen Blindenstudienanstalt. V. Marburg/Lahn 1990, pp 140-146

[BOV 87] L. BOVES, M. REFICE "The linguistic processor in a Multilingual Text-to-Speech and Speech-to-text conversion system" Proc. of the European Conference on Speech Technology, pp 385-388, J. Laver and M. Jack (ed) CEP Consultants, Edinburgh 1987

[BOR 85] A. BORRAJO, S. AGUILERA, J.M. PARDO, E. MUÑOZ "An efficient pitch extraction method for diagnosis and education" Proc. MELECON-85, Vol I, Bioengineering, 33-36, Madrid 1985.

[BER 92] M.A. BERROJO, J. CORRALES, J. MACIAS, S. AGUILERA "A PC graphic tool for speech research based on DSP board" Accepted in the International Conference on Spoken Language Processing ICSLP-92, Alberta, Canada.

[MAT 90] J.F. MATEOS, A. MACARRON, S. AGUILERA "A PC card for rehabilitation of deficient auditive people" Proc. V European Signal Processing Conference, EUSIPCO-90, 1990.

[FER 91] J. FERREIROS, A. CASTRO, J.M. PARDO "Comparison between two different approaches in speaker-independent isolated digit recognition" Proc. of EUROSPEECH 1991

[GOL 83] A. GOLDEROS "Reconocimiento de palabras aisladas con independencia del locutor, aplicación al reconocimiento de dígitos en español" Tesis doctoral ETSIT-Universidad Politécnica de Madrid, Septiembre 1983

[MAR 78] R. MARTINEZ et al. "A spanish talking calculator" Proc. of EUROMICRO 1978

[MOR 89] P.J. MORENO, M. MARTINEZ, J.M. PARDO, J.A. VALLEJO "Improving naturalness in a text to speech system with a new fundamental frequency algorithm" Proc. Eurospeech 1989, Ed CEP Consultants Ltd, Vol I, pp 360-363

[OLA 84] J.C. OLABE, A. SANTOS, R. MARTINEZ, E. MUÑOZ, M. MARTINEZ, A. QUILIS, J. BERNSTEIN "Real time text-to-speech Conversion System for Spanish" Proc. of IEEE Int. Conf. on Acoustics Speech and Signal Processing, pp 2.10.1-2.10.3, 1984

[PAR 80] J.M. PARDO "On the application of DSP to speech training for the deaf" First European Signal Processing Conference EUSIPCO 80, Lausanne 1980

[PAR 82] J.M. PARDO "Vocal Tract Shape Analysis for Children" Proc. Int. Conf. on Acoustic Speech and Signal Processing. 763-766, IEEE (1982)

[PAR 83] J.M. PARDO, S. AGUILERA, J. OLABE, E. MUÑOZ "Speech Learning Aid for the Deaf: Results and Design Implications" Signal Processing II: Theories and Applications, 609-612, H.W. Schussler ed., Elsevier Science Publishers (1983)

[PAR 86] J.M. PARDO "On the Determination of Speech Boundaries: A Tool for Providing Anchor Time Points in Speech Recognition" Proc. International Conference on Acoustics, Speech and Signal Processing ICASSP 86, 2267-2270 (1986)

[PAR 87] J.M. PARDO, M. MARTINEZ, A. QUILIS, E. MUÑOZ "Improving Text-to-Speech Conversion in Spanish: Linguistic Analysis and Prosody" Proc. European Conference on Speech Technology, Vol. 2, CEP Consultants LTD, 173-176, Edinburgh (1987)

[PAR 89] J.M. PARDO, H. HASAN "Large vocabulary, speaker independent isolated word speech recognition using Hidden Markov Models" Proc. Eurospeech 1989, Ed CEP Consultants Ltd, Vol II, pp 146-149

[PAR 91] J.M. PARDO, H. HASAN, P. COLAS "Speaker independent, 1000 words speech recognition in Spanish" In Speech Recognition and Understanding: Recent Advances, Trends and Applications, P. Laface and R. de Mori eds, Springer Verlag 1992, pp 119-124

[SAN 85] A. SANTOS "Implementation of a text to speech converter for Spanish" Proc. MELECON 1985, Vol II, pp 283-286, Elsevier Science Publishers, 1985

[SIL 90] J.A. SILES "A new service over the Spanish Telephone network with speech recognition and synthesis" in Signal Processing V: Theories and Applications, L. Torres, E. Masgrau, M. A Lagunas (eds) Elsevier Science, 1990, pp 85-91.