The Speech Technology Group (Grupo de Tecnología del Habla, GTH) is part of the Department of Electronic Engineering (Departamento de Ingeniería Electrónica, DIE), which belongs to the Technical University of Madrid (Universidad Politécnica de Madrid, UPM), at the High Technical School of Telecommunication Engineering (Escuela Técnica Superior de Ingenieros de Telecomunicación, ETSIT).

The group is devoted to research and development in various areas of speech science and technology, especially speech synthesis and recognition, and technology applications in the office environment, the telephone environment and aids for the handicapped.



Brief History:

The activities in text-to-speech started in 1980, led by Prof. E. Muñoz. Previously, in 1978, a prototype for time-domain synthesis with a limited vocabulary was applied to a Spanish talking calculator. An article was published in EUROMICRO 1978 [MAR 78]. In 1981 a first, non-real-time text-to-speech system was shown [SAN 81].

A cooperation with Speech Plus Inc. (previously Telesensory Speech Systems) started in 1980 in order to implement the Prose 2000 text-to-speech converter for Spanish. In 1983, a first version of the system was demonstrated with acceptable results in intelligibility tests (comprehension close to natural speech) [OLA 84]. In 1986 a first commercial version was introduced into the market by Speech Plus Inc. The group also contributed to the development of the first version of Spanish DECtalk (of Digital Equipment Corporation) in 1984.

Since 1983, our objectives have been to improve the prosody of the synthesizer and to improve the tools for research on text-to-speech. We have created specific segment duration rules. We have worked on improving the intonation, mainly on breath group parsing and the automatic generation of natural pitch contours [PAR 87], [MOR 89].

Since 1989 we have collaborated with Telefónica I+D to develop a new Spanish text-to-speech synthesizer based on diphones [SIL 90]. We have also contributed to the architecture implementation of the text-to-speech synthesizer being produced in the Esprit project n. 2104 "POLYGLOT". Since 1991 we have participated in COST 233 "Prosodics of the synthetic speech".

In 1992, we developed our own text-to-speech system based on an in-house PC board and software. This board also supports other programs for speech analysis and recognition and is commercially available today. Different improved versions have appeared since then.

Recent Work:
Through the TIDE project VAESS TP 1174 (1994-1995) we have started the generation of new voices to add to the synthesizer. We are also working on improved methods for prosody generation and on different synthesis methods (formants, waveform concatenation).



Brief History:

In parallel with the activities in text-to-speech, a speech recognition project started in 1978 with two applications in mind: a) a speech training aid for the deaf [PAR 80], [PAR 82], [PAR 83], [AGU 86] and b) a speaker-independent recognizer of isolated Spanish digits.

In 1983 a first prototype of a speaker-independent Spanish digit recognizer was shown (Ph.D. thesis of A. Golderos, and several national and international papers published [GOL 83]).

During 1983-1984 J. M. Pardo spent 13 months at the MIT RLE Speech Communication Group working on speech recognition [PAR 86].

In 1985 a new emphasis was placed on speech recognition with the start of a project on isolated-word, speaker-independent recognition with a 1000-word vocabulary. In 1985-1988 we cooperated with SRI International on this project. We developed a system that recognizes 1000 isolated words in Spanish independently of the speaker [PAR 89, PAR 91].

In 1986, with Spain joining the EC, we joined the Esprit project 291-860 "Linguistic Analysis of the European Languages" (1985-1989), where we worked on a lexical access system from isolated phoneme strings for different languages, including Spanish, using statistical and heuristic language knowledge [BOV 87].

We have also developed new systems for speaker-independent isolated-digit recognition with more than 99% accuracy [FER 91].

In 1989-92 we worked on the Esprit project n. 2104 POLYGLOT, where we developed a large-vocabulary isolated-word recognizer and a continuous speech recognizer.

Recent Work:
We have developed a prototype of a large-vocabulary isolated-word speech recognizer in Spanish (8,000 words). We are improving it on both the acoustic side and the linguistic side. This system works in real time and uses a board developed in our department by S. Aguilera.
We have also worked on telephone services using speaker-independent speech recognition and synthesis. A speaker-independent digit/command word recognition system for the telephone is available.
We are now working on continuous speech recognition and language models applied to speech understanding.



In 1985 we designed prototype text-to-speech hardware [SAN 85]. In 1986 we worked on hardware implementations of speech recognition systems (together with SRI International and the LSI group of the EE Department-UPM [SAN 89]). Inside our group, we have developed a DSP board which hosts a series of programs: PCVOX, a tool to acquire, analyze and visualize speech; ISOTON, a tool that uses speech analysis and display in the training of deaf people; and TEL-ECO, a text-to-speech converter.



Since 1976, led by Prof. E. Muñoz, we have worked on applications of speech technology to aids for the disabled: the visually and hearing impaired.

Our first activity was the development of a Spanish talking calculator for the blind (1978). In 1979 we started work on speech rehabilitation for the deaf, based on analog speech processing.

Today, we have developed an integrated system (VISHA) that covers several aspects of speech training [BOR 85, AGU 86, AGU 86 b, MAT 90, BER 92]:





