Crowd-sourcing platform for large-scale speech data collection

João Freitas, António Calado, Daniela Braga, Pedro Silva, Miguel Sales Dias

Abstract: This paper presents YourSpeech, an online crowd-sourcing platform for speech data collection. The platform aims to collect desktop speech data at negligible cost for any language, in order to provide more training data for Automatic Speech Recognition (ASR) systems. YourSpeech lets users donate their speech in two ways: through a quiz game, and through a platform that allows the deployment of a personalized Text-to-Speech (TTS) system. We have already collected more than 25 hours of pure speech for European Portuguese (EP) and achieved a Word Error Rate (WER) of 1%, measured on 10% of the collected corpus.

Index Terms: Speech data, crowd-sourcing, speech donation, Text-to-Speech, Automatic Speech Recognition.
