Abstract: In text processing for Text-to-Speech (TTS) synthesis, we aim to direct the expressiveness of speech automatically by tagging the input text appropriately. Because text exhibits different characteristics depending on whether it is domain-dependent (expressiveness related to its topics) or sentiment-dependent (expressiveness related to its sentiment), we study how these traits influence the identification of expressiveness in text and develop a successful classification strategy. To this end, we consider two principal Text Classification (TC) methods, the Reduced Associative Relational Network and the Maximum Entropy classifier, and evaluate their effectiveness in domain-dependent and sentiment-dependent environments. We also evaluate how sensitive the classifiers are to the size of the training data. The overall conclusions indicate that moving from a domain-dependent environment to a more general sentiment-dependent environment consistently results in lower effectiveness rates, despite the appreciable generalisation advantage that sentiment provides for dealing with expressiveness, and that the size of the training data has little influence.
Index Terms: domain classification, sentiment classification, expressive Text-to-Speech synthesis.