Workshop: Advances in Speech Technologies
June 23, 2011 | IRCAM
----
Synthetic Speech: Beyond Mere Intelligibility
Simon King
Centre for Speech Technology Research. The University of Edinburgh
Some text-to-speech synthesisers are now as intelligible as human speech. This is a remarkable achievement, but the next big challenge is to approach human-like naturalness, which will be even harder. I will describe several lines of research which are attempting to imbue speech synthesisers with the properties they need to sound more "natural" - whatever that means. The starting point is personalised speech synthesis, which allows the synthesiser to sound like an individual person without requiring substantial amounts of their recorded speech. I will then describe how we can work from imperfect recordings or achieve personalised speech synthesis across languages, with a few diversions to consider what it means to sound like the same person in two different languages and how vocal attractiveness plays a role. Since the voice is not only our preferred means of communication but also a central part of our identity, losing it can be distressing. Current voice-output communication aids offer a very poor selection of voices, but recent research means that soon it will be possible to provide people who are losing the ability to speak, perhaps due to conditions such as Motor Neurone Disease, with personalised communication aids that sound just like they used to, even if we do not have a recording of their original voice. There will be plenty of examples, including synthetic child speech, personalised synthesis across the language barrier, and the reconstruction of voices from recordings of disordered speech. This work was done with Junichi Yamagishi, Sandra Andraszewicz, Oliver Watts, Mirjam Wester and many others.
This workshop was organized by Nicolas Obin - IRCAM.