Microsoft has announced four new AI-driven neural voices for text-to-speech (TTS) applications, which can be used with Azure OpenAI GPT starting today to help create speech-based chatbots, voice assistants or conversational agents.
The four voices, named en-US-AndrewNeural, en-US-BrianNeural, en-US-EmmaNeural (all in US English) and zh-CN-YunjieNeural (Chinese), are “optimised for conversational scenarios” and are now available for public preview in three regions: East US, Southeast Asia and West Europe.
Microsoft has shared samples comparing the new voices with currently available neural voices, highlighting how much more natural and fluid the speech has become.
The voices can be integrated into existing applications that use Azure OpenAI via the Azure Speech SDK or REST API, or combined with the Azure Bot Framework to build intelligent bots that speak with the new neural TTS voices.
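As a rough illustration of the REST route, the sketch below builds the SSML payload that Azure's text-to-speech REST endpoint expects, using one of the newly announced voice names. The endpoint shape and SSML structure follow Azure's documented REST interface; the region value is a placeholder, and a real request would also need a subscription key.

```python
# Minimal sketch: constructing an SSML request body for the Azure TTS
# REST API, using one of the new conversational voices from the article.
# No credentials are used here; this only builds the payload and URL.

def build_ssml(text: str, voice: str = "en-US-AndrewNeural") -> str:
    """Build the SSML payload the Azure TTS REST endpoint expects."""
    return (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{text}</voice>"
        "</speak>"
    )

def tts_endpoint(region: str) -> str:
    """Region-scoped synthesis endpoint, e.g. region='eastus'."""
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"

ssml = build_ssml("Hello! How can I help you today?")
print(ssml)
print(tts_endpoint("eastus"))
```

In practice, this payload would be sent as an HTTP POST with an `Ocp-Apim-Subscription-Key` header, `Content-Type: application/ssml+xml`, and an `X-Microsoft-OutputFormat` header selecting the audio format, per Azure's TTS REST documentation.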
Microsoft goes on to say the following:
“We began by crafting the persona of each voice as if it were a real person who is friendly and optimistic about life, always eager to assist others and share intriguing or practical knowledge. The speaking style of the voice resembles a conversation with an acquaintance over a cup of tea, maintaining a natural and unexaggerated tone.”
“Furthermore, we continuously enhance our Text-to-Speech (TTS) modeling techniques to improve the quality of our AI voices. Our most recent projects, such as DelightfulTTS 2 and MuLanTTS, have significantly narrowed the quality gap between AI voices and professional human recordings, producing more natural and realistic voices than ever before. These technological advancements serve as the foundation upon which these new AI voices are built.”
The four new voices will sit alongside the existing offering of over 400 neural voices, which cover more than 140 languages and locales.