Pushing the frontier of neural text to speech

Channel: Microsoft Research

Subscribers: 351,000

Published on May 27, 2021 8:38:51 PM ● Video Link: https://www.youtube.com/watch?v=MA8PCvmr8B0

Duration: 1:15:36

8,590 views

In the field of text to speech (TTS), the goal is to transform written or printed text into speech that is natural and intelligible. Today, the technology is used in products and services to help people who are blind or have low vision consume digital content, to power personal digital assistants that sound more realistic, and to make it easier to do two things at once, such as listening to an online article while washing dishes. Although neural network-based end-to-end TTS has improved the quality of synthesized speech, advancing neural TTS and making it easier to integrate into product development and deployment still requires overcoming a variety of challenges.

In this webinar, Senior Researcher Xu Tan will talk about these challenges: the high computational cost and slow inference speed in online serving; word skipping and repeating, poor voice quality, and lack of voice controllability; the large amounts of training data needed for improved voice synthesis; and the practical challenges of TTS voice adaptation. He'll introduce his team's work addressing these challenges, including fast TTS, end-to-end TTS, low-resource TTS, and adaptive TTS, and discuss other critical questions and opportunities to pursue in the space.

Together, you'll explore:

■ An overview of text to speech, including its evolution
■ The important challenges in neural text to speech and how to address them with dedicated research
■ How to factor product development into your research

Resource list:

■ Text to Speech (project page): https://www.microsoft.com/en-us/research/project/text-to-speech/
■ Xu Tan (publications page): https://www.microsoft.com/en-us/research/people/xuta/publications/
■ Speech Research Repository Master List (GitHub): https://speechresearch.github.io/
■ FastSpeech: Fast, Robust and Controllable Text to Speech (GitHub): https://speechresearch.github.io/fastspeech/
■ FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (GitHub): https://speechresearch.github.io/fastspeech2/
■ AdaSpeech: Adaptive Text to Speech for Custom Voice (GitHub): https://speechresearch.github.io/adaspeech/
■ AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data (GitHub): https://speechresearch.github.io/adaspeech2/
■ LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search (GitHub): https://speechresearch.github.io/lightspeech/
■ LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition (GitHub): https://speechresearch.github.io/lrspeech/
■ Neural Text-to-Speech previews five new languages with innovative models in the low-resource setting (blog): https://techcommunity.microsoft.com/t5/azure-ai/neural-text-to-speech-previews-five-new-languages-with/ba-p/1907604
■ Microsoft Azure Text to Speech: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech
■ Microsoft Azure Custom Voice: https://speech.microsoft.com/customvoice
■ Xu Tan (Researcher profile): https://www.microsoft.com/en-us/research/people/xuta/

*This on-demand webinar features a previously recorded Q&A session and open captioning.

This webinar originally aired on May 20, 2021.

Explore more Microsoft Research webinars: https://aka.ms/msrwebinars

Tags: neural text to speech, text to speech, TTS, Xu Tan, Text-to-Speech, Microsoft Research
