Tokenize any input, even continuous vectors! - Residual Vector Quantization - VALL-E (Part 2)
VALL-E can generate speech for any text from just a 3-second audio sample of a speaker. We will dissect the technology behind it and how it works, and also delve a bit deeper into a cool quantization technique called Residual Vector Quantization, which allows a continuous vector input space to be quantized.
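To give a rough feel for the idea before watching, here is a minimal sketch of Residual Vector Quantization in NumPy. Each stage picks the nearest codebook entry to whatever is left over from the previous stage, so a continuous vector becomes a short list of integer tokens. The function names (rvq_encode, rvq_decode) and the tiny hand-picked codebooks are assumptions made for illustration only; in a real codec such as Encodec the codebooks are learned, and this is not the code from the notebook.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Return one codebook index per stage for the vector x."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:  # each cb has shape (num_entries, dim)
        # Squared Euclidean distance from the residual to every entry.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        # The next stage only has to explain what this stage missed.
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct the vector by summing the chosen entry of every stage."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

# Toy, hand-picked codebooks (hypothetical): a coarse stage, then a finer one.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]]),
]

x = np.array([1.4, 1.0])
codes = rvq_encode(x, codebooks)      # one integer token per stage
x_hat = rvq_decode(codes, codebooks)  # approximate reconstruction of x

print("codes:", codes)
print("reconstruction:", x_hat)
print("error:", np.linalg.norm(x - x_hat))
```

The first code carries the coarse shape of the vector and each later code refines it, which is the hierarchical representation discussed in the video.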
Part 1 here (Watch mainly for the explanation of the Mel Spectrogram): https://www.youtube.com/watch?v=G9k-2mYl6Vo
Slides and Jupyter Notebook can be found here: https://github.com/tanchongmin/TensorFlow-Implementations/tree/main/Paper_Reviews/Encodec
Related papers:
SoundStream (First paper that introduced Residual Vector Quantization in modern times): https://arxiv.org/abs/2107.03312
Encodec (high fidelity audio compression which generates quantized codes): https://arxiv.org/pdf/2210.13438.pdf
VALL-E (Paper we are discussing): https://valle-demo.github.io/
VALL-E X (Cross-lingual VALL-E): https://arxiv.org/pdf/2303.03926.pdf
Universal Speech Model (Automatic Speech Recognition with 12 million hours of pre-training data, showing the scalability of pre-training data): https://sites.research.google/usm/
Tacotron 2 (Generating text to speech via Mel Spectrogram): https://pytorch.org/hub/nvidia_deeplearningexamples_tacotron2/
~~~~~
0:00 Introduction
4:14 Time and Frequency Domain representations
10:28 Recap on Part 1
12:30 Encodec (Corrected Model Explanation)
31:04 Coding session with Encodec!
53:52 VALL-E
55:40 Residual Vector Quantization and Hierarchical Representation
1:21:53 VALL-E Token Generation
1:33:55 Results
1:38:19 Limitations
1:41:14 How to perform hierarchical prediction?
1:45:04 Discussion
~~~~~
AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.
Discord: https://discord.gg/fXCZCPYs
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin