Tokenize any input, even continuous vectors! - Residual Vector Quantization - VALL-E (Part 2)

Published on 2023-03-21 ● Video Link: https://www.youtube.com/watch?v=JZvF1UsCWC8



Duration: 2:10:58


VALL-E can generate speech for any text in a speaker's voice from just a 3-second audio sample. We will dissect the technology behind it and how it works, and also delve a bit more into a cool quantization technique called Residual Vector Quantization, which allows a continuous vector input space to be quantized.
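
To make the idea of Residual Vector Quantization concrete, here is a minimal sketch in plain Python. It is not the code from the video, the notebook, or the Encodec library. The function names (`rvq_encode`, `rvq_decode`, `make_toy_codebooks`) are made up for illustration, and the codebooks are random toy values, whereas real systems such as SoundStream and Encodec learn their codebooks during training. Each stage picks the nearest codeword to whatever the previous stages could not represent (the residual), so a continuous vector becomes one token per codebook.

```python
import random


def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    def sq_dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(x, codebooks):
    """Turn a continuous vector into one token per codebook.

    Each stage quantizes the residual left over by the previous stages.
    """
    residual = list(x)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks, dim):
    """Reconstruct a vector by summing the selected codeword of each stage."""
    out = [0.0] * dim
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def make_toy_codebooks(num_stages, size, dim, seed=0):
    """Random toy codebooks (an assumption; real codebooks are learned).

    Later stages use a smaller scale because they refine smaller residuals.
    Index 0 is the zero vector, so a stage can never make the error worse.
    """
    rng = random.Random(seed)
    codebooks = []
    for stage in range(num_stages):
        scale = 0.5 ** stage
        codebook = [[0.0] * dim]
        codebook += [[rng.uniform(-scale, scale) for _ in range(dim)]
                     for _ in range(size - 1)]
        codebooks.append(codebook)
    return codebooks


def l2_error(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


dim = 3
codebooks = make_toy_codebooks(num_stages=4, size=16, dim=dim)
x = [0.3, -0.7, 0.5]

tokens = rvq_encode(x, codebooks)
# Reconstruction error when decoding with only the first k stages (k = 0..4).
errors = [l2_error(x, rvq_decode(tokens[:k], codebooks[:k], dim))
          for k in range(len(tokens) + 1)]

print("tokens:", tokens)
print("error after each stage:", errors)
```

In this toy setup, 4 codebooks of 16 entries store only 64 vectors, yet the token tuples can address 16^4 = 65,536 distinct reconstructions. Decoding with fewer stages gives a coarser approximation, which is the hierarchical aspect of the representation.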

Part 1 here (Watch mainly for the explanation of the Mel Spectrogram): https://www.youtube.com/watch?v=G9k-2mYl6Vo

Slides and Jupyter Notebook can be found here: https://github.com/tanchongmin/TensorFlow-Implementations/tree/main/Paper_Reviews/Encodec

Related papers:
SoundStream (First paper to introduce Residual Vector Quantization in modern times): https://arxiv.org/abs/2107.03312
Encodec (High-fidelity audio compression which generates quantized codes): https://arxiv.org/pdf/2210.13438.pdf
VALL-E (Paper we are discussing): https://valle-demo.github.io/
VALL-E X (Cross-lingual VALL-E): https://arxiv.org/pdf/2303.03926.pdf
Universal Speech Model (Automatic Speech Recognition with 12 million hours of pre-training data, showing the scalability of pre-training data): https://sites.research.google/usm/
Tacotron 2 (Text-to-speech generation via Mel Spectrogram): https://pytorch.org/hub/nvidia_deeplearningexamples_tacotron2/

~~~~~
0:00 Introduction
4:14 Time and Frequency Domain representations
10:28 Recap on Part 1
12:30 Encodec (Corrected Model Explanation)
31:04 Coding session with Encodec!
53:52 VALL-E
55:40 Residual Vector Quantization and Hierarchical Representation
1:21:53 VALL-E Token Generation
1:33:55 Results
1:38:19 Limitations
1:41:14 How to perform hierarchical prediction?
1:45:04 Discussion

~~~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.

Discord: https://discord.gg/fXCZCPYs
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin




Other Videos By John Tan Chong Min


2023-04-25 Creating a ChatGPT Harry Potter Text-based RPG game!
2023-04-25 Learn from just Memory Storage and Retrieval: Generative Agents Interacting in Simulation!
2023-04-18 The future is neuro-symbolic: Expressiveness of ChatGPT and generalizability of symbols (SymbolicAI)
2023-04-17 Can GPT4 solve the Abstraction and Reasoning Corpus (ARC) Challenge Zero-Shot?
2023-04-12 GPT4: Zero-shot Classification without any examples + Fine-tune with reflection
2023-04-11 OpenAI Vector Embeddings - Talk to any book or document; Retrieval-Augmented Generation!
2023-04-11 Tutorial #2: OpenAI Vector Embeddings and Pinecone for Retrieval-Augmented Generation
2023-04-04 Creating JARVIS: ChatGPT + APIs - HuggingGPT, Memory-Augmented Context, Meta GPT structures
2023-04-02 Is GPT4 capable of self-improving? Are we heading for AGI or AI doom?
2023-03-28 How Visual ChatGPT works + Toolformer/Wolfram Alpha. LLMs with Tools/APIs/Plugins is the way ahead!
2023-03-21 Tokenize any input, even continuous vectors! - Residual Vector Quantization - VALL-E (Part 2)
2023-03-07 Using Transformers to mimic anyone's voice! - VALL-E (Part 1)
2023-02-28 Learning Part-Whole Structure by Chunking - More Efficient than Deep Learning!!!
2023-02-21 High-level planning with large language models - SayCan
2023-02-13 Learning, Fast and Slow: Towards Fast and Adaptable Agents in Changing Environments
2023-02-07 Using Logic Gates as Neurons - Deep Differentiable Logic Gate Networks!
2023-01-31 Learn from External Memory, not just Weights: Large-Scale Retrieval for Reinforcement Learning
2023-01-17 How ChatGPT works - From Transformers to Reinforcement Learning with Human Feedback (RLHF)
2023-01-09 HyperTree Proof Search - Automated Theorem Proving with AlphaZero and Transformers!
2022-12-23 CodinGame Fall Challenge 2022: A First Look (managed to get to Silver!)
2022-12-21 Can ChatGPT solve CodinGame/Google Kickstart problems?


