Using Transformers to mimic anyone's voice! - VALL-E (Part 1)

Channel: John Tan Chong Min

Subscribers: 6,300

Published on March 7, 2023 4:11:56 PM ● Video Link: https://www.youtube.com/watch?v=G9k-2mYl6Vo

Duration: 1:56:31

960 views

Edit: I realize I made some mistakes in the Encodec structure (the Quantization is actually part of the Encoder, hence VALL-E doesn't need to learn the Quantizer and the codebooks). The corrected explanation, as well as the rest of the presentation, can be found in Part 2 here: https://www.youtube.com/watch?v=JZvF1UsCWC8

VALL-E can generate audio of any (English) text from just 3 seconds of audio sample. We will dissect the technology behind it, how it works, and also discuss whether the Transformer architecture is suitable for audio generation.

Special discussion with Tim Scarfe too! Thanks for coming! Support his podcast, Machine Learning Street Talk, for more discussion on ML and AI advances: https://www.youtube.com/c/MachineLearningStreetTalk

Paper: https://valle-demo.github.io/

Related Papers (Using Neural Encoders and Decoders for Audio Encoding/Decoding - Neural Audio Codecs):
Encodec: https://arxiv.org/abs/2210.13438
SoundStream (first architecture to use Residual Vector Quantization (RVQ)): https://arxiv.org/abs/2107.03312

VQ-VAE (More elaboration on Vector Quantization): https://arxiv.org/pdf/1711.00937.pdf
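
For readers who want a concrete picture of Residual Vector Quantization before watching, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the Encodec or SoundStream implementation: the codebooks are random toy values (real codecs learn them), the vectors are 4-dimensional, and the function names `nearest`, `rvq_encode`, and `rvq_decode` are hypothetical. Each stage picks the nearest entry in its codebook and passes the leftover residual to the next stage, so one continuous vector becomes one token per codebook.

```python
import random

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Turn vec into one token per codebook; each stage quantizes the previous residual."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen entry; the remainder is what the next stage must explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens

def rvq_decode(tokens, codebooks):
    """Reconstruct an approximation by summing the selected entry from every codebook."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Toy setup (assumption): 3 random codebooks with 8 entries each, 4-dimensional vectors.
rng = random.Random(0)
codebooks = [
    [[rng.gauss(0, 1.0 / (2 ** stage)) for _ in range(4)] for _ in range(8)]
    for stage in range(3)
]
frame = [0.3, -1.2, 0.7, 0.1]          # stands in for one encoder output frame
tokens = rvq_encode(frame, codebooks)   # one integer token per codebook
approx = rvq_decode(tokens, codebooks)  # sum of the chosen codebook entries
```

In this sketch the decoder needs only the token indices and the codebooks, which is why a sequence model can work with the discrete tokens rather than the raw waveform.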

Processing in time domain:
WaveNet: https://www.deepmind.com/blog/wavenet-a-generative-model-for-raw-audio
Wav2Vec: https://arxiv.org/pdf/1904.05862.pdf

~~~~

0:00 Introduction
3:54 Why it works
7:27 How to represent sound
20:30 Comparison between normal systems and VALL-E
22:07 Large Data
26:12 Data Representation
31:03 Fixed bias helps to speed up learning!
34:52 Discussion on Encodec
1:10:58 Is tokenisation in VALL-E good?
1:16:57 Can Transformers be used for any domain?
1:19:53 Various losses in Encodec
1:24:22 Is the Encodec doing part-whole hierarchy?
1:28:48 How to adapt VALL-E to take in text prompts to condition speaker information?
1:31:25 Do language models understand?
1:37:11 Mel Spectrogram
1:49:06 Why is Mel Spectrogram still used in modern architectures?
1:52:14 Bias in Structure vs Loss Function

~~~~

AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also, I am an avid game creator.

Discord: https://discord.gg/fXCZCPYs
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/.
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin

Other Videos By John Tan Chong Min

2023-04-25	Learn from just Memory Storage and Retrieval: Generative Agents Interacting in Simulation!
2023-04-18	The future is neuro-symbolic: Expressiveness of ChatGPT and generalizability of symbols (SymbolicAI)
2023-04-17	Can GPT4 solve the Abstraction and Reasoning Corpus (ARC) Challenge Zero-Shot?
2023-04-12	GPT4: Zero-shot Classification without any examples + Fine-tune with reflection
2023-04-11	OpenAI Vector Embeddings - Talk to any book or document; Retrieval-Augmented Generation!
2023-04-11	Tutorial #2: OpenAI Vector Embeddings and Pinecone for Retrieval-Augmented Generation
2023-04-04	Creating JARVIS: ChatGPT + APIs - HuggingGPT, Memory-Augmented Context, Meta GPT structures
2023-04-02	Is GPT4 capable of self-improving? Are we heading for AGI or AI doom?
2023-03-28	How Visual ChatGPT works + Toolformer/Wolfram Alpha. LLMs with Tools/APIs/Plugins is the way ahead!
2023-03-21	Tokenize any input, even continuous vectors! - Residual Vector Quantization - VALL-E (Part 2)
2023-03-07	Using Transformers to mimic anyone's voice! - VALL-E (Part 1)
2023-02-28	Learning Part-Whole Structure by Chunking - More Efficient than Deep Learning!!!
2023-02-21	High-level planning with large language models - SayCan
2023-02-13	Learning, Fast and Slow: Towards Fast and Adaptable Agents in Changing Environments
2023-02-07	Using Logic Gates as Neurons - Deep Differentiable Logic Gate Networks!
2023-01-31	Learn from External Memory, not just Weights: Large-Scale Retrieval for Reinforcement Learning
2023-01-17	How ChatGPT works - From Transformers to Reinforcement Learning with Human Feedback (RLHF)
2023-01-09	HyperTree Proof Search - Automated Theorem Proving with AlphaZero and Transformers!
2022-12-23	CodinGame Fall Challenge 2022: A First Look (managed to get to Silver!)
2022-12-21	Can ChatGPT solve CodinGame/Google Kickstart problems?
2022-12-19	Reinforcement Learning Fast and Slow: Goal-Directed and Memory Retrieval Mechanism!
