Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors)

Subscribers: 284,000
Published on: 2022-04-21 ● Video Link: https://www.youtube.com/watch?v=ccBMRryxGog
Duration: 58:23
Views: 11,065
Likes: 302


#nlp #sparsity #transformers

This video is an interview with Barret Zoph and William Fedus of Google Brain about Sparse Expert Models.
Sparse expert models have been hugely successful at distributing parts of a model, mostly Transformers, across a large array of machines, using a routing function to send each input only to the experts it needs. This means that even though these models have a huge number of parameters, the computational cost of processing a given input does not increase, because the model is only sparsely activated. Sparse expert models such as Switch Transformers and GLaM can scale up to trillions of parameters and bring a number of desirable properties. We discuss everything from the fundamentals, history, strengths, and weaknesses up to the current state of the art of these models.
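
To make the routing idea concrete, here is a minimal NumPy sketch of top-1 ("Switch"-style) routing: a learned router picks one expert per token, so per-token compute stays constant no matter how many experts exist. All names and dimensions here are toy assumptions for illustration, not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, d_model, num_experts, d_ff = 8, 16, 4, 32

# Router: a learned linear layer producing one logit per expert.
w_router = rng.normal(size=(d_model, num_experts))

# Each expert is its own feed-forward block (two weight matrices here).
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(num_experts)
]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def switch_layer(tokens):
    """Route each token to its single highest-probability expert."""
    probs = softmax(tokens @ w_router)   # (tokens, experts)
    choice = probs.argmax(axis=-1)       # top-1 expert index per token
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        w1, w2 = experts[choice[i]]
        # Only one expert's parameters are touched per token, so FLOPs
        # per token stay constant as num_experts (and total params) grow.
        h = np.maximum(tok @ w1, 0.0)    # ReLU feed-forward
        # Scale by the router probability so the router receives gradient.
        out[i] = probs[i, choice[i]] * (h @ w2)
    return out

tokens = rng.normal(size=(num_tokens, d_model))
print(switch_layer(tokens).shape)        # (8, 16)
```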

OUTLINE:
0:00 - Intro
0:30 - What are sparse expert models?
4:25 - Start of Interview
5:55 - What do you mean by sparse experts?
8:10 - How does routing work in these models?
12:10 - What is the history of sparse experts?
14:45 - What does an individual expert learn?
19:25 - When are these models appropriate?
22:30 - How comparable are sparse to dense models?
26:30 - How does the pathways system connect to this?
28:45 - What improvements did GLaM make?
31:30 - The "designing sparse experts" paper
37:45 - Can experts be frozen during training?
41:20 - Can the routing function be improved?
47:15 - Can experts be distributed beyond data centers?
50:20 - Are there sparse experts for other domains than NLP?
52:15 - Are sparse and dense models in competition?
53:35 - Where do we go from here?
56:30 - How can people get started with this?

Papers:
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (https://arxiv.org/abs/2101.03961)
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (https://arxiv.org/abs/2112.06905)
Designing Effective Sparse Expert Models (https://arxiv.org/abs/2202.08906)

Links:
Merch: http://store.ykilcher.com
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Other Videos By Yannic Kilcher

2022-06-03 · GPT-4chan: This is the worst AI ever
2022-06-01 · Did I crash the NFT market?
2022-05-13 · [ML News] DeepMind's Flamingo Image-Text model | Locked-Image Tuning | Jurassic X & MRKL
2022-05-10 · [ML News] Meta's OPT 175B language model | DALL-E Mega is training | TorToiSe TTS fakes my voice
2022-05-05 · This A.I. creates infinite NFTs
2022-05-02 · Author Interview: SayCan - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
2022-04-30 · Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan - Paper Explained)
2022-04-26 · Author Interview - ACCEL: Evolving Curricula with Regret-Based Environment Design
2022-04-25 · ACCEL: Evolving Curricula with Regret-Based Environment Design (Paper Review)
2022-04-22 · LAION-5B: 5 billion image-text-pairs dataset (with the authors)
2022-04-21 · Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors)
2022-04-17 · Author Interview - Transformer Memory as a Differentiable Search Index
2022-04-16 · Transformer Memory as a Differentiable Search Index (Machine Learning Research Paper Explained)
2022-04-10 · [ML News] Google's 540B PaLM Language Model & OpenAI's DALL-E 2 Text-to-Image Revolution
2022-04-06 · DALL-E 2 by OpenAI is out! Live Reaction
2022-04-04 · The Weird and Wonderful World of AI Art (w/ Author Jack Morris)
2022-04-02 · Author Interview - Improving Intrinsic Exploration with Language Abstractions
2022-04-01 · Improving Intrinsic Exploration with Language Abstractions (Machine Learning Paper Explained)
2022-03-30 · [ML News] GPT-3 learns to edit | Google Pathways | Make-A-Scene | CLIP meets GamePhysics | DouBlind
2022-03-29 · Author Interview - Memory-assisted prompt editing to improve GPT-3 after deployment
2022-03-28 · Memory-assisted prompt editing to improve GPT-3 after deployment (Machine Learning Paper Explained)