Author Interview - Typical Decoding for Natural Language Generation

Video Link: https://www.youtube.com/watch?v=AvHLJqtmQkE
Duration: 48:56


#deeplearning #nlp #sampling

This is an interview with first author Clara Meister.
Paper review video here: https://youtu.be/_EDr3ryrT_Y

Modern language models like T5 or GPT-3 achieve remarkably low perplexities on both training and validation data, yet when sampling from their output distributions, the generated text often seems dull and uninteresting. Various workarounds have been proposed, such as top-k sampling and nucleus sampling, but while these somewhat improve the generated samples, they remain ad-hoc heuristics without a principled justification. This paper introduces typical sampling, a new decoding method that is principled, effective, and can be implemented efficiently. Typical sampling turns away from sampling purely based on likelihood and instead explicitly trades off generating high-probability samples against generating high-information samples. The paper connects typical sampling to psycholinguistic theories of human speech production, and shows experimentally that typical sampling achieves much more diverse and interesting results than current methods.
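
For a quick feel of the method, here is a minimal sketch assuming a recent Hugging Face transformers version that exposes the typical_p generation argument (the method was upstreamed from the fork linked below); the model choice and parameter values are illustrative, not the paper's exact setup:

```python
# Decode the same prompt with nucleus (top-p) vs. typical sampling.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The meaning of life is", return_tensors="pt")

# Nucleus sampling: keep the smallest set of highest-probability tokens
# whose cumulative mass reaches top_p.
nucleus = model.generate(**inputs, do_sample=True, top_p=0.95, max_new_tokens=40)

# Typical sampling: keep the tokens whose information content is closest
# to the model's conditional entropy, up to cumulative mass typical_p.
typical = model.generate(**inputs, do_sample=True, typical_p=0.95, max_new_tokens=40)

print(tokenizer.decode(nucleus[0], skip_special_tokens=True))
print(tokenizer.decode(typical[0], skip_special_tokens=True))
```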

Sponsor: Introduction to Graph Neural Networks Course
https://www.graphneuralnets.com/p/introduction-to-gnns?coupon_code=SUNGLASSES&affcode=999036_lzknae-d

OUTLINE:
0:00 - Intro
0:35 - Sponsor: Introduction to GNNs Course (link in description)
1:30 - Why does sampling matter?
5:40 - What is a "typical" message?
8:35 - How do humans communicate?
10:25 - Why don't we just sample from the model's distribution?
15:30 - What happens if we condition on the information to transmit?
17:35 - Does typical sampling really represent human outputs?
20:55 - What do the plots mean?
31:00 - Diving into the experimental results
39:15 - Are our training objectives wrong?
41:30 - Comparing typical sampling to top-k and nucleus sampling
44:50 - Explaining arbitrary engineering choices
47:20 - How can people get started with this?

Paper: https://arxiv.org/abs/2202.00666
Code: https://github.com/cimeister/typical-sampling/blob/3e676cfd88fa2e6a24f2bdc6f9f07fddb87827c2/src/transformers/generation_logits_process.py#L242-L272

Abstract:
Despite achieving incredibly low perplexities on myriad natural language corpora, today's language models still often underperform when used to generate text. This dichotomy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language as a communication channel (à la Shannon, 1948) can provide new insights into the behaviors of probabilistic language generators, e.g., why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, and do so in a simultaneously efficient and error-minimizing manner; they choose each word in a string with this (perhaps subconscious) goal in mind. We propose that generation from probabilistic models should mimic this behavior. Rather than always choosing words from the high-probability region of the distribution--which have a low Shannon information content--we sample from the set of words with information content close to the conditional entropy of our model, i.e., close to the expected information content. This decision criterion can be realized through a simple and efficient implementation, which we call typical sampling. Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, typical sampling offers competitive performance in terms of quality while consistently reducing the number of degenerate repetitions.
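
To make the decision criterion in the abstract concrete, here is a self-contained sketch in PyTorch (names and values are illustrative; the authors' actual implementation is in the repo linked above): it keeps the tokens whose information content -log p is closest to the conditional entropy of the next-token distribution, until their cumulative probability reaches a target mass, and masks out the rest.

```python
import torch

def typical_filter(logits: torch.Tensor, mass: float = 0.95) -> torch.Tensor:
    """Restrict a 1-D vector of next-token logits to the locally typical set."""
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Conditional entropy of the next-token distribution: H = -sum(p * log p).
    entropy = -(probs * log_probs).sum()
    # How far each token's information content (-log p) is from the entropy.
    distance = ((-log_probs) - entropy).abs()
    # Keep the closest tokens until their cumulative probability reaches `mass`.
    _, sorted_idx = torch.sort(distance)
    cum_mass = probs[sorted_idx].cumsum(dim=-1)
    cutoff = int((cum_mass < mass).sum()) + 1
    keep = sorted_idx[:cutoff]
    filtered = torch.full_like(logits, float("-inf"))
    filtered[keep] = logits[keep]
    return filtered

# Toy usage: filter a small distribution, then sample from what remains.
logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])
next_token = torch.multinomial(torch.softmax(typical_filter(logits), dim=-1), 1)
```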

Authors: Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Links:
Merch: http://store.ykilcher.com
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n



