Symbolic Knowledge Distillation: from General Language Models to Commonsense Models (Explained)

Subscribers: 284,000
Video Link: https://www.youtube.com/watch?v=kP-dXK9JEhY
Duration: 45:21
Views: 24,283

#gpt3 #knowledge #symbolic

Symbolic knowledge models are usually trained on human-generated corpora, which are cumbersome and expensive to create. Such corpora consist of structured triples of symbolic knowledge. This paper takes a different approach and generates such a corpus by prompting GPT-3. The results show that clever prompting, combined with a small, targeted critic model trained on human ratings, can outperform both the human-generated data and the teacher model (GPT-3) itself. The paper thus gives a general recipe for automatically building corpora for various NLP tasks by sampling from large language models.
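
To make that recipe concrete, here is a minimal Python sketch of the generate-then-filter loop described above. The few-shot prompt, the xEffect relation format, and the `gpt3_complete` / `critic_score` helpers are illustrative assumptions, not the authors' actual code.

```python
import random

# Sketch of the symbolic-knowledge-distillation loop: prompt a large teacher LM
# for ATOMIC-style inferences about everyday events, keep only the triples that
# a critic (trained on human ratings) accepts, and use the surviving triples to
# train a smaller student model. `gpt3_complete` and `critic_score` below are
# hypothetical stand-ins, not real API calls.

FEW_SHOT_PROMPT = (
    "Event: X pays Y a compliment. Effect on X: X wants to chat with Y.\n"
    "Event: X drinks too much coffee. Effect on X: X feels jittery.\n"
)

def gpt3_complete(prompt: str, temperature: float = 0.9) -> str:
    """Placeholder for a call to the teacher LM (e.g. a GPT-3 completion endpoint)."""
    return "X feels appreciated"  # dummy completion for illustration

def critic_score(triple: tuple) -> float:
    """Placeholder for the critic: a small classifier fine-tuned on human accept/reject labels."""
    return random.random()  # dummy score for illustration

def generate_inferences(event: str, n: int = 10) -> list:
    """Sample candidate effects of an event from the teacher."""
    prompt = FEW_SHOT_PROMPT + f"Event: {event}. Effect on X:"
    return [gpt3_complete(prompt) for _ in range(n)]

def distill(events, threshold: float = 0.8) -> list:
    """Generate-then-filter: only critic-approved triples enter the corpus."""
    corpus = []
    for event in events:
        for inference in generate_inferences(event):
            triple = (event, "xEffect", inference)
            if critic_score(triple) >= threshold:
                corpus.append(triple)
    return corpus  # this corpus is what the student commonsense model is trained on

print(distill(["X moves to a new city"]))
```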

OUTLINE:
0:00 - Intro & Overview
2:30 - Sponsor: Weights & Biases
4:15 - Commonsense Knowledge Graphs
7:50 - ATOMIC dataset
10:00 - Generating the corpus from a model
13:00 - Prompting GPT-3
15:30 - Generating Events
18:40 - Generating Inferences
23:00 - Evaluating the created dataset
26:45 - Introducing the critic
31:25 - Using the critic to filter the data
36:30 - Training a student on the generated data
41:00 - Key Findings
44:45 - Comments & Conclusion

Paper: https://arxiv.org/abs/2110.07178
Code & Corpus: https://github.com/peterwestai2/symbolic-knowledge-distillation

Sponsor: Weights & Biases
https://wandb.com
https://community.wandb.ai/

Abstract:
The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically, as text, in addition to the neural model. We also distill only one aspect, the commonsense aspect, of a general language model teacher, allowing the student to be a different type of model, a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general language model. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite its 100x smaller size. We apply this to the ATOMIC resource, and share our new symbolic knowledge graph and commonsense models.
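
As a rough illustration of the "separately trained critic model" mentioned in the abstract, the snippet below scores candidate triples with a RoBERTa sequence classifier, mirroring the idea of a small classifier fine-tuned on human accept/reject ratings. The checkpoint name, input format, and acceptance threshold are assumptions for illustration; the classification head here is untrained and would first need fine-tuning on the human ratings.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical critic: a small sequence classifier over (event, relation, inference)
# strings. In the paper the critic is trained on human accept/reject ratings; here we
# only show the scoring interface with an off-the-shelf RoBERTa backbone.
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)
model.eval()

def critic_accept_prob(event: str, relation: str, inference: str) -> float:
    """Probability that the critic accepts a generated triple (label 1 = accept)."""
    text = f"{event} {relation} {inference}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Only high-confidence triples survive into the distilled knowledge graph.
triple = ("X pays Y a compliment", "xEffect", "X feels good about themselves")
if critic_accept_prob(*triple) > 0.9:
    print("keep:", triple)
else:
    print("discard:", triple)
```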

Authors: Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck, Yejin Choi


Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2021-11-18 [ML News] Cedille French Language Model | YOU Search Engine | AI Finds Profitable MEME TOKENS
2021-11-15 Gradients are Not All You Need (Machine Learning Research Paper Explained)
2021-11-12 [ML News] Microsoft combines Images & Text | Meta makes artificial skin | Russians replicate DALL-E
2021-11-10 Autoregressive Diffusion Models (Machine Learning Research Paper Explained)
2021-11-05 [ML News] Google introduces Pathways | OpenAI solves Math Problems | Meta goes First Person
2021-11-03 EfficientZero: Mastering Atari Games with Limited Data (Machine Learning Research Paper Explained)
2021-10-31 [YTalks] Siraj Raval - Stories about YouTube, Plagiarism, and the Dangers of Fame (Interview)
2021-10-29 [ML News] NVIDIA GTC'21 | DeepMind buys MuJoCo | Google predicts spreadsheet formulas
2021-10-29 [ML News GERMAN] NVIDIA GTC'21 | DeepMind kauft MuJoCo | Google Lernt Spreadsheet Formeln
2021-10-27 I went to an AI Art Festival in Geneva (AiiA Festival Trip Report)
2021-10-24 Symbolic Knowledge Distillation: from General Language Models to Commonsense Models (Explained)
2021-10-21 I took a Swiss train and it was awesome! Train Seat Review - SBB InterCity 1 - Geneva to St. Gallen
2021-10-20 [ML News] Microsoft trains 530B model | ConvMixer model fits into single tweet | DeepMind profitable
2021-10-07 [ML News] DeepMind does Nowcasting | The Guardian's shady reporting | AI finishes Beethoven's 10th
2021-10-06 Grokking: Generalization beyond Overfitting on small algorithmic datasets (Paper Explained)
2021-10-02 How far can we scale up? Deep Learning's Diminishing Returns (Article Review)
2021-09-29 [ML News] Plagiarism Case w/ Plot Twist | CLIP for video surveillance | OpenAI summarizes books
2021-09-27 Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS Experiment (Paper Explained)
2021-09-26 100K Subs AMA (Ask Me Anything)
2021-09-24 [ML News] New ImageNet SOTA | Uber's H3 hexagonal coordinate system | New text-image-pair dataset
2021-09-21 Does GPT-3 lie? - Misinformation and fear-mongering around the TruthfulQA dataset



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper
gpt-3
knowledge distillation
teacher
student
nlp
natural language processing
gpt3
prompt engineering
symbolic knowledge
symbolic reasoning
symbolic nlp
knowledge graphs
triples
what does gpt-3 know
does gpt-3 understand