GPT-3: Language Models are Few-Shot Learners (Paper Explained)

Channel:

Yannic Kilcher

Subscribers:

301,000

Published on May 29, 2020 3:13:36 PM ● Video Link: https://www.youtube.com/watch?v=SY5PvZrJhLE

Duration: 1:04:30

206,575 views

5,662

#gpt3 #openai #gpt-3

How far can you go with ONLY language modeling? Can a large enough language model perform NLP task out of the box? OpenAI take on these and other questions by training a transformer that is an order of magnitude larger than anything that has ever been built before and the results are astounding.

OUTLINE:
0:00 - Intro & Overview
1:20 - Language Models
2:45 - Language Modeling Datasets
3:20 - Model Size
5:35 - Transformer Models
7:25 - Fine Tuning
10:15 - In-Context Learning
17:15 - Start of Experimental Results
19:10 - Question Answering
23:10 - What I think is happening
28:50 - Translation
31:30 - Winograd Schemes
33:00 - Commonsense Reasoning
37:00 - Reading Comprehension
37:30 - SuperGLUE
40:40 - NLI
41:40 - Arithmetic Expressions
48:30 - Word Unscrambling
50:30 - SAT Analogies
52:10 - News Article Generation
58:10 - Made-up Words
1:01:10 - Training Set Contamination
1:03:10 - Task Examples

https://arxiv.org/abs/2005.14165
https://github.com/openai/gpt-3

Abstract:
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Other Videos By Yannic Kilcher

2020-06-08	JOIN ME for the NeurIPS 2020 Flatland Multi-Agent RL Challenge!
2020-06-07	BLEURT: Learning Robust Metrics for Text Generation (Paper Explained)
2020-06-06	Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search (Paper Explained)
2020-06-05	CornerNet: Detecting Objects as Paired Keypoints (Paper Explained)
2020-06-04	Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper Explained)
2020-06-03	Learning To Classify Images Without Labels (Paper Explained)
2020-06-02	On the Measure of Intelligence by François Chollet - Part 1: Foundations (Paper Explained)
2020-06-01	Dynamics-Aware Unsupervised Discovery of Skills (Paper Explained)
2020-05-31	Synthesizer: Rethinking Self-Attention in Transformer Models (Paper Explained)
2020-05-30	[Code] How to use Facebook's DETR object detection algorithm in Python (Full Tutorial)
2020-05-29	GPT-3: Language Models are Few-Shot Learners (Paper Explained)
2020-05-28	DETR: End-to-End Object Detection with Transformers (Paper Explained)
2020-05-27	mixup: Beyond Empirical Risk Minimization (Paper Explained)
2020-05-26	A critical analysis of self-supervision, or what we can learn from a single image (Paper Explained)
2020-05-25	Deep image reconstruction from human brain activity (Paper Explained)
2020-05-24	Regularizing Trajectory Optimization with Denoising Autoencoders (Paper Explained)
2020-05-23	[News] The NeurIPS Broader Impact Statement
2020-05-22	When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)
2020-05-21	[News] OpenAI Model Generates Python Code
2020-05-20	Investigating Human Priors for Playing Video Games (Paper & Demo)
2020-05-19	iMAML: Meta-Learning with Implicit Gradients (Paper Explained)

Tags:

deep learning

machine learning

arxiv

explained

neural networks

artificial intelligence

paper

transformers

attention

nlp

natural language processing

gpt3

gpt-3

gpt2

gpt-2

openai

language model

mlm

autoregressive

heads

bert

turing

microsoft

question answering

news

glue

superglue

sota

preplexity

corpus

common crawl

wikipedia

natural questions

boolq

math

strings

context

deep language

zero shot

few shot

training data

Channel	Latest
LNDinizGames	6 hours ago
Advancer64	6 hours ago
WoodwardSports	6 hours ago
kalhamm3r	6 hours ago
Call Of Duty League Clips	6 hours ago
Alexmania	6 hours ago
Nutshell Animations	7 hours ago
Angel Sterling Lomelí	7 hours ago
Clueless Players	7 hours ago
ChaosMoogle	7 hours ago
Jotinha	7 hours ago
Zakkhaios	7 hours ago
VALDY MOBILE	7 hours ago
Ian Rastall	7 hours ago
MDTechVideos	7 hours ago
DavidTheBaum	7 hours ago
PC AND XBOX GAMES & SIMRACING FGWR	7 hours ago
Phước Phè Phỡn	7 hours ago
Reagindo	7 hours ago
Segment	7 hours ago
Rogue Mechanist	7 hours ago
Jaime Gabaldoni	7 hours ago
mamont	7 hours ago
Multiverse Travelers	7 hours ago
Stract	7 hours ago