RoBERTa: A Robustly Optimized BERT Pretraining Approach

Video Link: https://www.youtube.com/watch?v=-MCYbmU9kfg
Duration: 19:15


This paper shows that the original BERT model, when trained properly, can outperform all of the improvements proposed since its release, raising questions about their necessity and the reasoning behind them.

Abstract:
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

https://arxiv.org/abs/1907.11692
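
The abstract notes that the models and code are released. As a minimal, hedged sketch of trying out a pretrained RoBERTa checkpoint, assuming the widely available Hugging Face `transformers` port (the paper's official release is through fairseq) and the illustrative model name "roberta-base", loading and running the encoder might look like this:

```python
# Minimal sketch, not the authors' release script: load a pretrained
# RoBERTa checkpoint via the Hugging Face `transformers` port (assumed
# here; the official release is through fairseq). "roberta-base" is an
# illustrative identifier, not something specified in the video.
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

# Tokenize a sentence and run it through the pretrained encoder.
inputs = tokenizer("RoBERTa is a robustly optimized BERT.", return_tensors="pt")
outputs = model(**inputs)

# last_hidden_state holds one contextual embedding per input token.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, <num_tokens>, 768])
```

From there, the pretrained encoder would typically be fine-tuned on a downstream task such as GLUE, SQuAD, or RACE, as described in the abstract.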


YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Minds: https://www.minds.com/ykilcher
BitChute: https://www.bitchute.com/channel/10a5ui845DOJ/


Tags:
deep learning
machine learning
nlp
natural language processing
machine translation
arxiv
google
attention mechanism
attention
transformer
tensor2tensor
rnn
recurrent
seq2seq
bert
unsupervised
squad
wordpiece
embeddings
language
language modeling
attention layers
bidirectional
elmo
word vectors
pretrained
fine tuning