XLNet: Generalized Autoregressive Pretraining for Language Understanding

Channel: Yannic Kilcher

Subscribers: 291,000

Published: July 3, 2019 10:51:44 AM

Video Link: https://www.youtube.com/watch?v=H5vpBCLo74U

Duration: 30:06

20,818 views

654

Abstract:
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.

Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

https://arxiv.org/abs/1906.08237
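
To make point (1) of the abstract concrete, here is a minimal sketch of what a "factorization order" means for which tokens may be used to predict which. It is an illustration only, not the authors' implementation: the helper names `permutation_mask` and `sample_order` are hypothetical, the sentence is just a toy input, and the sketch leaves out the rest of the model (for example the two-stream attention and the Transformer-XL recurrence). The idea it shows is that the input stays in its natural order, and only the visibility mask changes with the sampled permutation.

```python
import random


def permutation_mask(order):
    """Build a visibility mask for one factorization order.

    `order[t]` is the position predicted at step t. The returned
    mask[i][j] is True when position i may condition on position j,
    i.e. when j comes strictly earlier than i in `order`.
    (Hypothetical helper for illustration.)
    """
    n = len(order)
    rank = [0] * n
    for step, pos in enumerate(order):
        rank[pos] = step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


def sample_order(n, seed=None):
    """Sample a random factorization order over n positions."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return order


tokens = ["New", "York", "is", "a", "city"]

# One fixed order for readability: is -> a -> city -> New -> York.
order = [2, 3, 4, 0, 1]
mask = permutation_mask(order)

for i, tok in enumerate(tokens):
    visible = [tokens[j] for j in range(len(tokens)) if mask[i][j]]
    print(f"{tok!r} is predicted from {visible}")

# During pretraining a fresh order would be drawn per example,
# e.g. sample_order(len(tokens)); averaging over many orders is what
# exposes every token to context on both sides.
```

Under this particular order, "York" is predicted with "New" in its context, which is the kind of dependency between target positions that the abstract says masked prediction neglects.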

Other Videos By Yannic Kilcher

2019-09-05	DEEP LEARNING MEME REVIEW - Episode 1
2019-09-04	Dynamic Routing Between Capsules
2019-09-03	RoBERTa: A Robustly Optimized BERT Pretraining Approach
2019-08-28	Auditing Radicalization Pathways on YouTube
2019-08-13	Gauge Equivariant Convolutional Networks and the Icosahedral CNN
2019-08-12	Processing Megapixel Images with Deep Attention-Sampling Models
2019-08-09	Manifold Mixup: Better Representations by Interpolating Hidden States
2019-08-08	Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
2019-08-05	Reconciling modern machine learning and the bias-variance trade-off
2019-07-05	Conversation about Population-Based Methods (Re-upload)
2019-07-03	XLNet: Generalized Autoregressive Pretraining for Language Understanding
2019-06-13	Talking to companies at ICML19
2019-06-12	Population-Based Search and Open-Ended Algorithms
2019-06-10	I'm at ICML19 :)
2019-05-14	Adversarial Examples Are Not Bugs, They Are Features
2019-05-10	Reinforcement Learning, Fast and Slow
2019-05-09	S.H.E. - Search. Human. Equalizer.
2019-05-06	Blockwise Parallel Decoding for Deep Autoregressive Models
2019-04-27	Discriminating Systems - Gender, Race, and Power in AI
2019-02-19	The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
2019-02-18	Neural Ordinary Differential Equations

Tags: deep learning, machine learning, artificial intelligence, nlp, natural language processing, bert, xlnet, transformer, transformer xl, attention, attention layer, language model, language modeling, pretraining, autoregressive, autoencoder, permutation, google, carnegie mellon, cmu, state of the art, masked language model
