Blockwise Parallel Decoding for Deep Autoregressive Models

Subscribers: 285,000
Video Link: https://www.youtube.com/watch?v=3Tqp_B2G6u0
Duration: 23:52
Views: 693


https://arxiv.org/abs/1811.03115

Abstract:
Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make different trade-offs between the amount of computation needed per layer and the length of the critical path at training time, generation still remains an inherently sequential process. To overcome this limitation, we propose a novel blockwise parallel decoding scheme in which we make predictions for multiple time steps in parallel then back off to the longest prefix validated by a scoring model. This allows for substantial theoretical improvements in generation speed when applied to architectures that can process output sequences in parallel. We verify our approach empirically through a series of experiments using state-of-the-art self-attention models for machine translation and image super-resolution, achieving iteration reductions of up to 2x over a baseline greedy decoder with no loss in quality, or up to 7x in exchange for a slight decrease in performance. In terms of wall-clock time, our fastest models exhibit real-time speedups of up to 4x over standard greedy decoding.
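
For readers skimming the description, below is a minimal Python sketch of the greedy predict-verify-accept loop the abstract describes. The helpers propose_block and base_greedy_next are hypothetical stand-ins for the paper's proposal heads and base scoring model (which in the paper run as single parallel Transformer passes); this illustrates the control flow only, not the authors' implementation.

def blockwise_parallel_decode(prefix, propose_block, base_greedy_next,
                              k=4, max_len=50, eos_id=0):
    """Extend `prefix` until EOS or max_len, accepting at each iteration the
    longest proposed block prefix that the scoring model would also have
    generated greedily."""
    y = list(prefix)
    while len(y) < max_len and (not y or y[-1] != eos_id):
        # Predict: guess the next k tokens (one parallel pass in the paper).
        proposal = propose_block(y, k)
        # Verify: the scoring model's greedy token at each of the k positions,
        # conditioned on the accepted output plus earlier proposal tokens
        # (sequential here for clarity; parallelizable in practice).
        verified = [base_greedy_next(y + proposal[:j]) for j in range(k)]
        # Accept: the longest proposal prefix the scoring model agrees with;
        # on a mismatch, fall back to the scoring model's own token there,
        # so every iteration adds at least one token.
        m = 0
        while m < k and proposal[m] == verified[m]:
            m += 1
        y.extend(proposal[:m] if m == k else proposal[:m] + [verified[m]])
        if eos_id in y:                      # trim anything past end-of-sequence
            y = y[:y.index(eos_id) + 1]
    return y

# Toy usage: both "models" read off a fixed target sequence, so proposals are
# always accepted and decoding finishes in ceil(len(target) / k) iterations.
if __name__ == "__main__":
    target = [5, 7, 7, 9, 3, 0]              # 0 acts as EOS
    def base_greedy_next(seq):
        return target[len(seq)] if len(seq) < len(target) else 0
    def propose_block(seq, k):
        return [target[min(len(seq) + j, len(target) - 1)] for j in range(k)]
    print(blockwise_parallel_decode([], propose_block, base_greedy_next, k=4))

In this sketch each iteration advances the output by between one token (immediate mismatch) and a full block of k tokens (all proposals accepted), which is where the iteration reductions quoted in the abstract come from.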

Authors: Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

Tags:
machine learning
deep learning
transformers
nlp
natural language processing
ai
artificial intelligence
google brain
autoregressive
greedy decoding
inference
language model
speedup