Longformer: The Long-Document Transformer

Video Link: https://www.youtube.com/watch?v=_8KNb5iqblE



Duration: 26:36

The Longformer extends the Transformer with sliding window attention and sparse global attention, allowing it to process much longer documents than classic models like BERT.
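For illustration, here is a minimal sketch (not the authors' implementation) of what the combined attention pattern looks like: every token attends to a local window of neighbors, while a few designated tokens additionally attend to, and are attended by, every position. The helper name and parameters below are made up for the example.

```python
import torch

def longformer_style_mask(seq_len, window, global_idx):
    """Boolean attention mask sketch: True where attention is allowed.

    seq_len    -- number of tokens
    window     -- one-sided size of the local sliding window
    global_idx -- indices of tokens given full (global) attention
    """
    positions = torch.arange(seq_len)
    # Local sliding window: token i may attend to j whenever |i - j| <= window
    mask = (positions[None, :] - positions[:, None]).abs() <= window
    # Global tokens attend everywhere and are attended from everywhere
    mask[global_idx, :] = True
    mask[:, global_idx] = True
    return mask

# Example: 16 tokens, window of 2, global attention on the first ([CLS]-style) token
mask = longformer_style_mask(16, window=2, global_idx=[0])
print(mask.int())
```

Note that this toy version still materializes a dense n x n mask; the actual Longformer computes only the banded and global entries, which is what makes the attention scale linearly with sequence length.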

Paper: https://arxiv.org/abs/2004.05150
Code: https://github.com/allenai/longformer

Abstract:
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.
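As a usage-level sketch (assuming the Hugging Face `transformers` library, which ships a Longformer implementation and the `allenai/longformer-base-4096` checkpoint), the task-motivated global attention is specified per token via a separate mask:

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

text = "A very long document ..."
inputs = tokenizer(text, return_tensors="pt")

# Local windowed attention everywhere; global attention only on the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```

Which tokens get global attention is task-specific, e.g. the classification token for classification or the question tokens for QA.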

Authors: Iz Beltagy, Matthew E. Peters, Arman Cohan

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher







Tags:
deep learning
machine learning
nlp
natural language processing
machine translation
arxiv
attention mechanism
attention
transformer
bert
roberta
mlm
convolution
memory
linear
sliding
dilated
sparse