Code Review: Transformer - Attention Is All You Need | AISC

Published on 2019-02-04 | Video Link: https://www.youtube.com/watch?v=KMY2Knr4iAs



Category: Review
Duration: 1:44:14
Views: 11,527
Likes: 244


A.I. Socratic Circles, 4-Feb-2019
https://aisc.a-i.science/events/2019-02-04

Discussion Panel: Xiyang Chen, Felipe Perez, Ehsan Amjadian
Host: Paytm Labs

ATTENTION IS ALL YOU NEED

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
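
Since this session is a code review of the Transformer, a minimal sketch of the paper's core operation may help orient viewers: scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The PyTorch snippet below is an illustrative sketch only, not the exact code walked through in the video; the function name, tensor shapes, and example sizes are assumptions for demonstration.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, mask=None):
        # q, k, v: (batch, heads, seq_len, d_k) -- hypothetical layout for this sketch
        d_k = q.size(-1)
        # Similarity of each query with every key, scaled by sqrt(d_k)
        # to keep softmax gradients well-behaved for large d_k
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5
        if mask is not None:
            # Masked positions (e.g., future tokens in the decoder) are
            # set to -inf so they receive zero attention weight
            scores = scores.masked_fill(mask == 0, float('-inf'))
        weights = F.softmax(scores, dim=-1)  # attention distribution over keys
        return weights @ v                   # weighted sum of value vectors

    # Example: batch of 2 sequences, 4 heads, length 10, d_k = 16 (made-up sizes)
    q = k = v = torch.randn(2, 4, 10, 16)
    out = scaled_dot_product_attention(q, k, v)  # -> shape (2, 4, 10, 16)

In the full model, this operation runs in parallel across several heads whose outputs are concatenated and linearly projected (multi-head attention), as described in the paper.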




Other Videos By LLMs Explained - Aggregate Intellect - AI.SCIENCE


2019-03-11 [RecSys 2018 Challenge winner] Two-stage Model for Automatic Playlist Continuation at Scale | TDLS
2019-03-07 [OpenAI GPT2] Language Models are Unsupervised Multitask Learners | TDLS Trending Paper
2019-03-04 You May Not Need Attention | TDLS Code Review
2019-02-28 [DDQN] Deep Reinforcement Learning with Double Q-learning | TDLS Foundational
2019-02-25 [AlphaGo Zero] Mastering the game of Go without human knowledge | TDLS
2019-02-21 Transformer XL | AISC Trending Papers
2019-02-19 Computational prediction of diagnosis & feature selection on mesothelioma patient records | AISC
2019-02-18 Support Vector Machine (original paper) | AISC Foundational
2019-02-11 Tensor Field Networks | AISC
2019-02-07 ACAI: Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer
2019-02-04 Code Review: Transformer - Attention Is All You Need | AISC
2019-02-04 [StyleGAN] A Style-Based Generator Architecture for GANs, part 2 (results and discussion) | TDLS
2019-02-04 [StyleGAN] A Style-Based Generator Architecture for GANs, part 1 (algorithm review) | TDLS
2019-02-04 TDLS: Learning Functional Causal Models with GANs - part 1 (algorithm review)
2019-02-04 TDLS: Learning Functional Causal Models with GANs - part 2 (results and discussion)
2019-02-04 Neural Ordinary Differential Equations - part 1 (algorithm review) | AISC
2019-02-04 Neural Ordinary Differential Equations - part 2 (results & discussion) | AISC
2019-02-04 Parallel Collaborative Filtering for the Netflix Prize (algorithm review) | AISC Foundational
2019-02-04 Parallel Collaborative Filtering for the Netflix Prize (results & discussion) | AISC Foundational
2019-01-14 TDLS - Announcing Fast Track Stream
2019-01-09 Extracting Biologically Relevant Latent Space from Cancer Transcriptomes w/ VAEs (discussions) | AISC



Tags:
transformer
deep learning
attention mechanism
natural language processing
attention
transformer attention