[Transformer] Attention Is All You Need | AISC Foundational

Published on: 22 October 2018
Video Link: https://www.youtube.com/watch?v=S0KakHcj_rs
Duration: 54:13
Views: 31,552
Likes: 687

For slides and more information, visit https://aisc.ai.science/events/2018-10-22

Paper: https://arxiv.org/abs/1706.03762

Speaker: Joseph Palermo (Dessa)

Host: Insight
Date: Oct 22nd, 2018

Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
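
For viewers who want the gist of the building block the talk covers, below is a minimal NumPy sketch of the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The toy shapes and random inputs are illustrative only and are not taken from the talk or the paper.

import numpy as np

# Minimal sketch of scaled dot-product attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # query/key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted sum of values

# Toy self-attention: queries, keys, and values all come from the same sequence.
x = np.random.randn(5, 8)        # 5 tokens, model dimension 8 (illustrative sizes)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                 # (5, 8)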




Other Videos By LLMs Explained - Aggregate Intellect - AI.SCIENCE


2018-11-30 Visualizing Data using t-SNE (algorithm) | AISC Foundational
2018-11-30 Visualizing Data using t-SNE (discussions) | AISC Foundational
2018-11-27 [BERT] Pretrained Deep Bidirectional Transformers for Language Understanding (discussions) | TDLS
2018-11-27 [BERT] Pretrained Deep Bidirectional Transformers for Language Understanding (algorithm) | TDLS
2018-11-27 Neural Image Caption Generation with Visual Attention (algorithm) | AISC
2018-11-27 Neural Image Caption Generation with Visual Attention (discussion) | AISC
2018-11-17 PGGAN | Progressive Growing of GANs for Improved Quality, Stability, and Variation (part 2) | AISC
2018-11-16 PGGAN | Progressive Growing of GANs for Improved Quality, Stability, and Variation (part 1) | AISC
2018-11-16 (Original Paper) Latent Dirichlet Allocation (discussions) | AISC Foundational
2018-11-15 (Original Paper) Latent Dirichlet Allocation (algorithm) | AISC Foundational
2018-10-31 [Transformer] Attention Is All You Need | AISC Foundational
2018-10-25 [Original attention] Neural Machine Translation by Jointly Learning to Align and Translate | AISC
2018-10-16 [StackGAN++] Realistic Image Synthesis with Stacked Generative Adversarial Networks | AISC
2018-10-11 Bayesian Deep Learning on a Quantum Computer | TDLS Author Speaking
2018-10-02 Prediction of Cardiac Arrest from Physiological Signals in the Pediatric ICU | TDLS Author Speaking
2018-09-24 Junction Tree Variational Autoencoder for Molecular Graph Generation | TDLS
2018-09-19 Reconstructing quantum states with generative models | TDLS Author Speaking
2018-09-13 All-optical machine learning using diffractive deep neural networks | TDLS
2018-09-05 Recurrent Models of Visual Attention | TDLS
2018-08-28 Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates | TDLS
2018-08-20 TDLS: Large-Scale Unsupervised Deep Representation Learning for Brain Structure



Tags:
nlp
natural language processing
sequence to sequence models
neural attention
attention is all you need
deep learning
machine learning
ai
artificial intelligence
transformer model
self attention
attention
transformer deep learning
transformer network