TransCoder: Unsupervised Translation of Programming Languages (Paper Explained)

Channel: Yannic Kilcher

Subscribers: 291,000

Published on June 9, 2020 2:07:27 PM ● Video Link: https://www.youtube.com/watch?v=xTzFJIknh7E

Duration: 48:38

145,093 views

2,016

Code migration between languages is an expensive and laborious task. To translate from one language to another, one needs to be an expert in both. Current automatic tools often produce illegible and complicated code. This paper applies unsupervised neural machine translation to source code in Python, C++, and Java and translates between them without ever being trained in a supervised fashion.

OUTLINE:
0:00 - Intro & Overview
1:15 - The Transcompiling Problem
5:55 - Neural Machine Translation
8:45 - Unsupervised NMT
12:55 - Shared Embeddings via Token Overlap
20:45 - MLM Objective
25:30 - Denoising Objective
30:10 - Back-Translation Objective
33:00 - Evaluation Dataset
37:25 - Results
41:45 - Tokenization
42:40 - Shared Embeddings
43:30 - Human-Aware Translation
47:25 - Failure Cases
48:05 - Conclusion

Paper: https://arxiv.org/abs/2006.03511

Abstract:
A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.
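
The abstract mentions checking translations with unit tests rather than by comparing text. A minimal sketch of that idea follows, assuming a candidate translation counts as correct when it returns the same outputs as the reference on every test input. All names here (`reference_sum_of_squares`, `candidate_sum_of_squares`, `passes_unit_tests`) are hypothetical and not from the paper, and both functions are written in Python for simplicity, whereas the real test set pairs functions across C++, Java, and Python.

```python
from typing import Any, Callable, Iterable, Tuple


def reference_sum_of_squares(n: int) -> int:
    """Hypothetical reference function (stands in for the source-language version)."""
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total


def candidate_sum_of_squares(n: int) -> int:
    """Hypothetical candidate translation: different code, same intended behavior."""
    return n * (n + 1) * (2 * n + 1) // 6


def passes_unit_tests(
    reference: Callable[..., Any],
    candidate: Callable[..., Any],
    test_inputs: Iterable[Tuple[Any, ...]],
) -> bool:
    """Return True if the candidate matches the reference on every test input.

    A candidate that raises an exception on any input is treated as failing.
    """
    for args in test_inputs:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception:
            return False
        if actual != expected:
            return False
    return True


if __name__ == "__main__":
    inputs = [(0,), (1,), (5,), (10,)]
    print(passes_unit_tests(reference_sum_of_squares, candidate_sum_of_squares, inputs))
```

A check like this accepts translations that look nothing like the reference as long as they behave the same on the tested inputs, which is the point of evaluating with unit tests instead of token overlap.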

Authors: Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher


Tags: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, paper
