Crowdsourcing for Statistical Machine Translation

Channel: Microsoft Research

Subscribers: 351,000

Published on August 17, 2016 3:00:42 AM ● Video Link: https://www.youtube.com/watch?v=9BsmXzrqfNo

Duration: 1:55:06

236 views

Modern approaches to machine translation are data-driven. Statistical translation models are trained on parallel text, which consists of sentences in one language paired with their translations into another language. One advantage of statistical translation models is that they are language-independent, meaning that they can be applied to any language for which we have training data. Unfortunately, for most of the world's languages, we do not have sufficient amounts of training data. In this talk, I will detail my experiments using Amazon's Mechanical Turk to create crowdsourced translations for "low-resource" languages that lack training data. I will discuss a variety of quality-control strategies that allow non-expert translators to produce translations approaching the level of professional translators, at a fraction of the cost. I'll analyze the impact of training data quality on the performance of the statistical translation model trained from it, and ask the question: should we even bother with quality control? I'll present feasibility studies to see which low-resource languages it is possible to collect data for, and volume studies to see how much data we can expect to create in a short period. Finally, I will discuss the implications of inexpensive, high-quality translations for applications including national defense, disaster response, research, and online translation systems.

Other Videos By Microsoft Research

2016-08-16	Near Optimal Online Algorithms and Fast Approximation Algorithms for Resource Allocation Problems
2016-08-16	Interpreting the Community: Information Practices and/for Deviance
2016-08-16	Pretty Good Democracy for a variety of voting schemes
2016-08-16	Learning Valuation Functions
2016-08-16	Applying Semantic Analyses to Content-based Recommendation and Document Clustering
2016-08-16	Fusing Mobile, Sensor, and Social Computing in the Cloud To Enable Context-Aware Applications
2016-08-16	The Past, Present, and Future of Video Telephony
2016-08-16	Multi-People Tracking through Global Optimization
2016-08-16	Bridging Shannon and Hamming: Codes for Computationally Simple Channels
2016-08-16	Using Program Verification Tools in Teaching
2016-08-16	Crowdsourcing for Statistical Machine Translation
2016-08-16	Computational Science Research in Latin America
2016-08-16	The Laplacian Paradigm: Emerging Algorithms for Massive Graphs
2016-08-16	You're the Manager but I'm the Mayor: Understanding Foursquare Check-ins in Claimed Venues
2016-08-16	Beyond the Gaussian Universality Class
2016-08-16	Microsoft Academic Search: Next-Generation Scholarly Discovery
2016-08-16	Semantic Knowledge for Commodity Computing: Focus on Information Mining and Intelligence
2016-08-16	Semantic Knowledge for Commodity Computing: Myth or Reality? Information and Knowledge Acquisition
2016-08-16	Listen-n-feel: An Emotion Sensor on the Phone Using Speech Processing and Cloud Computing
2016-08-16	Open Data for Open Science: The Microsoft Environmental Informatics Framework (EIF)
2016-08-16	Binary Descriptors for Efficient Matching and Retrieval in Large Image Databases

Tags:

microsoft research
