Crowdsourcing for Statistical Machine Translation

Video Link: https://www.youtube.com/watch?v=9BsmXzrqfNo



Duration: 1:55:06


Modern approaches to machine translation are data-driven. Statistical translation models are trained using parallel text, which consists of sentences in one language paired with their translations into another language. One advantage of statistical translation models is that they are language independent, meaning that they can be applied to any language for which we have training data. Unfortunately, for most of the world's languages, we do not have sufficient amounts of training data. In this talk, I will detail my experiments using Amazon's Mechanical Turk to create crowdsourced translations for 'low-resource' languages for which we lack training data. I will discuss a variety of quality-control strategies that allow non-expert translators to produce translations approaching the level of professional translators, at a fraction of the cost. I'll analyze the impact of training-data quality on the performance of the statistical translation models trained from it, and ask the question: should we even bother with quality control? I'll present feasibility studies to see which low-resource languages we can collect data for, and volume studies to see how much data we can expect to create in a short period. Finally, I will discuss the implications of inexpensive, high-quality translations for applications including national defense, disaster response, research, and online translation systems.
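The abstract does not spell out which quality-control strategies are used. As one illustration of the general idea, here is a minimal sketch assuming each source sentence is translated redundantly by several workers and that the candidate agreeing most with the others is kept. The names `token_f1` and `select_by_consensus` are hypothetical, and unigram-overlap F1 is a simple stand-in similarity, not the method presented in the talk.

```python
from collections import Counter


def token_f1(a: str, b: str) -> float:
    """Unigram F1 overlap between two translations (lowercased, whitespace-tokenized)."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    # Clipped overlap: each shared token counts at most min(count_a, count_b) times.
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ta)
    recall = overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def select_by_consensus(candidates: list[str]) -> str:
    """Return the redundant translation with the highest mean similarity to the others.

    Assumption (hypothetical policy): a translation that agrees with its peers is
    more likely to be a genuine, careful translation than an outlier.
    """
    if not candidates:
        raise ValueError("need at least one candidate translation")
    if len(candidates) == 1:
        return candidates[0]

    def mean_agreement(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(token_f1(candidates[i], c) for c in others) / len(others)

    best = max(range(len(candidates)), key=mean_agreement)
    return candidates[best]


if __name__ == "__main__":
    crowd = [
        "the house is small",
        "the house is little",
        "the home is small",
        "buy cheap watches now",  # off-topic submission
    ]
    print(select_by_consensus(crowd))  # → the house is small
```

Consensus selection needs no professional reference translations, which suits low-resource settings; a real pipeline would likely combine it with other signals, such as worker history or bilingual judgments.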




Other Videos By Microsoft Research


2016-08-16 Near Optimal Online Algorithms and Fast Approximation Algorithms for Resource Allocation Problems
2016-08-16 Interpreting the Community: Information Practices and/for Deviance
2016-08-16 Pretty Good Democracy for a variety of voting schemes
2016-08-16 Learning Valuation Functions
2016-08-16 Applying Semantic Analyses to Content-based Recommendation and Document Clustering
2016-08-16 Fusing Mobile, Sensor, and Social Computing in the Cloud To Enable Context-Aware Applications
2016-08-16 The Past, Present, and Future of Video Telephony
2016-08-16 Multi-People Tracking through Global Optimization
2016-08-16 Bridging Shannon and Hamming: Codes for Computationally Simple Channels
2016-08-16 Using Program Verification Tools in Teaching
2016-08-16 Crowdsourcing for Statistical Machine Translation
2016-08-16 Computational Science Research in Latin America
2016-08-16 The Laplacian Paradigm: Emerging Algorithms for Massive Graphs
2016-08-16 You’re the Manager but I’m the Mayor: Understanding Foursquare Check-ins in Claimed Venues
2016-08-16 Beyond the Gaussian Universality Class
2016-08-16 Microsoft Academic Search: Next-Generation Scholarly Discovery
2016-08-16 Semantic Knowledge for Commodity Computing: Focus on Information Mining and Intelligence
2016-08-16 Semantic Knowledge for Commodity Computing: Myth or Reality? Information and Knowledge Acquisition
2016-08-16 Listen-n-feel: An Emotion Sensor on the Phone Using Speech Processing and Cloud Computing
2016-08-16 Open Data for Open Science: The Microsoft Environmental Informatics Framework (EIF)
2016-08-16 Binary Descriptors for Efficient Matching and Retrieval in Large Image Databases



Tags:
microsoft research