The Web as an Implicit Training Set: Application to Noun Compounds Syntax and...

Channel:

Google TechTalks

Subscribers:

349,000

Published on November 7, 2007 10:54:27 AM ● Video Link: https://www.youtube.com/watch?v=GxExWZFzrZk

Duration: 55:30

2,651 views

Google Tech Talks
November 5, 2007

ABSTRACT

Speaker: Preslav Nakov

I will present Web-based approaches to the syntax and semantics of noun compounds (NCs), which can be used in query parsing, technical term understanding, etc. I will also describe an application to machine translation. First, I will present a highly accurate, lightly supervised method based on surface features and paraphrases for making bracketing decisions for three-word noun compounds, e.g. "[[liver cell] antibody]" is left-bracketed, while "[liver [cell line]]" is right-bracketed. The enormous size of the Web makes such features frequent enough to be useful. Second, I will introduce an unsupervised method for discovering the implicit predicates characterizing the semantic relations that hold in noun-noun compounds. For example, "malaria mosquito" is a "mosquito that carries/spreads/causes/transmits/brings/infects with/... malaria". Finally, I will present a method for improving Statistical Machine Translation (SMT). Most modern SMT systems rely on aligned sentences from bilingual corpora for training. I will describe a method for expanding the training set with paraphrases at the NP level that involve NCs and are conceptually similar but syntactically different. The English-to-Spanish evaluation on the Europarl corpus shows an improvement equivalent to 33%-50% of that obtained by doubling the amount of training data.
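
The bracketing decision described above can be illustrated with a very small sketch. This is not the speaker's method, which relies on surface features and paraphrases gathered from the Web. It is a simple adjacency-style baseline that compares how often the first two words and the last two words occur together. The function name `bracket` and the table `TOY_COUNTS` are hypothetical, and the numbers are made up for illustration. They stand in for Web frequency counts and are not real data.

```python
from typing import Mapping, Tuple

# Hypothetical bigram frequencies standing in for Web counts.
# These numbers are invented for illustration only.
TOY_COUNTS: Mapping[Tuple[str, str], int] = {
    ("liver", "cell"): 900,
    ("cell", "antibody"): 120,
    ("cell", "line"): 5000,
}


def bracket(w1: str, w2: str, w3: str,
            counts: Mapping[Tuple[str, str], int] = TOY_COUNTS) -> str:
    """Bracket a three-word noun compound with an adjacency-style baseline.

    Compare the frequency of (w1, w2) with that of (w2, w3) and group
    the more frequent pair first. Ties default to left bracketing.
    """
    left = counts.get((w1, w2), 0)   # evidence for [[w1 w2] w3]
    right = counts.get((w2, w3), 0)  # evidence for [w1 [w2 w3]]
    if left >= right:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


if __name__ == "__main__":
    print(bracket("liver", "cell", "antibody"))
    print(bracket("liver", "cell", "line"))
```

With the toy counts above, the two examples from the abstract come out left-bracketed and right-bracketed respectively. In a real setting the counts would come from a large corpus or search engine, and a single bigram comparison would be only one of several signals.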

Tags:

google, techtalks, techtalk, engedu, talk, talks, googletechtalks, education
