Syntax Augmented Machine Translation: Challenges in Search and Learning Effec...

Video Link: https://www.youtube.com/watch?v=2GhzD2wE55M



Duration: 39:16


Google Tech Talks
December 17, 2007

ABSTRACT

Ashish Venugopal - RESEARCH SCIENTIST

Probabilistic Synchronous Context-Free Grammars hold significant promise for machine translation, modeling context-sensitive translation and reordering effects with simple hierarchical operations learned directly from parallel data. Source-language sentences are transformed into target-language sentences via intermediate nonterminal symbols, typically by bottom-up chart parsing with these grammars.
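As a rough illustration of bottom-up chart parsing with a synchronous grammar, the sketch below translates a three-word phrase with a toy grammar. The grammar, the labels (N, A, NB, NP), and the function names are hypothetical and are not taken from the talk; rule probabilities are omitted, so this shows only how paired rules build target strings and reorder them.

```python
# Toy synchronous grammar (hypothetical, for illustration only).
# Each rule is (lhs, source_rhs, target_rhs). Strings are terminals;
# tuples (label, index) are nonterminal slots, co-indexed across the
# source and target sides so that the target side can reorder them.
RULES = [
    ("N", ("maison",), ("house",)),
    ("A", ("bleue",), ("blue",)),
    ("NB", (("N", 1), ("A", 2)), (("A", 2), ("N", 1))),  # swaps noun and adjective
    ("NP", ("la", ("NB", 1)), ("the", ("NB", 1))),
]


def is_nt(sym):
    return isinstance(sym, tuple)


def match(pattern, pos, end, span, words, chart):
    """Yield {slot index: target words} for each way pattern covers words[pos:end]."""
    if not pattern:
        if pos == end:
            yield {}
        return
    head, rest = pattern[0], pattern[1:]
    if not is_nt(head):
        if pos < end and words[pos] == head:
            yield from match(rest, pos + 1, end, span, words, chart)
        return
    label, index = head
    for k in range(pos + 1, end + 1):
        if (pos, k) == span:  # only use strictly smaller, already finished spans
            continue
        sub = chart.get((pos, k), {})
        if label in sub:
            for binding in match(rest, k, end, span, words, chart):
                yield {index: sub[label], **binding}


def translate(words, rules, goal):
    """Fill the chart from short spans to long ones; keep the first derivation per label."""
    n = len(words)
    chart = {}  # (i, j) -> {label: tuple of target words}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart.setdefault((i, j), {})
            for lhs, src, tgt in rules:
                for binding in match(src, i, j, (i, j), words, chart):
                    out = []
                    for sym in tgt:
                        if is_nt(sym):
                            out.extend(binding[sym[1]])
                        else:
                            out.append(sym)
                    cell.setdefault(lhs, tuple(out))
    return chart.get((0, n), {}).get(goal)


result = translate("la maison bleue".split(), RULES, "NP")
```

Here `result` holds the target-side words for the whole source span, with the adjective moved in front of the noun by the NB rule. A real system would attach probabilities to rules and keep many competing derivations per chart cell rather than the first one found.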

Introducing an n-gram language model into this search space creates dependencies between consecutive chart items, making exact search computationally difficult. We present a two-pass approach motivated by grammars that include a large number of nonterminal symbols, and evaluate it against a state-of-the-art single-pass approach.
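The dependency between neighbouring chart items can be seen in a small sketch with a bigram model. The log-probabilities and helper names below are invented for illustration and do not come from the talk. The point is that the score of a combined item includes a bigram spanning the join, so each item has to remember its boundary words, and items that cover the same span with different boundary words cannot simply be merged.

```python
# Hypothetical bigram log-probabilities (made-up numbers).
BIGRAM_LOGPROB = {
    ("the", "blue"): -0.5,
    ("blue", "house"): -0.7,
    ("the", "house"): -1.0,
    ("house", "blue"): -3.0,
}
UNSEEN = -5.0  # assumed flat penalty for any bigram not in the table


def bigram(a, b):
    return BIGRAM_LOGPROB.get((a, b), UNSEEN)


def make_item(words):
    """A chart item keeps its boundary words plus the score of its internal bigrams."""
    inner = sum(bigram(a, b) for a, b in zip(words, words[1:]))
    return {"first": words[0], "last": words[-1], "score": inner}


def combine(left, right):
    """Joining two items adds one bigram that straddles the boundary between them."""
    return {
        "first": left["first"],
        "last": right["last"],
        "score": left["score"] + right["score"] + bigram(left["last"], right["first"]),
    }


the = make_item(("the",))
blue_house = make_item(("blue", "house"))
house_blue = make_item(("house", "blue"))

good = combine(the, blue_house)  # adds the ("the", "blue") bigram at the join
bad = combine(the, house_blue)   # adds the ("the", "house") bigram at the join
```

The two right-hand items cover the same words, yet the score added at the join differs because their first words differ. With longer n-grams, more boundary words must be stored per item, which multiplies the number of distinct items the search has to track.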

The motivation for this two-pass approach comes from a desire to include a large number of nonterminal labels in the translation grammar. Initial results using labels from associated phrase-structure parse trees are promising, but these labels are often noisy and depend on human-generated data. We propose a novel method to discriminatively learn nonterminal labels that directly improve translation quality.

Speaker: Ashish Venugopal







Tags:
google
techtalks
techtalk
engedu
talk
talks
googletechtalks
education