Preliminary Program

Gappy Phrasal Alignment By Agreement

Mohit Bansal¹, Chris Quirk², Robert Moore³
¹UC Berkeley, ²Microsoft Research, ³Google Research

Abstract

We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as in other related natural language processing problems. Within a hidden semi-Markov model, the system directly models word-to-phrase and phrase-to-word translations. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include "gappy phrases" (such as French ne ... pas) makes the alignment space more symmetric and thus allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based hidden Markov models (HMMs), while maintaining asymptotically equivalent runtime.
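The agreement idea can be illustrated in a deliberately simplified, word-level form. The sketch below is not the paper's gappy phrasal semi-Markov model. It assumes each directional model has already produced a matrix of link posteriors (for example, via forward-backward). It combines the two matrices by elementwise product, which is the usual way two directional aligners are made to agree, and keeps the links whose combined posterior exceeds a threshold. The function names `agreement_posteriors` and `decode_links`, the 0.5 threshold, and the toy numbers are all hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def agreement_posteriors(p_src_tgt: Matrix, p_tgt_src: Matrix) -> Matrix:
    """Combine two directional link posteriors by elementwise product.

    Hypothetical, word-level illustration only.
    p_src_tgt[i][j]: posterior of link (source i, target j) under the
    source->target model. p_tgt_src[j][i]: posterior of the same link under
    the target->source model (note the transposed indexing).
    """
    n_src = len(p_src_tgt)
    n_tgt = len(p_src_tgt[0]) if n_src else 0
    # A link scores highly only if BOTH directional models support it.
    return [[p_src_tgt[i][j] * p_tgt_src[j][i] for j in range(n_tgt)]
            for i in range(n_src)]


def decode_links(posteriors: Matrix, threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Keep every (source, target) link whose combined posterior exceeds threshold."""
    return {(i, j)
            for i, row in enumerate(posteriors)
            for j, p in enumerate(row)
            if p > threshold}


# Toy example (hypothetical numbers): source ["maison", "bleue"], target ["blue", "house"].
combined = agreement_posteriors([[0.1, 0.9], [0.8, 0.2]],
                                [[0.2, 0.7], [0.9, 0.1]])
print(sorted(decode_links(combined)))  # [(0, 1), (1, 0)]
```

Links that only one direction favors, such as (maison, blue) here, are suppressed by the product. This is the parsimony pressure the abstract attributes to agreement, shown here without phrases or gaps.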

Full paper: http://www.aclweb.org/anthology/P/P11/P11-1131.pdf