Gappy Phrasal Alignment By Agreement

Mohit Bansal¹, Chris Quirk², Robert Moore³
¹UC Berkeley, ²Microsoft Research, ³Google Research


Abstract

We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as in related natural language processing problems. The system directly models word-to-phrase and phrase-to-word translations in a hidden semi-Markov model. Agreement between the two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include "gappy phrases" (such as French ne ... pas) makes the alignment space more symmetric, which allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based hidden Markov models, while maintaining asymptotically equivalent runtime.
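Why gappy phrases make agreement possible can be illustrated with a toy example. The sketch below is a simplification under stated assumptions, not the authors' model. It represents each directional alignment as a fixed set of phrasal links, expands those links into word links, and scores agreement as the Jaccard overlap of the two link sets. The helper names (`link`, `word_links`, `agreement`), the example sentence pair, and the Jaccard measure are all hypothetical. The paper's agreement operates on the two directional hidden semi-Markov models themselves, which this sketch does not attempt to reproduce.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: a phrasal link pairs a set of English positions
# with a set of French positions. Positions need not be contiguous, so a
# "gappy" phrase such as French "ne ... pas" is just a set with a hole in it.
PhraseLink = Tuple[FrozenSet[int], FrozenSet[int]]


def link(e_positions, f_positions) -> PhraseLink:
    """Build a phrasal link from two iterables of word positions."""
    return frozenset(e_positions), frozenset(f_positions)


def word_links(alignment: List[PhraseLink]) -> Set[Tuple[int, int]]:
    """Expand phrasal links into the (english_pos, french_pos) word links they imply."""
    links: Set[Tuple[int, int]] = set()
    for e_positions, f_positions in alignment:
        links.update(product(e_positions, f_positions))
    return links


def agreement(a: List[PhraseLink], b: List[PhraseLink]) -> float:
    """Jaccard overlap of the word links implied by two alignments (1.0 = identical)."""
    links_a, links_b = word_links(a), word_links(b)
    union = links_a | links_b
    if not union:
        return 1.0
    return len(links_a & links_b) / len(union)


# English: I(0) do(1) not(2) want(3)    French: je(0) ne(1) veux(2) pas(3)
# "do" has no French counterpart and is left unaligned in every alignment below.

# French-to-English direction: "ne" and "pas" each link to "not".
f2e = [link([0], [0]), link([2], [1]), link([2], [3]), link([3], [2])]

# English-to-French direction with a gappy phrase: "not" -> "ne ... pas".
e2f_gappy = [link([0], [0]), link([2], [1, 3]), link([3], [2])]

# English-to-French direction restricted to contiguous phrases: "not" -> "ne".
e2f_contiguous = [link([0], [0]), link([2], [1]), link([3], [2])]

print(agreement(e2f_gappy, f2e))       # 1.0
print(agreement(e2f_contiguous, f2e))  # 0.75
```

When the English-to-French side may only use contiguous phrases, "not" cannot cover both "ne" and "pas" without also swallowing "veux", so the two directions can at best partially agree. Allowing the gap lets both directions imply exactly the same word links.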

Full paper: http://www.aclweb.org/anthology/P/P11/P11-1131.pdf