Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction

Ann Clifton and Anoop Sarkar
Simon Fraser University


Abstract

This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages; we build an English-to-Finnish translation system. Our methods are not language specific, and we use unsupervised morphology induction. Unlike previous work, we focus on morphologically productive phrase pairs: our decoder can combine morphemes across phrase boundaries. Because morphemes in the target language may not have a corresponding morpheme or word in the source language, we propose a novel combination of post-processing morphology prediction with morpheme-based translation. Using both automatic evaluation scores and linguistically motivated analyses of the output, we show that our methods outperform previously proposed ones and provide the best known results on the English-Finnish EuroParl translation task. Since the methods are mostly language independent, they should also improve translation into other target languages with complex morphology.




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1004.pdf