Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

Qin Gao and Stephan Vogel
Language Technologies Institute, Carnegie Mellon University


Abstract

We present an approach to expanding parallel corpora for machine translation. By applying semantic role labeling (SRL) to one side of the language pair, we extract SRL substitution rules from an existing parallel corpus. The rules are then used to generate new sentence pairs, and an SVM classifier is built to filter the generated pairs. The filtered corpus is used to train phrase-based translation models, which can be used directly in translation tasks or combined with baseline models. Experimental results on Chinese-English machine translation tasks show an average improvement of 0.45 BLEU and 1.22 TER points across 5 different NIST test sets.
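
To make the generation step concrete, the following is a minimal toy sketch of what a role-based substitution might look like. It is an illustration under stated assumptions, not the method from the paper: the `SentencePair` structure, the `substitute` function, the hand-written role spans, and the example sentences are all hypothetical, and the rule extraction and SVM filtering steps are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


@dataclass
class SentencePair:
    """Hypothetical container for one aligned sentence pair."""
    src: List[str]                       # source-side tokens (here: Chinese)
    tgt: List[str]                       # target-side tokens (here: English)
    roles: Dict[str, Tuple[Span, Span]]  # role label -> (src span, tgt span)


def substitute(base: SentencePair, donor: SentencePair, role: str) -> SentencePair:
    """Replace the `role` argument of `base` with the same-role argument
    of `donor`, on both sides of the pair (assumed behaviour, for illustration)."""
    (b_src_start, b_src_end), (b_tgt_start, b_tgt_end) = base.roles[role]
    (d_src_start, d_src_end), (d_tgt_start, d_tgt_end) = donor.roles[role]
    new_src = base.src[:b_src_start] + donor.src[d_src_start:d_src_end] + base.src[b_src_end:]
    new_tgt = base.tgt[:b_tgt_start] + donor.tgt[d_tgt_start:d_tgt_end] + base.tgt[b_tgt_end:]
    # Role spans are not recomputed in this sketch, so the result carries none.
    return SentencePair(new_src, new_tgt, {})


# Hand-labeled toy data; a real system would obtain spans from an SRL tool
# and word alignment rather than writing them by hand.
base = SentencePair(
    src=["他", "买", "了", "一", "本", "书"],
    tgt=["he", "bought", "a", "book"],
    roles={"ARG0": ((0, 1), (0, 1)), "ARG1": ((3, 6), (2, 4))},
)
donor = SentencePair(
    src=["她", "卖", "了", "两", "辆", "车"],
    tgt=["she", "sold", "two", "cars"],
    roles={"ARG0": ((0, 1), (0, 1)), "ARG1": ((3, 6), (2, 4))},
)

new_pair = substitute(base, donor, "ARG1")
print(" ".join(new_pair.src), "|||", " ".join(new_pair.tgt))
```

Generated pairs of this kind can be ungrammatical or semantically odd, which is presumably why the abstract describes a classifier that filters them before they are used for training.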




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2051.pdf