Disambiguating temporal-contrastive connectives for machine translation

Thomas Meyer
Idiap Research Institute, Martigny, Switzerland


Abstract

This paper describes disambiguation experiments for a specific subset of explicit discourse connectives. Based on an examination of parallel corpora, we identified the connectives "although", "but", "however", "meanwhile", "since", "though", "when" and "while" as particularly problematic for current Statistical Machine Translation (SMT) systems. These temporal-contrastive connectives commonly signal the senses "temporal", "contrast", "concession", "expansion", "cause" and "condition", which, as we also show, are hard even for humans to annotate. We give French and German translation examples in which the source-language connective has no direct lexical correspondence in the target language, and in which missing the sense signaled by the connective is a further source of translation errors. Disambiguating these senses and tagging them in large corpora could help train SMT systems to avoid such errors. Our disambiguation experiments reach accuracies above 70% for distinctions as fine-grained as the one between "contrast" and "concession". In addition, first SMT experiments show a slight increase in BLEU scores from added information on the senses of discourse connectives.




Full paper: http://www.aclweb.org/anthology/P/P11/.pdf