Extracting Paraphrases from Definition Sentences on the Web

Chikara Hashimoto (1), Kentaro Torisawa (1), Stijn De Saeger (1), Jun'ichi Kazama (1), Sadao Kurohashi (2)
(1) National Institute of Information and Communications Technology, (2) Kyoto University


Abstract

We propose an automatic method for extracting paraphrases from definition sentences, which are themselves acquired automatically from the Web. We observe that a huge number of concepts are defined in Web documents, and that sentences defining the same concept tend to convey mostly the same information using different expressions, so paraphrases abound in them. We show that a large number of paraphrases can be extracted automatically with high precision by treating the sentences that define the same concept as a parallel corpus. In our experiments, the method extracted about 300,000 paraphrases from 6 x 10^8 Web documents with a precision of more than 0.94.
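
The core idea, treating definition sentences of the same concept as parallel text, can be illustrated with a small sketch. This is not the method from the paper: the abstract does not describe how candidates are extracted or ranked, so the sketch substitutes a plain token-level diff from Python's standard difflib module. The function name candidate_paraphrases, the input layout (a dict from concept to definition sentences), and the example sentences are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_paraphrases(definitions):
    """Return phrase pairs that differ between definitions of one concept.

    `definitions` maps a concept name to a list of definition sentences.
    Illustrative stand-in only: a simple token diff, not the paper's method.
    """
    pairs = set()
    for sentences in definitions.values():
        # Naive tokenization: lowercase, drop a trailing period, split on spaces.
        tokenized = [s.lower().rstrip(".").split() for s in sentences]
        # Compare every pair of sentences that define the same concept.
        for a, b in combinations(tokenized, 2):
            matcher = SequenceMatcher(a=a, b=b, autojunk=False)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                # A "replace" span is where the two sentences say the
                # same thing in the same position with different words.
                if tag == "replace":
                    pairs.add((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs


# Hypothetical input: two definition sentences for one concept.
example = {
    "osteoporosis": [
        "Osteoporosis is a disease that makes bones fragile.",
        "Osteoporosis is a disease that causes bones to become brittle.",
    ]
}
pairs = candidate_paraphrases(example)
```

On this input the diff yields candidates such as "fragile" and "to become brittle". A raw diff like this also produces noisy pairs, so reaching the precision reported above would require a filtering or ranking step that this sketch leaves out.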




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1109.pdf