A Comprehensive Dictionary of Multiword Expressions

Kosho Shudo (1), Akira Kurahone (2), Toshifumi Tanabe (1)
(1) Fukuoka University, (2) TechTran Ltd.


Abstract

It has been widely recognized that one of the most difficult and intriguing problems in natural language processing (NLP) is how to cope with idiosyncratic multiword expressions. This paper presents an overview of a comprehensive dictionary of Japanese multiword expressions (JDMWE). The JDMWE is characterized by the wide notational, syntactic, and semantic diversity of the expressions it contains, as well as by a detailed description of their syntactic functions, structures, and flexibilities. The dictionary contains about 104,000 expressions, or potentially 750,000 expressions. This paper shows that the JDMWE’s validity can be supported by comparing the dictionary with a large-scale Japanese N-gram frequency dataset, LDC2009T08, generated by Google Inc. (Kudo et al. 2009).




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1017.pdf