Entity Set Expansion using Topic information

Kugatsu Sadamitsu, Kuniko Saito, Kenji Imamura, Genichiro Kikui
NTT Cyber Space Laboratories, NTT Corporation


Abstract

This paper proposes three modules for alleviating “semantic drift” in bootstrapping entity set expansion. The modules are added to three steps of a discriminative bootstrapping algorithm and control features, negative examples, and entity candidates by referring to the latent topics of documents. In this study, we model latent topics with LDA (Latent Dirichlet Allocation) in an unsupervised way. Experimental results show that the accuracy of the extracted entities is improved by 6.7 to 28.2%, depending on the domain.




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2128.pdf