Insights from Network Structure for Text Mining

Zornitsa Kozareva and Eduard Hovy
ISI/USC


Abstract

Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1162.pdf