Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models

Elias Ponvert, Jason Baldridge, Katrin Erk
The University of Texas at Austin


Abstract

We consider a new subproblem of unsupervised parsing from raw text: unsupervised partial parsing, the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer's (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.
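
For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of unlabeled bracket precision, recall and F1 in the PARSEVAL style. It is an illustration only, not the evaluation code used in the paper: the function name `unlabeled_prf`, the representation of constituents as `(start, end)` token offsets, and the toy spans are all assumptions made here, and conventions such as whether trivial or whole-sentence spans are counted vary between evaluations.

```python
from collections import Counter


def unlabeled_prf(gold_spans, pred_spans):
    """Unlabeled bracket precision, recall and F1.

    Spans are (start, end) token offsets with an exclusive end
    (an assumed representation, chosen for this sketch).
    """
    gold = Counter(gold_spans)
    pred = Counter(pred_spans)
    # Multiset intersection: a predicted bracket matches at most one gold bracket.
    matched = sum((gold & pred).values())
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy five-token sentence; the spans are hypothetical, not data from the paper.
gold = [(0, 2), (3, 5), (0, 5)]
pred = [(0, 2), (2, 5), (0, 5)]
p, r, f = unlabeled_prf(gold, pred)
```

In this toy case two of the three predicted brackets also appear in the gold set, so precision, recall and F1 all come out to two thirds.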




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1108.pdf