Judging Grammaticality with Tree Substitution Grammar Derivations

Matt Post
Johns Hopkins University


Abstract

In this paper, we show that local features computed from the derivations of tree substitution grammars (TSGs), such as the identity of particular fragments and a count of large and small fragments, are useful in binary grammaticality classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005), developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model.
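To make the two feature types concrete, here is a minimal sketch, not the paper's implementation. It makes several assumptions that are ours alone: a derivation is given as a list of TSG fragments in bracketed notation, a fragment's size is the number of node labels (internal nodes plus leaves) it contains, and the helpers `fragment_size` and `derivation_features`, along with the cutoff `large_threshold=5` that separates "large" from "small" fragments, are all hypothetical.

```python
import re
from collections import Counter


def fragment_size(fragment: str) -> int:
    """Count node labels (internal nodes and leaves) in a bracketed fragment.

    Assumption: size = number of labels; the paper may measure size differently.
    """
    # Every maximal run of characters that is not a parenthesis or whitespace
    # is one node label, e.g. "(NP (DT the) (NN dog))" -> NP, DT, the, NN, dog.
    return len(re.findall(r"[^()\s]+", fragment))


def derivation_features(derivation, large_threshold=5):
    """Map one TSG derivation to sparse classification features.

    - "frag=<fragment>": indicator count for each fragment's identity
    - "num_large" / "num_small": counts of fragments at or above / below
      the (hypothetical) size threshold
    """
    feats = Counter()
    for frag in derivation:
        feats["frag=" + frag] += 1
        if fragment_size(frag) >= large_threshold:
            feats["num_large"] += 1
        else:
            feats["num_small"] += 1
    return feats


# Toy derivation for "She saw the dog": the first fragment has frontier
# nonterminals PRP, VBD, and NP, which the remaining fragments fill by substitution.
derivation = [
    "(S (NP PRP) (VP VBD NP))",
    "(PRP She)",
    "(VBD saw)",
    "(NP (DT the) (NN dog))",
]
features = derivation_features(derivation)
```

Here the first and last fragments (6 and 5 labels) count as large and the two lexical fragments as small. Feature vectors of this kind could then be passed to any standard binary classifier; which learner the paper uses is not stated in this abstract.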




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2038.pdf