Fine-Grained Class Label Markup of Search Queries

Joseph Reisinger¹ and Marius Pasca²
¹The University of Texas at Austin, ²Google


Abstract

We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult because there is little semantic redundancy between terms; hence, methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore, search queries lack the explicit syntax often used to determine intent in question answering. In this paper, we propose a hybrid model of semantic analysis that combines explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to terabytes of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1120.pdf