Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling

Dmitriy Dligach and Martha Palmer
University of Colorado at Boulder


Abstract

Active Learning (AL) is typically initialized with a small seed of randomly selected examples. However, when the distribution of classes in the data is skewed, some classes may be missed, resulting in slow learning progress. Our contribution is twofold: (1) we show that an unsupervised language-modeling-based technique is effective in selecting rare-class examples, and (2) we use this technique for seeding AL and demonstrate that it leads to a higher learning rate. The evaluation is conducted in the context of word sense disambiguation.
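
As a rough illustration of the seeding idea, the sketch below ranks unlabeled examples by their probability under a language model trained on the same unlabeled pool and takes the least probable ones as the seed. The abstract does not specify the model or the scoring function, so everything here is an assumption made for illustration: the add-one-smoothed unigram model, the length-normalized log-probability score, the premise that low-probability contexts are the rare-class candidates, and the names train_unigram_lm and select_seed.

```python
import math
from collections import Counter


def train_unigram_lm(sentences):
    """Return a log-probability function for an add-one-smoothed unigram model.

    Assumption: a unigram model stands in for whatever language model is
    actually used; it is chosen only to keep the sketch self-contained.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab_size))

    return logprob


def select_seed(sentences, k):
    """Return indices of the k sentences with the lowest average token log-probability.

    Assumption: low-probability contexts are treated as likely rare-class
    examples and are therefore preferred for the seed set.
    """
    logprob = train_unigram_lm(sentences)

    def score(sent):
        # Length-normalize so that long sentences are not penalized outright.
        return sum(logprob(tok) for tok in sent) / max(len(sent), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]))
    return ranked[:k]


# Hypothetical unlabeled pool for the ambiguous word "bank".
pool = [
    "he deposited money in the bank".split(),
    "she opened an account at the bank".split(),
    "the bank approved the loan".split(),
    "the bank raised its interest rate".split(),
    "otters nested along the muddy river bank".split(),
]
seed = select_seed(pool, 2)
```

The selected indices would then be sent for annotation and used to train the initial classifier in place of a random seed. A unigram model is a crude proxy on a pool this small; a higher-order or better-smoothed model would be the natural substitute in practice.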




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2002.pdf