Text Classification with Active Learning

  • Blaž Novak
  • Dunja Mladenič
  • Marko Grobelnik
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


In many real-world machine learning tasks, labeled training examples are expensive to obtain, while large numbers of unlabeled examples are readily available. Text classification is one such class of learning problems. Active learning aims to reduce the required labeling effort while retaining accuracy by intelligently selecting the examples to be labeled. However, very few comparisons exist between different active learning methods. The effect of the ratio of positive to negative examples on the accuracy of such algorithms has also received very little attention. This paper presents a comparison of the two most promising methods and their performance on a range of categories from the Reuters Corpus Vol. 1 news article dataset.
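The idea of selecting which examples to label can be illustrated with a minimal sketch of pool-based uncertainty sampling, one common active learning strategy. The abstract does not name the two methods it compares, so this is not necessarily either of them. The names `select_queries`, `active_learning_loop`, `fit`, and `oracle` are hypothetical and introduced only for illustration; `fit` is assumed to return a function that maps an example to an estimated probability of the positive class.

```python
def select_queries(probabilities, batch_size=1):
    """Return positions of the most uncertain predictions.

    Uncertainty is measured as closeness of the predicted
    positive-class probability to 0.5 (an illustrative choice).
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


def active_learning_loop(pool, oracle, fit, labeled, rounds, batch_size=1):
    """Grow a labeled set by repeatedly querying uncertain examples.

    pool    -- list of examples (any representation)
    oracle  -- hypothetical callable returning the true label of an example
    fit     -- hypothetical callable: list of (example, label) -> model,
               where model(example) returns P(positive)
    labeled -- dict mapping pool index -> label for the initial seed set
    """
    labeled = dict(labeled)  # copy so the caller's seed set is untouched
    for _ in range(rounds):
        model = fit([(pool[i], y) for i, y in labeled.items()])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        probs = [model(pool[i]) for i in unlabeled]
        for j in select_queries(probs, batch_size):
            idx = unlabeled[j]
            labeled[idx] = oracle(pool[idx])  # this is the costly labeling step
    return labeled
```

In this sketch the labeling cost is paid only inside the inner loop, so the number of oracle calls is bounded by `rounds * batch_size` regardless of the pool size.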


Active Learning, Version Space, News Article, Language Resource, Active Learning Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Berlin · Heidelberg 2006

Authors and Affiliations

  • Blaž Novak (1)
  • Dunja Mladenič (1)
  • Marko Grobelnik (1)
  1. Jožef Stefan Institute, Ljubljana, Slovenia
