Text Classification with Active Learning
In many real-world machine learning tasks, labeled training examples are expensive to obtain, while large numbers of unlabeled examples are readily available. Text classification is one such class of learning problems. Active learning strives to reduce the required labeling effort while retaining accuracy by intelligently selecting the examples to be labeled. However, few comparisons of different active learning methods exist. The effect of the ratio of positive to negative examples on the accuracy of such algorithms has also received little attention. This paper presents a comparison of the two most promising methods and their performance on a range of categories from the Reuters Corpus Vol. 1 news article dataset.
Keywords: Active Learning, Version Space, News Article, Language Resource, Active Learning Method
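To make the idea of "intelligently selecting the examples to be labeled" concrete, the sketch below shows pool-based uncertainty sampling, one common selection strategy. The abstract does not name the two methods it compares, so this is an illustration only and not necessarily one of them. The function `select_most_uncertain`, the toy scorer `toy_prob`, and the vocabulary `POSITIVE_WORDS` are all hypothetical names introduced for this example; a real system would use the probability output of a trained classifier instead.

```python
from typing import Callable, List, Sequence


def select_most_uncertain(
    pool: Sequence[str],
    predict_positive_prob: Callable[[str], float],
    batch_size: int = 1,
) -> List[int]:
    """Return indices of pool items whose predicted probability is closest to 0.5."""
    # The classifier is least certain when P(positive) is near 0.5,
    # so rank the pool by distance from 0.5 (smallest first).
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(predict_positive_prob(pool[i]) - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical stand-in for a trained text classifier: scores a document by
# the fraction of its words that appear in a small "positive" vocabulary.
POSITIVE_WORDS = {"merger", "shares", "profit"}


def toy_prob(doc: str) -> float:
    words = doc.lower().split()
    if not words:
        return 0.5  # no evidence either way
    return sum(w in POSITIVE_WORDS for w in words) / len(words)


unlabeled_pool = [
    "profit shares merger",         # toy probability 1.0
    "weather report today",         # toy probability 0.0
    "shares fell after storm",      # toy probability 0.25
    "merger talks profit warning",  # toy probability 0.5
]

# Ask the annotator to label the two documents the model is least sure about.
to_label = select_most_uncertain(unlabeled_pool, toy_prob, batch_size=2)
print(to_label)  # [3, 2]
```

In a full active learning loop, the selected documents would be labeled, added to the training set, and the classifier retrained before the next selection round.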