Learning Ontology-Aware Classifiers

  • Jun Zhang
  • Doina Caragea
  • Vasant Honavar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3735)


Many practical applications of machine learning in data-driven scientific discovery call for exploring data from multiple points of view, each corresponding to an explicitly specified ontology. This paper formalizes a class of problems of learning from ontologies and data, and explores the design space for learning classifiers from attribute value taxonomies (AVTs) and data. We introduce the notions of AVT-extended data sources and partially specified data, and propose a general framework for learning classifiers from such data sources. Two instantiations of this framework, an AVT-based Decision Tree classifier and an AVT-based Naïve Bayes classifier, are presented. Experimental results show that the resulting algorithms learn robust, high-accuracy classifiers with substantially more compact representations than those obtained by standard learners.
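As a rough illustration of the terms in the abstract, the following sketch assumes that an AVT is a tree over the values of one attribute, that a "cut" through the tree fixes a level of abstraction, and that a value recorded above the chosen cut counts as partially specified relative to it. The abstract does not spell out these definitions, so they are assumptions here, and the taxonomy, the helper names (`ancestors`, `abstract_to_cut`, `counts_at_cut`), and the toy rows are all hypothetical. It is not the paper's algorithm, only a minimal picture of how class-conditional counts could be gathered at one level of abstraction.

```python
from collections import Counter

# Hypothetical taxonomy over a "student status" attribute, stored as child -> parent.
PARENT = {
    "Freshman": "Undergraduate",
    "Sophomore": "Undergraduate",
    "Undergraduate": "Student",
    "Masters": "Graduate",
    "PhD": "Graduate",
    "Graduate": "Student",
}


def ancestors(value):
    """Return the value followed by its ancestors, ending at the root."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def abstract_to_cut(value, cut):
    """Map a value to the cut node covering it.

    Returns None when the value sits above the cut, i.e. it is only
    partially specified relative to this level of abstraction (assumed reading).
    """
    for node in ancestors(value):
        if node in cut:
            return node
    return None


def counts_at_cut(rows, cut):
    """Count (abstracted value, class label) pairs, skipping values above the cut."""
    counts = Counter()
    for value, label in rows:
        node = abstract_to_cut(value, cut)
        if node is not None:
            counts[(node, label)] += 1
    return counts


# Toy data: the fourth row records only "Graduate", not a specific degree.
rows = [
    ("Freshman", "no"),
    ("Sophomore", "no"),
    ("PhD", "yes"),
    ("Graduate", "yes"),
    ("Masters", "no"),
]

coarse_cut = {"Undergraduate", "Graduate"}
leaf_cut = {"Freshman", "Sophomore", "Masters", "PhD"}

counts = counts_at_cut(rows, coarse_cut)
```

At the coarse cut every row contributes a count, whereas at the leaf cut the row recorded as "Graduate" cannot be assigned to a single leaf. A learner working at that finer level would need some policy for such rows, which this sketch deliberately leaves out.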


Intrusion Detection; Hypothesis Space; Hypothesis Class; Estimate Error Rate; Instance Space





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jun Zhang (1)
  • Doina Caragea (1)
  • Vasant Honavar (1)
  1. Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, USA
