Learning Ontologies to Improve Text Clustering and Classification

  • Stephan Bloehdorn
  • Philipp Cimiano
  • Andreas Hotho
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Recent work has shown improvements in text clustering and classification tasks by integrating conceptual features extracted from ontologies. In this paper we present text mining experiments in the medical domain in which the ontological structures used are acquired automatically, in an unsupervised learning process, from the text corpus in question. We compare the results obtained using these automatically learned ontologies with those obtained using manually engineered ones. Our results show that both types of ontologies improve performance on text clustering and classification tasks, and that the automatically acquired ontologies yield an improvement competitive with that of the manually engineered ones.
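To make "integrating conceptual features" concrete, here is a minimal sketch of one common way to do it: keep the bag-of-words term counts and add counts for the ontology concepts the terms map to, optionally propagated up to a fixed number of superconcepts. This is an illustration under stated assumptions, not the authors' implementation. The toy ontology (`TERM_TO_CONCEPT`, `SUPERCONCEPT`), the function names, and the `depth` parameter are all hypothetical. The sketch also does not show how an ontology is learned from a corpus. The medical terms are illustrative and are not taken from the paper's data.

```python
import math
from collections import Counter

# Hypothetical toy ontology (assumption, not from the paper):
# each term maps to one concept, each concept to at most one superconcept.
TERM_TO_CONCEPT = {"aspirin": "analgesic", "ibuprofen": "analgesic", "insulin": "hormone"}
SUPERCONCEPT = {"analgesic": "drug", "drug": "substance", "hormone": "substance"}


def concept_features(tokens, depth=1):
    """Count the concept of each token plus up to `depth` superconcepts."""
    counts = Counter()
    for tok in tokens:
        concept = TERM_TO_CONCEPT.get(tok)  # None if the term is not in the ontology
        level = 0
        while concept is not None and level <= depth:
            counts["concept:" + concept] += 1
            concept = SUPERCONCEPT.get(concept)
            level += 1
    return counts


def term_plus_concept_vector(tokens, depth=1):
    """'Add' strategy (assumed): keep all term counts and append concept counts."""
    vec = Counter("term:" + t for t in tokens)
    vec.update(concept_features(tokens, depth))
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    terms_only = lambda toks: Counter("term:" + t for t in toks)
    # Two one-word documents with no shared term.
    print(cosine(terms_only(["aspirin"]), terms_only(["ibuprofen"])))  # 0.0
    print(cosine(term_plus_concept_vector(["aspirin"], 0),
                 term_plus_concept_vector(["ibuprofen"], 0)))  # 0.5
    print(round(cosine(term_plus_concept_vector(["aspirin"], 1),
                       term_plus_concept_vector(["ibuprofen"], 1)), 3))  # 0.667
```

The example shows why conceptual features can help clustering. Documents that share no terms become similar once both are mapped to a shared concept such as `analgesic`, and propagating further up the hierarchy increases that overlap. Propagating too far risks merging unrelated documents under very general concepts, so the depth would need tuning on real data.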


Keywords (machine-generated, not supplied by the authors): Noun Phrase, Latent Semantic Analysis, Formal Concept Analysis, Text Corpus, Concept Hierarchy





Copyright information

© Springer Berlin · Heidelberg 2006

Authors and Affiliations

  • Stephan Bloehdorn (1)
  • Philipp Cimiano (1)
  • Andreas Hotho (2)
  1. Institute AIFB, University of Karlsruhe, Karlsruhe, Germany
  2. KDE Group, University of Kassel, Kassel, Germany
