
KDD Support Services Based on Data Semantics

  • Claudia Diamantini
  • Domenico Potena
  • Maurizio Panti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3730)

Abstract

The identification of valid, novel and interesting models from large volumes of data is the primary goal of Knowledge Discovery in Databases (KDD). To achieve such a complex goal, many kinds of semantic information about the KDD and business domains are necessary. In this paper, we present an approach to the characterization of semantic domain information for a particular kind of KDD process: classification. In particular, we show how, by estimating the properties of the true but unknown classification model, one can derive domain information on the classification problem at hand. We discuss how, when these properties are saved with the data, users can access this knowledge and save the time otherwise spent experimenting with many classifiers and parameters.

Keywords

Data Mining, Data Semantics, Classification, Decision Border, User Support



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Claudia Diamantini (1)
  • Domenico Potena (1)
  • Maurizio Panti (1)
  1. Dipartimento di Ingegneria Informatica, Gestionale e dell’Automazione, Università Politecnica delle Marche, Ancona, Italy
