Semantic Integration of Tree-Structured Data Using Dimension Graphs

  • Theodore Dalamagas
  • Dimitri Theodoratos
  • Antonis Koufopoulos
  • I-Ting Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3730)

Abstract

Huge volumes of Web data are now organized or exported in tree-structured form. Popular examples of such structures are product catalogs of e-market stores, taxonomies of thematic categories, and XML data encodings. Even within a single knowledge domain, name mismatches, structural differences and structural inconsistencies make it difficult to integrate many data sources and query them in a uniform way. In this paper, we present a method for semantically integrating tree-structured data. We introduce dimensions, which are sets of semantically related nodes in tree structures. Based on dimensions, we propose dimension graphs. Dimension graphs can be extracted automatically from trees and abstract their structural information. They are semantically rich constructs that guide query formulation, assist query evaluation and support the integration of tree-structured data. We design a query language for tree-structured data. The language allows full, partial or no specification of the structure of the underlying trees, so queries in our language are not restricted by that structure. We provide necessary and sufficient conditions for checking query satisfiability, and we present a technique for evaluating satisfiable queries. Finally, we report several experiments that compare our method for integrating tree-structured data with one that does not exploit dimension graphs. The results demonstrate the superiority of our approach.
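
As a rough illustration of how a dimension graph might abstract the structure of several trees, the sketch below makes two assumptions that the abstract does not state: every tree node has already been assigned to a named dimension, and the graph receives an edge from one dimension to another whenever some node of the first has a child in the second. The names `Node` and `dimension_graph` and the two sample catalogs are hypothetical; this is not the authors' extraction procedure.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    """A tree node carrying a label and the (assumed) name of its dimension."""
    label: str
    dimension: str
    children: List["Node"] = field(default_factory=list)


def dimension_graph(roots: List[Node]) -> Set[Tuple[str, str]]:
    """Collect one edge (parent dimension, child dimension) per parent-child pair.

    Assumption: an edge between two dimensions exists as soon as any tree
    contains a parent in the first and a child in the second.
    """
    edges: Set[Tuple[str, str]] = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        for child in node.children:
            edges.add((node.dimension, child.dimension))
            stack.append(child)
    return edges


# Two hypothetical product catalogs that order their levels differently.
catalog_a = Node("cars", "category", [
    Node("Toyota", "brand", [Node("Corolla", "model")]),
])
catalog_b = Node("vehicles", "category", [
    Node("sedan", "body", [Node("Honda", "brand")]),
])

graph = dimension_graph([catalog_a, catalog_b])
```

In this toy setting both catalogs contribute edges to a single graph over dimension names, which hints at why such a graph can be queried without committing to the structure of any one tree.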

Keywords

Query Evaluation, Dimension Graph, Precedence Relationship, Semantic Integration, Path Expression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Theodore Dalamagas (1)
  • Dimitri Theodoratos (2)
  • Antonis Koufopoulos (1)
  • I-Ting Liu (2)
  1. School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
  2. Department of Computer Science, New Jersey Institute of Technology, Newark, USA
