WebVigiL: User Profile-Based Change Detection for HTML/XML Documents

  • N. Pandrangi
  • J. Jacob
  • A. Sanka
  • S. Chakravarthy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2712)

Abstract

With the exponential increase of information on the web, the emphasis has shifted from merely viewing information to efficiently retrieving selected information and being notified about it. Currently, users have to poll pages manually to check for changes of interest, which wastes resources and incurs high cost. Hence, an efficient and effective change detection and notification mechanism is needed. WebVigiL, a general-purpose, active capability-based information monitoring and notification system, handles the specification, management, and propagation of customized changes as requested by a user. Change detection in WebVigiL focuses on detecting customized changes to a document, based on user intent. In this paper, we propose two different algorithms for detecting changes to the contents of semi-structured and unstructured documents. Though the approach taken is general, we explain change detection in the context of HTML (unstructured) and XML (semi-structured) documents. We also provide a simple change presentation scheme to display the computed changes. We highlight change detection in the context of WebVigiL and briefly describe the rest of the system.
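The abstract does not spell out the two algorithms, so the following is only a minimal sketch of the general idea of user-driven change detection, not the method proposed in the paper. It assumes a page has already been reduced to an ordered list of text units (for example, one string per leaf node), uses a textbook longest common subsequence table to find inserted and deleted units, and then keeps only the changes that mention a user-supplied keyword. The function names `lcs_table`, `detect_changes`, and `changes_of_interest` are hypothetical.

```python
from typing import List, Tuple


def lcs_table(old: List[str], new: List[str]) -> List[List[int]]:
    """Build a table where t[i][j] is the LCS length of old[i:] and new[j:]."""
    rows, cols = len(old), len(new)
    t = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if old[i] == new[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def detect_changes(old: List[str], new: List[str]) -> List[Tuple[str, str]]:
    """Return ("delete", unit) / ("insert", unit) pairs that turn old into new."""
    t = lcs_table(old, new)
    changes: List[Tuple[str, str]] = []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            # Unit belongs to the common subsequence: unchanged.
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            changes.append(("delete", old[i]))
            i += 1
        else:
            changes.append(("insert", new[j]))
            j += 1
    # Whatever remains on either side is a pure deletion or insertion.
    changes.extend(("delete", unit) for unit in old[i:])
    changes.extend(("insert", unit) for unit in new[j:])
    return changes


def changes_of_interest(
    changes: List[Tuple[str, str]], keywords: List[str]
) -> List[Tuple[str, str]]:
    """Keep only changes whose text mentions one of the user's keywords."""
    wanted = [k.lower() for k in keywords]
    return [
        (op, unit)
        for op, unit in changes
        if any(k in unit.lower() for k in wanted)
    ]
```

A caller would pass two versions of the extracted units to `detect_changes` and then filter the result with `changes_of_interest(changes, ["laptop"])`, so that a user who cares only about laptop prices is not notified about unrelated edits.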

Keywords

Change Detection · Leaf Node · Change Operation · Longest Common Subsequence · User Intent
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • N. Pandrangi (1)
  • J. Jacob (1)
  • A. Sanka (1)
  • S. Chakravarthy (1)
  1. Information Technology Laboratory and Computer Science and Engineering Department, The University of Texas at Arlington, Arlington
