On Clustering Techniques for Change Diagnosis in Data Streams

  • Charu C. Aggarwal
  • Philip S. Yu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4198)


In recent years, data streams have become ubiquitous in a variety of applications because of advances in hardware technology. Since data streams may be generated by applications that are time-changing in nature, it is often desirable to explore the underlying changing trends in the data. In this paper, we explore and survey some of our recent methods for change detection. In particular, we study methods for change detection that use clustering to provide a concise understanding of the underlying trends. We discuss our recent techniques that use micro-clustering to diagnose changes in the underlying data. We also discuss the extension of this method to text and categorical data sets, as well as to community detection in graph data streams.
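The abstract only names micro-clustering as the tool for change diagnosis. The following is a minimal sketch, not the authors' exact method, assuming that each micro-cluster is kept as an additive cluster feature vector: per-dimension sums and squared sums of the points, the sum and squared sum of their time stamps, and a count. It also assumes that snapshots of these vectors are stored over time. Because every statistic is additive, subtracting an older snapshot from a newer one summarizes just the points that arrived in between, which lets a change be measured over a chosen time horizon without revisiting raw data. The class name, field names, and the centroid-shift change score are illustrative assumptions. The sketch also tracks a single micro-cluster, whereas a full system would maintain many and assign each arriving point to one of them.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class MicroCluster:
    """Additive summary of d-dimensional points and their time stamps.

    Field names are hypothetical: cf1x/cf2x hold per-dimension sums and
    squared sums of the points, cf1t/cf2t the sum and squared sum of the
    time stamps, and n the number of points absorbed.
    """
    dim: int
    cf1x: List[float] = field(default_factory=list)
    cf2x: List[float] = field(default_factory=list)
    cf1t: float = 0.0
    cf2t: float = 0.0
    n: int = 0

    def __post_init__(self) -> None:
        # Start from all-zero per-dimension sums when none are supplied.
        if not self.cf1x:
            self.cf1x = [0.0] * self.dim
        if not self.cf2x:
            self.cf2x = [0.0] * self.dim

    def insert(self, point: List[float], t: float) -> None:
        """Absorb one point; every statistic is updated by plain addition."""
        for j, v in enumerate(point):
            self.cf1x[j] += v
            self.cf2x[j] += v * v
        self.cf1t += t
        self.cf2t += t * t
        self.n += 1

    def snapshot(self) -> "MicroCluster":
        """Independent copy of the current statistics (a stored snapshot)."""
        return MicroCluster(self.dim, list(self.cf1x), list(self.cf2x),
                            self.cf1t, self.cf2t, self.n)

    def subtract(self, older: "MicroCluster") -> "MicroCluster":
        """Newer snapshot minus an older one of the same micro-cluster:
        summarizes only the points that arrived between the two snapshots."""
        return MicroCluster(
            self.dim,
            [a - b for a, b in zip(self.cf1x, older.cf1x)],
            [a - b for a, b in zip(self.cf2x, older.cf2x)],
            self.cf1t - older.cf1t,
            self.cf2t - older.cf2t,
            self.n - older.n,
        )

    def centroid(self) -> List[float]:
        if self.n == 0:
            raise ValueError("empty micro-cluster has no centroid")
        return [s / self.n for s in self.cf1x]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        c = self.centroid()
        var = sum(s2 / self.n - m * m for s2, m in zip(self.cf2x, c))
        return math.sqrt(max(var, 0.0))


def centroid_shift(old: MicroCluster, mid: MicroCluster,
                   now: MicroCluster) -> float:
    """Hypothetical change score: distance between the centroid of points that
    arrived in (t_old, t_mid] and that of points that arrived in (t_mid, t_now]."""
    earlier = mid.subtract(old)
    recent = now.subtract(mid)
    return math.dist(earlier.centroid(), recent.centroid())


# Synthetic stream: points sit at (0, 0) for t = 1..5, then jump to (5, 5).
mc = MicroCluster(dim=2)
snaps = {0: mc.snapshot()}
for t in range(1, 11):
    x = 0.0 if t <= 5 else 5.0
    mc.insert([x, x], float(t))
    if t in (5, 10):
        snaps[t] = mc.snapshot()

score = centroid_shift(snaps[0], snaps[5], snaps[10])
print(f"centroid shift over the two horizons: {score:.4f}")
```

Storing snapshots of summaries rather than raw points is the design choice the sketch illustrates: memory grows with the number of micro-clusters and stored snapshots, not with the length of the stream.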







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Charu C. Aggarwal (1)
  • Philip S. Yu (1)
  1. IBM T. J. Watson Research Center, Hawthorne, USA
