Discovering Communities in Linked Data by Multi-view Clustering

  • Isabel Drost
  • Steffen Bickel
  • Tobias Scheffer
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


We consider the problem of finding communities in large linked networks such as web structures or citation networks. We review similarity measures for linked objects and discuss k-Means and EM clustering algorithms based on text similarity, bibliographic coupling, and co-citation strength. We study how the principle of multi-view learning can be used to combine these similarity measures. We evaluate the clustering algorithms experimentally on web pages and on the Cite-Seer repository of research papers, and find that multi-view clustering effectively combines link-based and intrinsic similarity.
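The two link-based measures named above can be illustrated with a minimal sketch. It assumes the standard count-based definitions: bibliographic coupling is the number of references two documents share, and co-citation strength is the number of documents citing both. The function name `coupling_and_cocitation` and the toy citation matrix are hypothetical and do not come from the paper, which may normalize or weight these counts differently.

```python
import numpy as np

def coupling_and_cocitation(adjacency):
    """Return (bibliographic coupling, co-citation) count matrices.

    adjacency[i][j] == 1 means document i cites (links to) document j.
    Assumption: plain shared-link counts, with no normalization.
    """
    a = np.asarray(adjacency, dtype=int)
    coupling = a @ a.T      # entry (i, j): references shared by i and j
    cocitation = a.T @ a    # entry (i, j): documents citing both i and j
    # A document's similarity to itself is not informative here.
    np.fill_diagonal(coupling, 0)
    np.fill_diagonal(cocitation, 0)
    return coupling, cocitation

# Hypothetical toy network: documents 0 and 1 both cite 2 and 3; 2 cites 3.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
coupling, cocitation = coupling_and_cocitation(toy)
print(coupling)
print(cocitation)
```

In this toy network, documents 0 and 1 are strongly coupled because they share both of their references, while documents 2 and 3 are co-cited by the same two documents. Either matrix could serve as a link-based view alongside a text-similarity view.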


Citation Analysis · Citation Network · Cluster Quality · Bibliographic Coupling · True Class Label





Copyright information

© Springer Berlin · Heidelberg 2006

Authors and Affiliations

  • Isabel Drost (1)
  • Steffen Bickel (1)
  • Tobias Scheffer (1)
  1. Institut für Informatik, Humboldt-Universität zu Berlin, Berlin, Germany
