Finding Significant Web Pages with Lower Ranks by Pseudo-Clique Search
In this paper, we discuss a method for finding useful clusters of web pages that are significant in the sense that their contents are similar or closely related to those of higher-ranked pages. Because lower-ranked pages usually receive little attention, they are discarded unconditionally, even when their contents are similar to those of highly ranked pages. We try to extract such hidden pages, together with significant higher-ranked pages, as a cluster.
To obtain such clusters, we first extract semantic correlations among terms by applying Singular Value Decomposition (SVD) to the term-document matrix generated from a corpus on a specific topic. Based on these correlations, we can evaluate potential similarities among the web pages from which we try to obtain clusters. The set of web pages is represented as a weighted graph G based on the similarities and the pages' ranks. Our clusters can be found as pseudo-cliques in G. We present an algorithm for finding Top-N weighted pseudo-cliques. Our experimental results show that valuable clusters can actually be extracted by our method.
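The first two steps described above (SVD of a term-document matrix, then a similarity graph over pages) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy matrix, the rank `k`, the cosine measure, the threshold, and all function names are assumptions made for illustration, and the sketch omits both the rank-based weighting of G and the pseudo-clique search itself.

```python
import numpy as np

def lsi_document_vectors(term_doc, k):
    """Project documents (columns) into a k-dimensional latent space
    using a truncated SVD of the term-document matrix."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Row i of the result holds the latent coordinates of document i.
    return (s[:k, None] * vt[:k, :]).T

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    unit = vectors / norms
    return unit @ unit.T

def similarity_graph(sim, threshold):
    """Weighted undirected graph as {(i, j): weight}, keeping only
    pairs whose similarity reaches the (assumed) threshold."""
    n = sim.shape[0]
    return {
        (i, j): float(sim[i, j])
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i, j] >= threshold
    }

# Hypothetical toy data: rows are terms, columns are four web pages.
# Pages 0 and 1 share one vocabulary, pages 2 and 3 share another.
term_doc = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

doc_vectors = lsi_document_vectors(term_doc, k=2)
sim = cosine_similarity_matrix(doc_vectors)
edges = similarity_graph(sim, threshold=0.5)
print(sorted(edges))
```

In this toy case the graph keeps only the edges between pages 0 and 1 and between pages 2 and 3, so each pair forms a candidate cluster. A search for weighted pseudo-cliques would operate on a graph of this kind, with edge or vertex weights that also reflect page ranks.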