Data Sparsity Issues in the Collaborative Filtering Framework

  • Miha Grčar
  • Dunja Mladenič
  • Blaž Fortuna
  • Marko Grobelnik
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4198)


With the amount of information available on the Web growing rapidly every day, the need has emerged to filter that information automatically so that users can work more efficiently. Within the fields of user profiling and Web personalization, several popular content filtering techniques have been developed. In this chapter we present one such technique: collaborative filtering. Apart from giving an overview of collaborative filtering approaches, we present the experimental results of comparing the k-Nearest Neighbor (kNN) algorithm with the Support Vector Machine (SVM) in the collaborative filtering framework, using datasets with different properties. While kNN is the algorithm usually used for collaborative filtering tasks, SVM is considered a state-of-the-art classification algorithm. Since collaborative filtering can also be interpreted as a classification or regression task, virtually any supervised learning algorithm (such as SVM) can be applied. Experiments were performed on two standard, publicly available datasets and on a real-life corporate dataset that does not fit the profile of ideal data for collaborative filtering. We conclude that the quality of collaborative filtering recommendations depends heavily on the sparsity of the available data. Furthermore, we show that kNN is dominant on datasets with relatively low sparsity, while SVM-based approaches may perform better on highly sparse data.
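To make the notion of sparsity and the kNN approach concrete, the following is a minimal sketch and not the authors' implementation. The toy rating matrix, the function names, the choice of cosine similarity over co-rated items, and the value of k are all assumptions made for illustration; the abstract does not specify them.

```python
from math import sqrt

# Toy user-item rating matrix (hypothetical data); None marks a missing rating.
RATINGS = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [None, 1, 5, 4],
]


def sparsity(matrix):
    """Fraction of user-item cells that hold no rating."""
    cells = [value for row in matrix for value in row]
    return sum(value is None for value in cells) / len(cells)


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not common:
        return 0.0
    dot = sum(a * b for a, b in common)
    norm_u = sqrt(sum(a * a for a, _ in common))
    norm_v = sqrt(sum(b * b for _, b in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(matrix, user, item, k=2):
    """Similarity-weighted mean of the k most similar users' ratings for item."""
    neighbours = [
        (cosine(matrix[user], row), row[item])
        for idx, row in enumerate(matrix)
        if idx != user and row[item] is not None
    ]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    if not top:
        return None  # no usable neighbour, which becomes common as sparsity grows
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


if __name__ == "__main__":
    print("sparsity:", sparsity(RATINGS))
    print("prediction for user 1, item 1:", knn_predict(RATINGS, 1, 1))
```

As the matrix gets sparser, fewer users share rated items, so the similarity estimates rest on less evidence and `knn_predict` more often finds no usable neighbour. This is one plausible reading of why neighbour-based methods degrade on highly sparse data.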





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Miha Grčar
    • 1
  • Dunja Mladenič
    • 1
  • Blaž Fortuna
    • 1
  • Marko Grobelnik
    • 1
  1. Jožef Stefan Institute, Ljubljana, Slovenia
