
Challenges in Crawling the Web

  • Hector Garcia-Molina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2712)

Abstract

The World Wide Web, or simply the Web, is rapidly becoming the world’s collective information store, containing everything from news, to entertainment, to personal communications, to product descriptions. This world information store is distributed across millions of computers, but it is often important to gather significant parts of it at a single site. One reason is to build content indices, such as the one behind Google's search engine. Another reason is to mine the cached Web, looking for trends or data correlations. A third reason for gathering a Web copy is to create a historical record for Web sites that are ephemeral or changing.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Hector Garcia-Molina, Computer Science Department, Stanford University, Stanford
