Challenges in Crawling the Web
The World Wide Web, or simply the Web, is rapidly becoming the world’s collective information store, containing everything from news, to entertainment, to personal communications, to product descriptions. This information store is distributed across millions of computers, but it is often important to gather significant parts of it at a single site. One reason is to build content indices, such as the one behind Google. Another is to mine the cached Web for trends or data correlations. A third is to create a historical record of Web sites that are ephemeral or changing.