Web Crawling

Olston, Christopher; Najork, Marc

doi:10.1561/1500000017

Research Article | February 12, 2010

Christopher Olston

Yahoo! Research, 701 First Avenue, Sunnyvale, CA 94089, USA

Marc Najork

Microsoft Research, 1065 La Avenida, Mountain View, CA 94043, USA

Author & Article Information

Online ISSN: 1554-0677

Print ISSN: 1554-0669

© 2010 C. Olston and M. Najork

Licensed re-use rights only

Foundations and Trends in Information Retrieval (2010) 4 (3): 175–246.

https://doi.org/10.1561/1500000017

This is a survey of the science and practice of web crawling. While at first glance web crawling may appear to be merely an application of breadth-first search, it in fact poses many challenges, ranging from systems concerns, such as managing very large data structures, to theoretical questions, such as how often to revisit evolving content sources. This survey outlines the fundamental challenges and describes the state-of-the-art models and solutions. It also highlights avenues for future work.
