Atrax, a distributed web crawler
Video Link: https://www.youtube.com/watch?v=i5qLt0ShJSg
This talk describes Atrax, a distributed and very fast web crawler. Running Atrax on a cluster of four DS20E Alpha servers saturates our internet connection. During a recent crawl, we sustained a download rate of about 115 Mbits/sec, or about 50 million web pages per day. Atrax has been used to collect the raw data for numerous web studies performed at Compaq Research.
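The two throughput figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not anything taken from the talk: it assumes 1 Mbit = 10^6 bits, a full 24-hour day at the quoted rate, and that all of the bandwidth goes to page downloads. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumptions (not from the talk): 1 Mbit = 10**6 bits, the rate holds
# for a full 24-hour day, and all bandwidth is spent on page downloads.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def implied_bytes_per_page(bits_per_second, pages_per_day):
    """Average bytes transferred per page implied by the two figures."""
    bytes_per_day = bits_per_second / 8 * SECONDS_PER_DAY
    return bytes_per_day / pages_per_day

def pages_per_second(pages_per_day):
    """Average page download rate over a full day."""
    return pages_per_day / SECONDS_PER_DAY

if __name__ == "__main__":
    rate_bps = 115e6   # "about 115 Mbits/sec"
    pages = 50e6       # "about 50 million web pages per day"
    print(f"{implied_bytes_per_page(rate_bps, pages):,.0f} bytes per page")
    print(f"{pages_per_second(pages):,.0f} pages per second")
```

Under those assumptions the figures are consistent with an average of roughly 25 KB transferred per page and a little under 600 pages per second across the cluster.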
Tags:
microsoft research