Atrax, a distributed web crawler

Video Link: https://www.youtube.com/watch?v=i5qLt0ShJSg



Duration: 1:05:39

This talk describes Atrax, a distributed and very fast web crawler. Running Atrax on a cluster of four DS20E Alpha servers saturates our internet connection. During a recent crawl, we sustained a download rate of about 115 Mbits/sec, or about 50 million web pages per day. Atrax has been used to collect the raw data for numerous web studies performed at Compaq Research.
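
As a rough sanity check on the two throughput figures above, the sketch below works out what they imply together: the average number of bytes downloaded per page and the number of pages fetched per second. It assumes decimal units (1 Mbit = 10^6 bits, 1 KB = 1000 bytes) and that all of the bandwidth is spent on page downloads; neither assumption is stated in the abstract, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.
# Assumptions (not from the source): decimal megabits, and all bandwidth
# is spent on page content.

SECONDS_PER_DAY = 24 * 60 * 60

MBITS_PER_SEC = 115           # sustained download rate quoted in the abstract
PAGES_PER_DAY = 50_000_000    # pages per day quoted in the abstract


def bytes_per_day(mbits_per_sec: float) -> float:
    """Total bytes transferred in one day at the given rate."""
    bytes_per_sec = mbits_per_sec * 1_000_000 / 8
    return bytes_per_sec * SECONDS_PER_DAY


def implied_page_size_kb(mbits_per_sec: float, pages_per_day: float) -> float:
    """Average kilobytes per page implied by the two figures."""
    return bytes_per_day(mbits_per_sec) / pages_per_day / 1000


def pages_per_second(pages_per_day: float) -> float:
    """Average cluster-wide fetch rate in pages per second."""
    return pages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    size_kb = implied_page_size_kb(MBITS_PER_SEC, PAGES_PER_DAY)
    rate = pages_per_second(PAGES_PER_DAY)
    print(f"implied average page size: {size_kb:.2f} KB")
    print(f"cluster-wide fetch rate:   {rate:.0f} pages/sec")
```

Under these assumptions the figures work out to roughly 25 KB per page and just under 600 pages per second across the cluster, so the two quoted numbers are consistent with each other.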

Tags:
microsoft research