Estimating the Size and Content of the World Wide Web

D. Rhodenizer and A. Trudel (Canada)

Keywords

Applications and case studies.

Abstract

How big is the World Wide Web? Is Apache the most widely used hosting software? How predominant is English on the Web? Does pornography represent a sizeable portion of the Web's content? To answer these questions, we use a population sampling technique from biology called "quadrat counts." We decompose the IP address space into quadrats, equal-sized groups of 100 random addresses each. We then sample a large number of quadrats by checking each of their addresses for the presence of a web server. The results of the quadrat sampling are used to estimate the overall size and content of the Web.
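The extrapolation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which estimator was used, so the code assumes the standard quadrat-count estimator (mean hits per quadrat, scaled up to the whole population) over the 2^32 IPv4 address space. The names `random_quadrat` and `estimate_servers` are hypothetical, and the probing step that would produce the per-quadrat hit counts is left out.

```python
import math
import random

IPV4_SPACE = 2 ** 32  # assumption: all IPv4 addresses, reserved ranges included
QUADRAT_SIZE = 100    # addresses per quadrat, as stated in the abstract


def random_quadrat(rng, size=QUADRAT_SIZE, space=IPV4_SPACE):
    """Draw one quadrat: `size` random addresses, encoded as integers.

    Duplicates are possible but extremely unlikely in a space this large.
    """
    return [rng.randrange(space) for _ in range(size)]


def estimate_servers(hit_counts, quadrat_size=QUADRAT_SIZE, space=IPV4_SPACE):
    """Scale the mean number of hits per quadrat up to the whole space.

    `hit_counts` holds, for each sampled quadrat, how many of its addresses
    answered as web servers. Needs at least two quadrats.
    Returns (estimated_total, standard_error).
    """
    n = len(hit_counts)
    mean = sum(hit_counts) / n
    # Sample variance of the per-quadrat counts (n - 1 denominator).
    variance = sum((c - mean) ** 2 for c in hit_counts) / (n - 1)
    scale = space / quadrat_size  # number of quadrat-sized groups in the space
    return mean * scale, math.sqrt(variance / n) * scale
```

Given hit counts from real probes, `estimate_servers(counts)` would return an estimated number of web servers together with a standard error that shrinks as more quadrats are sampled. The same scaling applies to any per-quadrat tally, such as the number of servers running a given software or serving pages in a given language.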
