On The Evolution of Clusters of Near-Duplicate Web Pages
Keywords:
Web characterization, web evolution, clusters, mirrors, mirror detection
Abstract
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis over the span of 11 weeks. We then determined which of these pages were near-duplicates of one another and tracked how clusters of near-duplicate documents evolved over time.
Published
2004-06-11
How to Cite
Fetterly, D., Manasse, M., & Najork, M. (2004). On The Evolution of Clusters of Near-Duplicate Web Pages. Journal of Web Engineering, 2(4), 228–246. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/4355
Section
Articles