ON THE IMAGE CONTENT OF A WEB SEGMENT: CHILE AS A CASE STUDY

Authors

  • A. JAIMES Center for Web Research, Department of Computer Science, Universidad de Chile, CHILE
  • J. RUIZ-DEL-SOLAR Department of Electrical Engineering, Universidad de Chile, CHILE
  • R. VERSCHAE Center for Web Research, Department of Computer Science, Universidad de Chile, CHILE and Department of Electrical Engineering, Universidad de Chile, CHILE
  • D. YAKSIC Center for Web Research, Department of Computer Science, Universidad de Chile, CHILE and Department of Electrical Engineering, Universidad de Chile, CHILE
  • R. BAEZA-YATES Center for Web Research, Department of Computer Science, Universidad de Chile, CHILE
  • C. CASTILLO Center for Web Research, Department of Computer Science, Universidad de Chile, CHILE
  • E. DAVIS Center for Web Research, Department of Computer Science, Universidad de Chile, CHILE

Keywords:

Web characterization, Web image analysis

Abstract

We propose a methodology to characterize the image contents of a web segment, and we present an analysis of the contents of a segment of the Chilean web (.cl domain). Our framework uses an efficient web-crawling architecture, standard content-based analysis tools (to extract low-level features such as color, shape, and texture), and novel skin and face detection algorithms. The automated process starts by examining all websites within a domain (e.g., .cl websites), obtaining links to images, and downloading a large number of those images (approximately 383,000 images across all of our experiments, corresponding to about 35 billion pixels). Once the images are on a local server, the process automatically extracts several low-level visual features (color, texture, shape, etc.) and performs skin and face detection using the novel algorithms. The results of visual feature extraction and of skin and face detection are then used to characterize the contents of the web segment. We tested our methodology on a segment of the Chilean web (.cl) by automatically downloading and processing 183,000 images in 2003 and 200,000 images in 2004. We present statistics derived from both sets of images, which should be of use to anyone concerned with the image content of the web in Chile. Our study is the first to use content-based tools to determine the image contents of a given web segment.
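The link-gathering step described in the abstract (examining pages within a domain and obtaining links to images) can be illustrated with a minimal Python sketch. It is not the crawler used in the study: the names `ImageLinkParser`, `in_domain`, and `extract_image_links` are hypothetical, the sketch uses only the Python standard library, and it assumes the page HTML has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag on a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the URL of the page.
            self.image_urls.append(urljoin(self.base_url, src))


def in_domain(url, suffix=".cl"):
    """Return True if the URL's host belongs to the given top-level domain."""
    host = urlparse(url).hostname or ""
    return host.endswith(suffix)


def extract_image_links(page_url, html):
    """Return absolute image URLs found in html, or [] if the page is outside the segment."""
    if not in_domain(page_url):
        return []
    parser = ImageLinkParser(page_url)
    parser.feed(html)
    return parser.image_urls
```

In this sketch only the page itself is required to be in the segment; an image hosted elsewhere but embedded in a .cl page is still collected. Whether the study applied the same rule is not stated in the abstract, so this is an assumption.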

 



Published

2004-10-22

How to Cite

Jaimes, A., Ruiz-del-Solar, J., Verschae, R., Yaksic, D., Baeza-Yates, R., Castillo, C., & Davis, E. (2004). On the Image Content of a Web Segment: Chile as a Case Study. Journal of Web Engineering, 3(1-2), 153–168. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/4345
