AUTOMATIC MAINTENANCE OF WEB DIRECTORIES BY MINING WEB BROWSING DATA

CARLOS HURTADO; MARCELO MENDOZA

Authors

CARLOS HURTADO Faculty of Engineering and Science, Universidad Adolfo Ibáñez, Diagonal Las Torres 2640, Santiago, Chile
MARCELO MENDOZA Computer Science Department, Universidad Técnica Federico Santa María, Vicuña Mackenna 3939, Santiago, Chile

Keywords:

Web directories, Web Mining, Query Logs

Abstract

Web directories allow Web users to browse a hierarchy of categories under which different types of resources are classified. We study the problem of maintaining a Web directory, that is, the problem of continually discovering and ranking resources that are relevant to the categories of the directory. We propose an unsupervised computational method that maintains the directory by analyzing user browsing data. The method is based on the extraction and classification of user sessions (sequences of resources selected by users) into the categories of the directory. In addition, we show that the directory maintenance method can be slightly modified to find queries that help locate relevant resources, allowing users to switch from directory browsing to query formulation. Experimental results show that the proposed methods are effective: they identify new pages in each category and recommend related queries with high precision, without requiring the labeled data that traditional Web page and query classification tasks depend on.


