SLASH-BASED RELEVANCE PROPAGATION MODEL FOR TOPIC DISTILLATION

Authors

  • MOHAMMAD AMIN GOLSHANI, Department of Electrical and Computer Engineering, Yazd University, Yazd, Iran
  • ALI MOHAMMAD ZAREHBIDOKI, Department of Electrical and Computer Engineering, Yazd University, Yazd, Iran
  • VALI DERHAMI, Department of Electrical and Computer Engineering, Yazd University, Yazd, Iran

Keywords:

Web information retrieval, ranking, search engine, propagation methods, number of slashes in the URL

Abstract

An efficient and effective ranking mechanism for search engines remains a challenging problem. In recent years, a few relevance propagation models have been proposed, such as hyperlink-based score propagation, hyperlink-based term propagation, and popularity-based propagation. In this paper, we give a comprehensive study of relevance propagation techniques for Web information retrieval and conduct both theoretical and experimental evaluations of these models to determine which is more effective and efficient. We also propose a new relevance propagation model based on content, link structure (the web graph), and the number of slashes in the URL. It propagates content and the number of slashes as scores through the link structure. The goal is to find web pages that are more relevant to the user query. To compare the relevance propagation models, LETOR 3.0, a standard web test collection, was used in the experiments. We conclude that using the number of slashes in the propagation process improves the accuracy of Web information retrieval.
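
The idea of propagating content and slash counts through the link structure can be illustrated with a minimal sketch. It is not the model evaluated in the paper: the abstract does not give the formula, so the scoring rule, the direction of propagation (a page receives the mean score of the pages it links to), the assumption that URLs with fewer slashes deserve a larger prior, and the weights `alpha` and `beta` are all hypothetical choices made for illustration only.

```python
from urllib.parse import urlparse


def slash_count(url: str) -> int:
    """Count the slashes in the path part of a URL (scheme and host excluded)."""
    return urlparse(url).path.count("/")


def propagate(content, links, alpha=0.8, beta=0.5):
    """Run one propagation step over the link graph.

    content: {url: content relevance score for the query}
    links:   {source url: [target urls]}
    alpha, beta: hypothetical mixing weights, not taken from the paper.
    """
    # Assumed prior: a URL with fewer slashes (a shallower page) scores higher.
    base = {
        url: beta * score + (1 - beta) / (1 + slash_count(url))
        for url, score in content.items()
    }
    result = {}
    for url, own in base.items():
        # Only targets that were scored for this query take part.
        targets = [t for t in links.get(url, []) if t in base]
        # Assumed rule: a page receives the mean base score of its targets.
        received = sum(base[t] for t in targets) / len(targets) if targets else 0.0
        result[url] = alpha * own + (1 - alpha) * received
    return result
```

Under these assumptions, a site entry page with a moderate content score can outrank a deeper page with a higher content score, which matches the intent of topic distillation, where entry pages of relevant sites are the preferred answers.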

 

Published

2013-01-28

How to Cite

GOLSHANI, M. A., ZAREHBIDOKI, A. M., & DERHAMI, V. (2013). SLASH-BASED RELEVANCE PROPAGATION MODEL FOR TOPIC DISTILLATION. Journal of Web Engineering, 12(3-4), 265–290. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/4161

Section

Articles