SLASH-BASED RELEVANCE PROPAGATION MODEL FOR TOPIC DISTILLATION
Keywords:
Web information retrieval, ranking, search engine, propagation methods, number of slashes in the URL
Abstract
Designing an efficient and effective ranking mechanism for search engines remains a challenging problem. In recent years, several relevance propagation models have been proposed, such as hyperlink-based score propagation, hyperlink-based term propagation, and popularity-based propagation. In this paper, we present a comprehensive study of relevance propagation techniques for Web information retrieval. We evaluate these models both theoretically and experimentally to determine which is the most effective and efficient. We also propose a new relevance propagation model based on page content, link structure (the Web graph), and the number of slashes in the URL. The model propagates content-based relevance and the number of slashes as scores through the link structure, with the goal of finding web pages that are more relevant to the user's query. To compare the relevance propagation models, we used LETOR 3.0, a standard Web test collection, in our experiments. We conclude that using the number of slashes in the propagation process improves the accuracy of Web information retrieval.
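The abstract describes the model only at a high level. The following is a minimal Python sketch of one way a URL slash count could enter a hyperlink-based score propagation. It is not the authors' formulation. The following are all hypothetical assumptions:

- the function names `slash_count` and `propagate`;
- the slash prior `1 / (1 + slashes)`, which assumes shallower URLs are better entry points for topic distillation;
- the mixing weight `beta` and the damping weight `alpha`;
- the choice to count slashes only in the URL path.

In each iteration, a page's score blends its own initial score with the mean score of the pages it links to.

```python
from urllib.parse import urlparse


def slash_count(url: str) -> int:
    """Count '/' in the URL path, ignoring scheme, host and a trailing slash.

    Counting only the path is an assumption of this sketch.
    """
    return urlparse(url).path.rstrip("/").count("/")


def propagate(pages, links, content, alpha=0.5, beta=0.7, iterations=20):
    """Hypothetical slash-aware hyperlink score propagation.

    pages   -- list of URLs in the (sub)graph
    links   -- dict mapping a URL to the list of URLs it links to
    content -- dict mapping a URL to a content relevance score (e.g. BM25)
    alpha   -- weight kept on a page's own initial score each iteration
    beta    -- weight of content relevance versus the slash-based prior
    """
    # Initial score: mix content relevance with a prior that favours shallow
    # URLs (fewer slashes). The 1 / (1 + slashes) prior is an assumption.
    h0 = {
        p: beta * content.get(p, 0.0) + (1 - beta) / (1 + slash_count(p))
        for p in pages
    }
    h = dict(h0)
    for _ in range(iterations):
        new_h = {}
        for p in pages:
            children = [q for q in links.get(p, []) if q in h]
            if not children:
                # No outlinks inside the graph: keep the initial score.
                new_h[p] = h0[p]
                continue
            # Blend the page's own initial score with the mean score of
            # the pages it links to.
            mean_child = sum(h[q] for q in children) / len(children)
            new_h[p] = alpha * h0[p] + (1 - alpha) * mean_child
        h = new_h
    return h


if __name__ == "__main__":
    root = "http://example.com/"
    b = "http://example.com/a/b.html"
    c = "http://example.com/a/c.html"
    scores = propagate(
        pages=[root, b, c],
        links={root: [b, c]},
        content={root: 0.2, b: 0.9, c: 0.8},
    )
    # Print pages from highest to lowest propagated score.
    for url, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{score:.4f}  {url}")
```

Keeping a fraction `alpha > 0` of each page's own initial score makes every update a contraction, so the iteration settles to a fixed point. Here the root page's weak content score is lifted by the relevant pages it links to. Both the slash prior and the treatment of pages without outlinks are illustrative design choices.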