ENHANCING KEYWORD SUGGESTION OF WEB SEARCH BY LEVERAGING MICROBLOG DATA

Authors

  • LIN LI Hubei Collaborative & Innovative Center for Basic Educational Technology and School of Computer Science & Technology, Wuhan University of Technology Wuhan, 430070, China
  • LU QI School of Computer Science & Technology, Wuhan University of Technology and Hubei Key Laboratory of Transportation Internet of Things, Wuhan University of Technology Wuhan, 430070, China
  • FANG DENG College of Computer, Hubei University of Education Wuhan, 430205, China
  • SHENGWU XIONG School of Computer Science & Technology, Wuhan University of Technology Wuhan, 430070, China
  • JINGLING YUAN School of Computer Science & Technology, Wuhan University of Technology Wuhan, 430070, China

Keywords:

Search query, microblog posts, suggestion, pseudo relevance

Abstract

Query suggestion in Web search is an effective way to help users express their information needs quickly and retrieve the information they need accurately. Most popular Web search engines generate query suggestions from their query log data, which is an implicit relevance based approach. However, it is difficult to give suggestions for search queries that have little or no historical evidence in query logs. To solve this problem, traditional pseudo relevance based approaches extract additional keywords directly from the top-listed search results of a given search query and offer them as suggestions. However, for search queries related to hot topics or events, users prefer to browse the latest, newly appeared content. In this paper, we follow the direction of pseudo relevance based suggestion approaches by mining microblog data, which is characterized by fast information propagation and dissemination. Our graph based rank aggregation approach combines a frequency based ranking, which considers the words themselves, with an LDA (Latent Dirichlet Allocation) based ranking, which mines the hidden topics behind the words. A dataset is crawled from the posts of fourteen micro-topics on the Sina microblog platform. The experimental results demonstrate that our proposed approach is more effective than traditional pseudo relevance based methods. Moreover, the suggested keywords extracted from posts published by authenticated users are more effective than those obtained by two traditional pseudo relevance based approaches, i.e., using the posts submitted by all users and using the top posts returned by the Sina search engine. In addition, applying LDA to microblog posts alone is far from satisfactory, but the combination of the frequency based ranking and the LDA based ranking shows much better performance.
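As a rough illustration of combining two keyword rankings, the sketch below merges a frequency based ranking with a second, topic based ranking. It is not the graph based rank aggregation method of the paper: a plain Borda count is swapped in as a simpler stand-in, the function names are hypothetical, the posts are made-up toy data, and the topic based ranking is hard-coded where an LDA based ranker would normally supply it.

```python
from collections import Counter


def frequency_ranking(posts, query_terms):
    """Rank candidate words by how often they occur across tokenized posts."""
    counts = Counter(
        word for post in posts for word in post if word not in query_terms
    )
    # Sort by descending count, then alphabetically so ties are deterministic.
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def borda_aggregate(rankings):
    """Merge ranked lists with a Borda count (assumed stand-in, not the paper's method)."""
    candidates = set().union(*rankings)
    n = len(candidates)
    scores = dict.fromkeys(candidates, 0)
    for ranking in rankings:
        for position, word in enumerate(ranking):
            # The top word earns n points, the next n - 1, and so on.
            # A word absent from a ranking earns nothing from it.
            scores[word] += n - position
    return sorted(candidates, key=lambda w: (-scores[w], w))


# Toy tokenized posts about one query; purely illustrative data.
posts = [
    ["worldcup", "final", "goal"],
    ["worldcup", "goal", "referee"],
    ["worldcup", "final", "ticket"],
]
freq_rank = frequency_ranking(posts, query_terms={"worldcup"})

# Hypothetical output of a topic based (e.g. LDA based) ranker.
topic_rank = ["goal", "ticket", "final", "referee"]

suggestions = borda_aggregate([freq_rank, topic_rank])
print(suggestions)
```

A word that ranks reasonably well in both lists can overtake a word that tops only one of them, which is the intuition behind aggregating the two rankings rather than relying on either alone.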

 

Published

2016-02-29

Section

Articles