ENHANCING KEYWORD SUGGESTION OF WEB SEARCH BY LEVERAGING MICROBLOG DATA
Keywords:
Search query, microblog posts, suggestion, pseudo relevance
Abstract
Query suggestion in Web search is an effective way to help users quickly express their information needs and accurately find the information they seek. Most popular Web search engines generate query suggestions from their query logs, which is an implicit relevance based approach. However, it is difficult to offer suggestions for search queries that have little or no historical evidence in the query logs. To address this problem, traditional pseudo relevance based approaches directly extract additional keywords from the top-ranked search results of a given query and offer them as suggestions. For queries related to hot topics or events, however, users prefer to browse the latest, newly published content. In this paper, we follow the direction of pseudo relevance based suggestion approaches but mine microblog data, which is characterized by fast information propagation and dissemination. Our graph based rank aggregation approach combines a frequency based ranking, which considers the words themselves, with an LDA (Latent Dirichlet Allocation) based ranking, which mines the hidden topics behind the words. The dataset was crawled from the posts of fourteen micro-topics on the Sina microblog platform. The experimental results demonstrate that our proposed approach is more effective than traditional pseudo relevance based methods. Moreover, keywords extracted from posts published by authenticated users are more effective than keywords drawn from two traditional pseudo relevance based sources, namely the posts submitted by all users and the top posts returned by the Sina search engine. Finally, applying LDA to microblog posts alone is far from satisfactory, whereas the combination of the frequency based ranking and the LDA based ranking performs much better.
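The abstract describes a three-part pipeline (a frequency based ranking of candidate words, an LDA based ranking, and a graph based aggregation of the two) without giving exact formulas. The Python sketch below illustrates one way such a pipeline could fit together, under stated assumptions: posts are whitespace-tokenized (real Sina posts would need a Chinese word segmenter), the LDA step is represented only by an already fitted model's topic weights p(k) and topic-word probabilities p(w | k), and the graph based aggregation is swapped for the MC4 Markov-chain heuristic described by Dwork et al. (WWW 2001), with a hypothetical damping factor of 0.85. All function names and the toy data are hypothetical; this is not the authors' implementation.

```python
from collections import Counter


def frequency_ranking(posts, query_terms, stopwords=frozenset()):
    """Rank candidate keywords by raw frequency in the pseudo-relevant posts."""
    counts = Counter(
        tok
        for post in posts
        # Assumption: whitespace tokens; Chinese posts would need a word segmenter.
        for tok in post.lower().split()
        if tok not in query_terms and tok not in stopwords
    )
    # Descending count, ties broken alphabetically so the output is deterministic.
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def lda_ranking(topic_weights, topic_word_probs):
    """Score each word as sum_k p(k) * p(w | k), taken from an already fitted LDA model."""
    scores = Counter()
    for k, p_k in topic_weights.items():
        for w, p_wk in topic_word_probs[k].items():
            scores[w] += p_k * p_wk
    return [w for w, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def prefers(pos, b, a):
    """True if one ranking (word -> position) puts b above a; unranked words sit below ranked ones."""
    return b in pos and (a not in pos or pos[b] < pos[a])


def markov_chain_aggregate(rankings, damping=0.85, iters=100):
    """MC4-style aggregation (hypothetical stand-in for the paper's graph method):
    from word a, pick another word b uniformly and move there only if a strict
    majority of the input rankings prefer b to a; rank words by stationary mass."""
    items = sorted({w for r in rankings for w in r})
    n = len(items)
    if n <= 1:
        return items
    positions = [{w: i for i, w in enumerate(r)} for r in rankings]
    # Row-stochastic transition matrix over the word graph.
    P = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            if i != j and 2 * sum(prefers(pos, b, a) for pos in positions) > len(rankings):
                P[i][j] = 1.0 / (n - 1)
        P[i][i] = 1.0 - sum(P[i])  # stay put when no majority-preferred b was drawn
    # Power iteration with teleportation (probability 1 - damping) for ergodicity.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [(1.0 - damping) / n + damping * sum(pi[i] * P[i][j] for i in range(n))
              for j in range(n)]
    return [items[j] for j in sorted(range(n), key=lambda j: (-pi[j], items[j]))]


# Toy usage with made-up posts and made-up topic parameters.
posts = ["flood rescue team", "flood warning river",
         "rescue team arrives", "river flood level"]
freq = frequency_ranking(posts, query_terms={"flood"})
lda = lda_ranking({0: 0.7, 1: 0.3},
                  {0: {"rescue": 0.5, "team": 0.3, "river": 0.2},
                   1: {"warning": 0.6, "river": 0.4}})
print(freq)  # ['rescue', 'river', 'team', 'arrives', 'level', 'warning']
print(markov_chain_aggregate([freq, lda])[:3])  # ['rescue', 'river', 'team']
```

Because both toy rankings agree that rescue > river > team and place all three above the remaining words, every other word has a random-walk edge pointing toward those three, so they collect most of the stationary probability. Words on which the two views do not reach a strict majority (for example arrives vs. warning) get no edge between them. Requiring a strict majority is what lets agreement between the frequency view and the topic view dominate the final suggestion list.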