TYPE-AHEAD EXPLORATORY SEARCH THROUGH TYPO AND WORD ORDER TOLERANT AUTOCOMPLETION

Authors

  • PAVLOS FAFALIOS Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), and Computer Science Department, University of Crete, GREECE
  • YANNIS TZITZIKAS Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), and Computer Science Department, University of Crete, GREECE

Keywords:

type-ahead search, instant search, exploratory search, autocompletion, query suggestions, caching

Abstract

There is an increasing interest on recommending to the user instantly (during typing characters) queries and query results. This is evidenced by the emergence of several systems that offer such functionalities, e.g. Google Instant Search for Web searching or Facebook Search for social searching. In this paper we consider showing more rich rec- ommendations that show several other kinds of supplementary information that provide the user with a better overview of the search space. This supplementary information can be the result of various tasks (e.g. textual clustering or entity mining of the top search results), may have very large size and may cost a lot to be derived. The instant presen- tation of these recommendations (as the user types a query letter-by-letter) helps the user (a) to quickly discover what is popular among other users, (b) to decide fast which (of the suggested) query completions to use, and (c) to decide what hits of the returned answer to inspect. In this paper we focus on making this feasible (scalable) and flexible. Regarding scalability we elaborate on an approach based on precomputed information and we comparatively evaluate various trie-based index structures for making real-time interaction feasible, even if the size of the available memory space is limited. Specifically, we show how with modest hardware (like this of a mobile device) one can provide instant access to large amounts of data. Moreover, we propose and experimentally evaluate an incremental procedure for updating the index. For improving the throughput that can be served we analyze and experimentally evaluate various policies for caching subtries. With regard to flexibility, in order to reduce user’s effort and to increase the exploitation of the precomputed information, we elaborate on how the recommendations can tolerate different word orders and spelling errors, assuming the proposed trie-based index struc- tures. The experimental results revealed that such functionality significantly increases the number of recommendations especially for queries that contain several words. Fi- nally, we propose an algorithm for computing the top-K suggestions that exploits the ranking information in order to reduce the trie traversals. An experimental evaluation proves that the proposed algorithm highly improves the retrieval time.

 

Downloads

Download data is not yet available.

References

F. Silvestri, “Mining query logs: Turning search usage data into knowledge,” Foundations and

Trends in Information Retrieval, vol. 4, no. 12, pp. 1–174, 2010.

P. Fafalios and Y. Tzitzikas, “Exploiting available memory and disk for scalable instant overview

search,” in Web Information System Engineering–WISE 2011, pp. 101–115, Springer, 2011.

S. Kopidaki, P. Papadakos, and Y. Tzitzikas, “STC+ and NM-STC: Two novel online results clustering

methods for web searching,” in WISE ’09: Proceedings of the 10th International Conference

on Web Information Systems Engineering, October 2009.

P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis, and Y. Tzitzikas, “Web

searching with entity mining at query time,” in Multidisciplinary Information Retrieval, pp. 73–

, Springer, 2012.

P. Fafalios, I. Kitsos, and Y. Tzitzikas, “Scalable, flexible and generic instant overview search,” in

Proceedings of the 21st International Conference on World Wide Web, pp. 333–336, ACM, 2012.

M. K¨aki and A. Aula, “Findex: improving search result use through automatic filtering categories,”

Interacting with Computers, vol. 17, no. 2, pp. 187–206, 2005.

M. K¨aki, “Findex: search result categories help users when document ranking fails,” in Proceedings

of the SIGCHI conference on Human factors in computing systems, pp. 131–140, ACM, 2005.

B. Kules, R. Capra, M. Banta, and T. Sierra, “What do exploratory searchers look at in a faceted

search interface?,” in Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries,

pp. 313–322, ACM, 2009.

M. Wilson et al., “A longitudinal study of exploratory and keyword search,” in Proceedings of the

th ACM/IEEE-CS joint conference on Digital libraries, pp. 52–56, ACM, 2008.

S. Teddy, F. Yap, C. Quek, and E. Lai, “A neurocognitive approach to decision making for the

reconstruction of the metabolic insulin profile of a healthy person,” Handbook on Decision Making,

pp. 497–532, 2010.

A. Spink, B. Jansen, D. Wolfram, and T. Saracevic, “From e-sex to e-commerce: Web search

changes,” Computer, vol. 35, no. 3, pp. 107–109, 2002.

L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: bringing order

to the web.,” 1999.

G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information

processing & management, vol. 24, no. 5, pp. 513–523, 1988.

E. Segev, Google and the Digital Divide: The Biases of Online Knowledge. Chandos Publishing

(Oxford), 2009.

L. Vaughan and M. Thelwall, “Search engine coverage bias: evidence and possible causes,” Infor-

mation Processing & Management, vol. 40, no. 4, pp. 693–707, 2004.

D. Bridge and F. Ricci, “Supporting product selection with query editing recommendations,” in

Proceedings of the 2007 ACM conference on Recommender systems, pp. 65–72, ACM, 2007.

S. Ji, G. Li, C. Li, and J. Feng, “Efficient interactive fuzzy keyword search,” in Proceedings of

the 18th international conference on World wide web, WWW ’09, (New York, USA), pp. 371–380,

ACM, 2009.

G. Li, S. Ji, C. Li, and J. Feng, “Efficient type-ahead search on relational data: a tastier approach,”

in Proceedings of the 35th SIGMOD international conference on Management of data, SIGMOD

’09, (New York, NY, USA), pp. 695–706, ACM, 2009.

H. Wu, G. Li, C. Li, and L. Zhou, “Seaform: Search-as-you-type in forms,” in Proceedings of the

th International Conference on Very Large Data Bases, VLDB ’10, 2010.

H. Bast and I. Weber, “Type less, find more: fast autocompletion search with a succinct index,” in

Proceedings of the 29th annual international ACM SIGIR conference on Research and development

in information retrieval, SIGIR ’06, (New York, NY, USA), pp. 364–371, ACM, 2006.

G. Li, J. Feng, and L. Zhou, “Interactive search in xml data,” in Proceedings of the 18th interna-

tional conference on World wide web, WWW ’09, (New York, NY, USA), pp. 1063–1064, ACM,

G. Li, J. Wang, C. Li, and J. Feng, “Supporting efficient top-k queries in type-ahead search,”

in Proceedings of the 35th international ACM SIGIR conference on Research and development in

information retrieval, pp. 355–364, ACM, 2012.

H. Bast and M. Celikik, “Efficient fuzzy search in large text collections,” ACM Transactions on

Information Systems (TOIS), vol. 31, no. 2, p. 10, 2013.

H. Bast and I. Weber, “The completesearch engine: Interactive, efficient, and towards ir& db

integration.,” in CIDR, vol. 7, pp. 88–95, 2007.

E. Markatos, “On caching search engine query results,” Computer Communications, vol. 24, no. 2,

pp. 137–143, 2001.

P. Saraiva, E. Silva de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Riberio-Neto, “Rankpreserving

two-level caching for scalable search engines,” in Proceedings of the 24th annual inter-

national ACM SIGIR conference on Research and development in information retrieval, pp. 51–58,

ACM, 2001.

X. Long and T. Suel, “Three-level caching for efficient query processing in large web search engines,”

World Wide Web, vol. 9, no. 4, pp. 369–395, 2006.

T. Fagni, R. Perego, F. Silvestri, and S. Orlando, “Boosting the performance of web search engines:

Caching and prefetching query results by exploiting historical usage data,” ACM Transactions on

Information Systems (TOIS), vol. 24, no. 1, pp. 51–78, 2006.

R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, “The

impact of caching on search engines,” in Proceedings of the 30th annual international ACM SIGIR

conference on Research and development in information retrieval, pp. 183–190, ACM, 2007.

R. Lempel and S. Moran, “Predictive caching and prefetching of query results in search engines,”

in Proceedings of the 12th international conference on World Wide Web, pp. 19–28, ACM, 2003.

G. Skobeltsyn, F. Junqueira, V. Plachouras, and R. Baeza-Yates, “Resin: a combination of results

caching and index pruning for high-performance web search engines,” in Proceedings of the

st annual international ACM SIGIR conference on Research and development in information

retrieval, pp. 131–138, ACM, 2008.

Q. Gan and T. Suel, “Improved techniques for result caching in web search engines,” in Proceedings

of the 18th international conference on World wide web, pp. 431–440, ACM, 2009.

B. Cambazoglu, F. Junqueira, V. Plachouras, S. Banachowski, B. Cui, S. Lim, and B. Bridge, “A

refreshing perspective of search engine caching,” in Proceedings of the 19th international conference

on World wide web, pp. 181–190, ACM, 2010.

R. Blanco, E. Bortnikov, F. Junqueira, R. Lempel, L. Telloli, and H. Zaragoza, “Caching search

engine results over incremental indices,” in Proceeding of the 33rd international ACM SIGIR

conference on Research and development in information retrieval, SIGIR ’10, (New York, NY,

USA), pp. 82–89, ACM, 2010.

S. Chaudhuri and R. Kaushik, “Extending autocompletion to tolerate errors,” SIGMOD, 2009.

H. Duan and B. Hsu, “Online spelling correction for query completion,” WWW, 2011.

B.-J. P. Hsu and G. Ottaviano, “Space-efficient data structures for top-k completion,” in Proceed-

ings of the 22nd international conference on World Wide Web, pp. 583–594, International World

Wide Web Conferences Steering Committee, 2013.

D. Kastrinakis and Y. Tzitzikas, “Advancing search query autocompletion services with more

and better suggestions,” in Proceedings of the 10th international conference on Web engineering,

pp. 35–49, Springer, 2010.

C. Silverstein, H. Marais, M. Henzinger, and M. Moricz, “Analysis of a very large web search

engine query log,” SIGIR Forum, vol. 33, pp. 6–12, September 1999.

S. Ding, J. Attenberg, R. Baeza-Yates, and T. Suel, “Batch query processing for web search

engines,” in Proceedings of the fourth ACM international conference on Web search and data

mining, pp. 137–146, ACM, 2011.

Y. Xie and D. O’Hallaron, “Locality in search engine queries and its implications for caching,” in

INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communi-

cations Societies. Proceedings. IEEE, vol. 3, 2002.

J. Teevan, E. Adar, R. Jones, and M. Potts, “Information re-retrieval: repeat queries in yahoo’s

logs,” in Proceedings of the 30th annual international ACM SIGIR conference on Research and

development in information retrieval, pp. 151–158, ACM, 2007.

R. S. Boyer and J. S. Moore, “A fast string searching algorithm,” Communications of the ACM,

vol. 20, no. 10, pp. 762–772, 1977.

S. Cucerzan and E. Brill, “Spelling correction as an iterative process that exploits the collective

knowledge of web users,” in Proceedings of EMNLP, vol. 4, pp. 293–300, 2004.

A. Broder, P. Ciccolo, E. Gabrilovich, V. Josifovski, D. Metzler, L. Riedel, and J. Yuan, “Online

expansion of rare queries for sponsored search,” in Proceedings of the 18th international conference

on World wide web, pp. 511–520, ACM, 2009.

G. Navarro, “A Guided Tour to Approximate String Matching,” ACM computing surveys (CSUR),

vol. 33, no. 1, pp. 31–88, 2001.

R. Fagin, A. Lotem, and M. Naor, “Optimal aggregation algorithms for middleware,” in Pro-

ceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database

systems, pp. 102–113, ACM, 2001.

V. Cardellini, M. Colajanni, and P. Yu, “Dynamic load balancing on web-server systems,” Internet

Computing, IEEE, vol. 3, no. 3, pp. 28 –39, 1999.

V. Ungureanu, B. Melamed, and M. Katehakis, “Effective load balancing for cluster-based servers

employing job preemption,” Performance Evaluation, vol. 65, no. 8, pp. 606–622, 2008.

T. Chieu, A. Mohindra, A. Karve, and A. Segal, “Dynamic scaling of web applications in a

virtualized cloud computing environment,” in e-Business Engineering, 2009. ICEBE ’09. IEEE

International Conference on, pp. 281–286, oct. 2009.

R. Grossi and G. Ottaviano, “Fast compressed tries through path decompositions,” CoRR,

vol. abs/1111.5220, 2011.

R. Moore, E. Churchill, and R. Kantamneni, “Three sequential positions of query repair in interactions

with internet search engines,” in Proceedings of the ACM 2011 conference on Computer

supported cooperative work, pp. 415–424, ACM, 2011.

G. Buscher, A. Dengel, R. Biedert, and L. V. Elst, “Attentive documents: Eye tracking as implicit

feedback for information retrieval and beyond,” ACM Trans. Interact. Intell. Syst., vol. 1, no. 2,

L. Chen and H. Tsoi, “Users’ decision behavior in recommender interfaces: Impact of layout

design,” RecSys’11 Workshop on Human Decision Making in Recommender Systems, 2011.

R. Baeza-Yates, B. Ribeiro-Neto, et al., Modern information retrieval, vol. 463. ACM press New

York, 1999.

Downloads

Published

2015-03-06

How to Cite

FAFALIOS, P. ., & TZITZIKAS, Y. . (2015). TYPE-AHEAD EXPLORATORY SEARCH THROUGH TYPO AND WORD ORDER TOLERANT AUTOCOMPLETION. Journal of Web Engineering, 14(1-2), 080–116. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/3893

Issue

Section

Articles