TYPE-AHEAD EXPLORATORY SEARCH THROUGH TYPO AND WORD ORDER TOLERANT AUTOCOMPLETION
Keywords:type-ahead search, instant search, exploratory search, autocompletion, query suggestions, caching
There is an increasing interest on recommending to the user instantly (during typing characters) queries and query results. This is evidenced by the emergence of several systems that offer such functionalities, e.g. Google Instant Search for Web searching or Facebook Search for social searching. In this paper we consider showing more rich rec- ommendations that show several other kinds of supplementary information that provide the user with a better overview of the search space. This supplementary information can be the result of various tasks (e.g. textual clustering or entity mining of the top search results), may have very large size and may cost a lot to be derived. The instant presen- tation of these recommendations (as the user types a query letter-by-letter) helps the user (a) to quickly discover what is popular among other users, (b) to decide fast which (of the suggested) query completions to use, and (c) to decide what hits of the returned answer to inspect. In this paper we focus on making this feasible (scalable) and flexible. Regarding scalability we elaborate on an approach based on precomputed information and we comparatively evaluate various trie-based index structures for making real-time interaction feasible, even if the size of the available memory space is limited. Specifically, we show how with modest hardware (like this of a mobile device) one can provide instant access to large amounts of data. Moreover, we propose and experimentally evaluate an incremental procedure for updating the index. For improving the throughput that can be served we analyze and experimentally evaluate various policies for caching subtries. With regard to flexibility, in order to reduce user’s effort and to increase the exploitation of the precomputed information, we elaborate on how the recommendations can tolerate different word orders and spelling errors, assuming the proposed trie-based index struc- tures. The experimental results revealed that such functionality significantly increases the number of recommendations especially for queries that contain several words. Fi- nally, we propose an algorithm for computing the top-K suggestions that exploits the ranking information in order to reduce the trie traversals. An experimental evaluation proves that the proposed algorithm highly improves the retrieval time.
F. Silvestri, “Mining query logs: Turning search usage data into knowledge,” Foundations and
Trends in Information Retrieval, vol. 4, no. 12, pp. 1–174, 2010.
P. Fafalios and Y. Tzitzikas, “Exploiting available memory and disk for scalable instant overview
search,” in Web Information System Engineering–WISE 2011, pp. 101–115, Springer, 2011.
S. Kopidaki, P. Papadakos, and Y. Tzitzikas, “STC+ and NM-STC: Two novel online results clustering
methods for web searching,” in WISE ’09: Proceedings of the 10th International Conference
on Web Information Systems Engineering, October 2009.
P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis, and Y. Tzitzikas, “Web
searching with entity mining at query time,” in Multidisciplinary Information Retrieval, pp. 73–
, Springer, 2012.
P. Fafalios, I. Kitsos, and Y. Tzitzikas, “Scalable, flexible and generic instant overview search,” in
Proceedings of the 21st International Conference on World Wide Web, pp. 333–336, ACM, 2012.
M. K¨aki and A. Aula, “Findex: improving search result use through automatic filtering categories,”
Interacting with Computers, vol. 17, no. 2, pp. 187–206, 2005.
M. K¨aki, “Findex: search result categories help users when document ranking fails,” in Proceedings
of the SIGCHI conference on Human factors in computing systems, pp. 131–140, ACM, 2005.
B. Kules, R. Capra, M. Banta, and T. Sierra, “What do exploratory searchers look at in a faceted
search interface?,” in Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries,
pp. 313–322, ACM, 2009.
M. Wilson et al., “A longitudinal study of exploratory and keyword search,” in Proceedings of the
th ACM/IEEE-CS joint conference on Digital libraries, pp. 52–56, ACM, 2008.
S. Teddy, F. Yap, C. Quek, and E. Lai, “A neurocognitive approach to decision making for the
reconstruction of the metabolic insulin profile of a healthy person,” Handbook on Decision Making,
pp. 497–532, 2010.
A. Spink, B. Jansen, D. Wolfram, and T. Saracevic, “From e-sex to e-commerce: Web search
changes,” Computer, vol. 35, no. 3, pp. 107–109, 2002.
L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: bringing order
to the web.,” 1999.
G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information
processing & management, vol. 24, no. 5, pp. 513–523, 1988.
E. Segev, Google and the Digital Divide: The Biases of Online Knowledge. Chandos Publishing
L. Vaughan and M. Thelwall, “Search engine coverage bias: evidence and possible causes,” Infor-
mation Processing & Management, vol. 40, no. 4, pp. 693–707, 2004.
D. Bridge and F. Ricci, “Supporting product selection with query editing recommendations,” in
Proceedings of the 2007 ACM conference on Recommender systems, pp. 65–72, ACM, 2007.
S. Ji, G. Li, C. Li, and J. Feng, “Efficient interactive fuzzy keyword search,” in Proceedings of
the 18th international conference on World wide web, WWW ’09, (New York, USA), pp. 371–380,
G. Li, S. Ji, C. Li, and J. Feng, “Efficient type-ahead search on relational data: a tastier approach,”
in Proceedings of the 35th SIGMOD international conference on Management of data, SIGMOD
’09, (New York, NY, USA), pp. 695–706, ACM, 2009.
H. Wu, G. Li, C. Li, and L. Zhou, “Seaform: Search-as-you-type in forms,” in Proceedings of the
th International Conference on Very Large Data Bases, VLDB ’10, 2010.
H. Bast and I. Weber, “Type less, find more: fast autocompletion search with a succinct index,” in
Proceedings of the 29th annual international ACM SIGIR conference on Research and development
in information retrieval, SIGIR ’06, (New York, NY, USA), pp. 364–371, ACM, 2006.
G. Li, J. Feng, and L. Zhou, “Interactive search in xml data,” in Proceedings of the 18th interna-
tional conference on World wide web, WWW ’09, (New York, NY, USA), pp. 1063–1064, ACM,
G. Li, J. Wang, C. Li, and J. Feng, “Supporting efficient top-k queries in type-ahead search,”
in Proceedings of the 35th international ACM SIGIR conference on Research and development in
information retrieval, pp. 355–364, ACM, 2012.
H. Bast and M. Celikik, “Efficient fuzzy search in large text collections,” ACM Transactions on
Information Systems (TOIS), vol. 31, no. 2, p. 10, 2013.
H. Bast and I. Weber, “The completesearch engine: Interactive, efficient, and towards ir& db
integration.,” in CIDR, vol. 7, pp. 88–95, 2007.
E. Markatos, “On caching search engine query results,” Computer Communications, vol. 24, no. 2,
pp. 137–143, 2001.
P. Saraiva, E. Silva de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Riberio-Neto, “Rankpreserving
two-level caching for scalable search engines,” in Proceedings of the 24th annual inter-
national ACM SIGIR conference on Research and development in information retrieval, pp. 51–58,
X. Long and T. Suel, “Three-level caching for efficient query processing in large web search engines,”
World Wide Web, vol. 9, no. 4, pp. 369–395, 2006.
T. Fagni, R. Perego, F. Silvestri, and S. Orlando, “Boosting the performance of web search engines:
Caching and prefetching query results by exploiting historical usage data,” ACM Transactions on
Information Systems (TOIS), vol. 24, no. 1, pp. 51–78, 2006.
R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, “The
impact of caching on search engines,” in Proceedings of the 30th annual international ACM SIGIR
conference on Research and development in information retrieval, pp. 183–190, ACM, 2007.
R. Lempel and S. Moran, “Predictive caching and prefetching of query results in search engines,”
in Proceedings of the 12th international conference on World Wide Web, pp. 19–28, ACM, 2003.
G. Skobeltsyn, F. Junqueira, V. Plachouras, and R. Baeza-Yates, “Resin: a combination of results
caching and index pruning for high-performance web search engines,” in Proceedings of the
st annual international ACM SIGIR conference on Research and development in information
retrieval, pp. 131–138, ACM, 2008.
Q. Gan and T. Suel, “Improved techniques for result caching in web search engines,” in Proceedings
of the 18th international conference on World wide web, pp. 431–440, ACM, 2009.
B. Cambazoglu, F. Junqueira, V. Plachouras, S. Banachowski, B. Cui, S. Lim, and B. Bridge, “A
refreshing perspective of search engine caching,” in Proceedings of the 19th international conference
on World wide web, pp. 181–190, ACM, 2010.
R. Blanco, E. Bortnikov, F. Junqueira, R. Lempel, L. Telloli, and H. Zaragoza, “Caching search
engine results over incremental indices,” in Proceeding of the 33rd international ACM SIGIR
conference on Research and development in information retrieval, SIGIR ’10, (New York, NY,
USA), pp. 82–89, ACM, 2010.
S. Chaudhuri and R. Kaushik, “Extending autocompletion to tolerate errors,” SIGMOD, 2009.
H. Duan and B. Hsu, “Online spelling correction for query completion,” WWW, 2011.
B.-J. P. Hsu and G. Ottaviano, “Space-efficient data structures for top-k completion,” in Proceed-
ings of the 22nd international conference on World Wide Web, pp. 583–594, International World
Wide Web Conferences Steering Committee, 2013.
D. Kastrinakis and Y. Tzitzikas, “Advancing search query autocompletion services with more
and better suggestions,” in Proceedings of the 10th international conference on Web engineering,
pp. 35–49, Springer, 2010.
C. Silverstein, H. Marais, M. Henzinger, and M. Moricz, “Analysis of a very large web search
engine query log,” SIGIR Forum, vol. 33, pp. 6–12, September 1999.
S. Ding, J. Attenberg, R. Baeza-Yates, and T. Suel, “Batch query processing for web search
engines,” in Proceedings of the fourth ACM international conference on Web search and data
mining, pp. 137–146, ACM, 2011.
Y. Xie and D. O’Hallaron, “Locality in search engine queries and its implications for caching,” in
INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communi-
cations Societies. Proceedings. IEEE, vol. 3, 2002.
J. Teevan, E. Adar, R. Jones, and M. Potts, “Information re-retrieval: repeat queries in yahoo’s
logs,” in Proceedings of the 30th annual international ACM SIGIR conference on Research and
development in information retrieval, pp. 151–158, ACM, 2007.
R. S. Boyer and J. S. Moore, “A fast string searching algorithm,” Communications of the ACM,
vol. 20, no. 10, pp. 762–772, 1977.
S. Cucerzan and E. Brill, “Spelling correction as an iterative process that exploits the collective
knowledge of web users,” in Proceedings of EMNLP, vol. 4, pp. 293–300, 2004.
A. Broder, P. Ciccolo, E. Gabrilovich, V. Josifovski, D. Metzler, L. Riedel, and J. Yuan, “Online
expansion of rare queries for sponsored search,” in Proceedings of the 18th international conference
on World wide web, pp. 511–520, ACM, 2009.
G. Navarro, “A Guided Tour to Approximate String Matching,” ACM computing surveys (CSUR),
vol. 33, no. 1, pp. 31–88, 2001.
R. Fagin, A. Lotem, and M. Naor, “Optimal aggregation algorithms for middleware,” in Pro-
ceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database
systems, pp. 102–113, ACM, 2001.
V. Cardellini, M. Colajanni, and P. Yu, “Dynamic load balancing on web-server systems,” Internet
Computing, IEEE, vol. 3, no. 3, pp. 28 –39, 1999.
V. Ungureanu, B. Melamed, and M. Katehakis, “Effective load balancing for cluster-based servers
employing job preemption,” Performance Evaluation, vol. 65, no. 8, pp. 606–622, 2008.
T. Chieu, A. Mohindra, A. Karve, and A. Segal, “Dynamic scaling of web applications in a
virtualized cloud computing environment,” in e-Business Engineering, 2009. ICEBE ’09. IEEE
International Conference on, pp. 281–286, oct. 2009.
R. Grossi and G. Ottaviano, “Fast compressed tries through path decompositions,” CoRR,
vol. abs/1111.5220, 2011.
R. Moore, E. Churchill, and R. Kantamneni, “Three sequential positions of query repair in interactions
with internet search engines,” in Proceedings of the ACM 2011 conference on Computer
supported cooperative work, pp. 415–424, ACM, 2011.
G. Buscher, A. Dengel, R. Biedert, and L. V. Elst, “Attentive documents: Eye tracking as implicit
feedback for information retrieval and beyond,” ACM Trans. Interact. Intell. Syst., vol. 1, no. 2,
L. Chen and H. Tsoi, “Users’ decision behavior in recommender interfaces: Impact of layout
design,” RecSys’11 Workshop on Human Decision Making in Recommender Systems, 2011.
R. Baeza-Yates, B. Ribeiro-Neto, et al., Modern information retrieval, vol. 463. ACM press New