RELAXATION OF KEYWORD PATTERN GRAPHS ON RDF DATA
Keywords:
RDF Data, Semantic Web, Keyword Search, Pattern Graph RelaxationAbstract
One of the facets of the data explosion in recent years is the growing of the repositories of RDF Data on the Web. Keyword search is a popular technique for querying repositories of RDF graph data. Recently, a number of approaches leverage a structural summary of the graph data to address the typical keyword search related problems of: (a) identifying relevant results among a multitude of candidates, and (b) performance scalability. These approaches compute queries (pattern graphs) corresponding to alternative interpretations of the keyword query and the user selects one that matches her intention to be evaluated against the data. Though promising, these approaches suffer from a drawback: because summaries are approximate representations of the data, they might return empty answers or miss results which are relevant to the user intent. In this paper, we present a novel approach which combines the use of the structural summary and the user feedback with a relaxation technique for pattern graphs. We leverage pattern graph homomorphisms to define relaxed pattern graphs that are able to extract more results potentially of interest to the user. We introduce an operation on pattern graphs and we prove that it is complete, that is, it can produce all relaxed pattern graphs. To guarantee that the result pattern graphs are as close to the initial pattern graph as possible, we devise different metrics to measure the degree of relaxation of a pattern graph. We design an algorithm that computes relaxed pattern graphs with non-empty answers in relaxation order. To improve the successive computation of relaxed pattern graphs, we suggest subquery caching and multiquery optimization techniques adapted to the context of this computation. Finally, we run experiments on different real datasets which demonstrate the effectiveness of our ranking of relaxed pattern graphs, and the efficiency of our system and optimization techniques in computing relaxed pattern graphs and their answers.
Downloads
References
Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, and Shashank Sudarshan.
Keyword searching and browsing in databases using BANKS. In ICDE, pages 431–440, 2002.
Varun Kacholia, Shashank Pandit, Soumen Chakrabarti, S Sudarshan, Rushi Desai, and Hrishikesh
Karambelkar. Bidirectional expansion for keyword search on graph databases. In VLDB, pages
–516, 2005.
Bolin Ding, Jeffrey Xu Yu, Shan Wang, Lu Qin, Xiao Zhang, and Xuemin Lin. Finding top-k
min-cost connected trees in databases. In ICDE, pages 836–845, 2007.
Hao He, Haixun Wang, Jun Yang, and Philip S Yu. Blinks: ranked keyword searches on graphs.
In SIGMOD, pages 305–316, 2007.
Konstantin Golenberg, Benny Kimelfeld, and Yehoshua Sagiv. Keyword proximity search in complex data graphs. In SIGMOD, pages 927–940, 2008.
Guoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong Wang, and Lizhu Zhou. Ease: an effective 3-
in-1 keyword search method for unstructured, semi-structured and structured data. In SIGMOD,
pages 903–914, 2008.
Lu Qin, Jeffrey Xu Yu, Lijun Chang, and Yufei Tao. Querying communities in relational databases.
In ICDE, pages 724–735, 2009.
Mehdi Kargar and Aijun An. Keyword search in graphs: Finding r-cliques. VLDB, pages 681–692,
Lin Guo, Feng Shao, Chavdar Botev, and Jayavel Shanmugasundaram. XRANK: ranked keyword
search over XML documents. In SIGMOD, pages 16–27, 2003.
Haofen Wang, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. Q2semantic: A lightweight
keyword interface to semantic search. In ESWC, pages 584–598, 2008.
Thanh Tran, Haofen Wang, Sebastian Rudolph, and Philipp Cimiano. Top-k exploration of query
candidates for efficient keyword search on graph-shaped (RDF) data. ICDE, pages 405–416, 2009.
Shady Elbassuoni, Maya Ramanath, Ralf Schenkel, and Gerhard Weikum. Searching RDF graphs
with sparql and keywords. IEEE Data Eng. Bull., pages 16–24, 2010.
Ziyang Liu and Yi Chen. Processing keyword search on XML: a survey. WWW, pages 671–707,
Ananya Dass, Cem Aksoy, Aggeliki Dimitriou, and Dimitri Theodoratos. Exploiting semantic
result clustering to support keyword search on linked data. In WISE, pages 448–463, 2014.
Ananya Dass, Aggeliki Dimitriou, Cem Aksoy, and Dimitri Theodoratos. Incorporating cohesiveness
into keyword search on linked data. In WISE, pages 47–62. Springer, 2015.
Ananya Dass and Dimitri Theodoratos. Trading off popularity for diversity in the results sets of
keyword queries on linked data. In ICWE 2017, page 18, 2017.
Raghav Kaushik, Philip Bohannon, Jeffrey F. Naughton, and Henry F. Korth. Covering indexes
for branching path queries. In SIGMOD Conference, pages 133–144, 2002.
Roy Goldman and Jennifer Widom. Dataguides: Enabling query formulation and optimization in
semistructured databases. In VLDB, pages 436–445, 1997.
Haizhou Fu, Sidan Gao, and Kemafor Anyanwu. Disambiguating keyword queries on RDF
databases using ”Deep” segmentation. In ICSC, pages 236–243, 2010.
Kaifeng Xu, Junquan Chen, Haofen Wang, and Yong Yu. Hybrid graph based keyword query
interpretation on RDF. In ISWC, 2010.
Bhavana Bharat Dalvi, Meghana Kshirsagar, and S. Sudarshan. Keyword search on external
memory data graphs. PVLDB, 1(1):1189–1204, 2008.
Timos K Sellis. Multiple-query optimization. ACM Transactions on Database Systems (TODS),
(1):23–52, 1988.
Prasan Roy, Srinivasan Seshadri, S Sudarshan, and Siddhesh Bhobe. Efficient and extensible
algorithms for multi query optimization. In ACM SIGMOD Record, volume 29, pages 249–260.
ACM, 2000.
Wangchao Le, Songyun Duan, Anastasios Kementsietsidis, Feifei Li, and Min Wang. Rewriting
queries on SPARQL views. In WWW, pages 655–664, 2011.
Wangchao Le, Anastasios Kementsietsidis, Songyun Duan, and Feifei Li. Scalable multi-query
optimization for sparql. In Data Engineering (ICDE), 2012 IEEE 28th International Conference
on, pages 666–677. IEEE, 2012.
Alon Y Levy, Alberto O Mendelzon, and Yehoshua Sagiv. Answering queries using views. In Proceedings
of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database
systems, pages 95–104. ACM, 1995.
Timos Sellis and Subrata Ghosh. On the multiple-query optimization problem. IEEE Transactions
on Knowledge & Data Engineering, (2):262–266, 1990.
Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. Factorizing YAGO: scalable machine
learning for linked data. In WWW, pages 271–280, 2012.
Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. Modern Information Retrieval - the concepts and technology behind search, Second edition. Pearson Education Ltd., Harlow, England,
Alan Agresti. Analysis of ordinal categorical data, volume 656. John Wiley & Sons, 2010.
Kalervo J¨arvelin and Jaana Kek¨al¨ainen. Cumulated gain-based evaluation of IR techniques. ACM
Trans. Inf. Syst., 20(4):422–446, 2002.
Maurice G Kendall. A new measure of rank correlation. Biometrika, pages 81–93, 1938.
Shady Elbassuoni and Roi Blanco. Keyword search over RDF graphs. In CIKM, pages 237–242,
Wangchao Le, Feifei Li, Anastasios Kementsietsidis, and Songyun Duan. Scalable keyword search
on large RDF data. IEEE Trans. Knowl. Data Eng., 26(11):2774–2788, 2014.
Mengxia Jiang, Yueguo Chen, Jinchuan Chen, and Xiaoyong Du. Interactive predicate suggestion
for keyword search on RDF graphs. In ADMA (2), pages 96–109, 2011.
Xiping Liu, Changxuan Wan, and Lei Chen. Returning clustered results for keyword search on
XML documents. IEEE Trans. Knowl. Data Eng., pages 1811–1825, 2011.
Cem Aksoy, Ananya Dass, Dimitri Theodoratos, and Xiaoying Wu. Clustering query results to
support keyword search on tree data. In WAIM, pages 1–12, 2014.
Lingbo Kong, R´emi Gilleron, and Aur´elien Lemay Mostrare. Retrieving meaningful relaxed tightest
fragments for XML keyword search. In EDBT, pages 815–826, 2009.
Tali Brodianskiy and Sara Cohen. Self-correcting queries for xml. In CIKM, pages 11–20, 2007.
Sihem Amer-Yahia, SungRan Cho, and Divesh Srivastava. Tree pattern relaxation. In EDBT,
pages 496–513, 2002.
Ziyang Liu and Yi Cher. Reasoning and identifying relevant matches for xml keyword search.
Proceedings of the VLDB Endowment, 1(1):921–932, 2008.
Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, and Yannis
Velegrakis. A probabilistic optimization framework for the empty-answer problem. Proceedings of
the VLDB Endowment, 6(14):1762–1773, 2013.
Shady Elbassuoni, Maya Ramanath, and Gerhard Weikum. Query relaxation for entityrelationship
search. In ESWC, pages 62–76, 2011.
Shady Elbassuoni, Maya Ramanath, Ralf Schenkel, Marcin Sydow, and Gerhard Weikum.
Language-model-based ranking for queries on rdf-graphs. In CIKM, pages 977–986, 2009.
Ananya Dass, Cem Aksoy, Aggeliki Dimitriou, and Dimitri Theodoratos. Keyword pattern graph
relaxation for selective result space expansion on linked data. In ICWE, pages 287–306, 2015.