RELAXATION OF KEYWORD PATTERN GRAPHS ON RDF DATA

Authors

  • ANANYA DASS Computer Science Department, New Jersey Institute of Technology 150 Bleeker St, Newark, New Jersey 07102, USA
  • CEM AKSOY Computer Science Department, New Jersey Institute of Technology 150 Bleeker St, Newark, New Jersey 07102, USA
  • AGGELIKI DIMITRIOU School of Electrical and Computer Engineering, National Technical University of Athens Heroon Polytechniou 9, Athens, 15780, Greece
  • DIMITRI THEODORATOS Computer Science Department, New Jersey Institute of Technology 150 Bleeker St, Newark, New Jersey 07102, USA

Keywords:

RDF Data, Semantic Web, Keyword Search, Pattern Graph Relaxation

Abstract

One of the facets of the data explosion in recent years is the growing of the repositories of RDF Data on the Web. Keyword search is a popular technique for querying repositories of RDF graph data. Recently, a number of approaches leverage a structural summary of the graph data to address the typical keyword search related problems of: (a) identifying relevant results among a multitude of candidates, and (b) performance scalability. These approaches compute queries (pattern graphs) corresponding to alternative interpretations of the keyword query and the user selects one that matches her intention to be evaluated against the data. Though promising, these approaches suffer from a drawback: because summaries are approximate representations of the data, they might return empty answers or miss results which are relevant to the user intent. In this paper, we present a novel approach which combines the use of the structural summary and the user feedback with a relaxation technique for pattern graphs. We leverage pattern graph homomorphisms to define relaxed pattern graphs that are able to extract more results potentially of interest to the user. We introduce an operation on pattern graphs and we prove that it is complete, that is, it can produce all relaxed pattern graphs. To guarantee that the result pattern graphs are as close to the initial pattern graph as possible, we devise different metrics to measure the degree of relaxation of a pattern graph. We design an algorithm that computes relaxed pattern graphs with non-empty answers in relaxation order. To improve the successive computation of relaxed pattern graphs, we suggest subquery caching and multiquery optimization techniques adapted to the context of this computation. Finally, we run experiments on different real datasets which demonstrate the effectiveness of our ranking of relaxed pattern graphs, and the efficiency of our system and optimization techniques in computing relaxed pattern graphs and their answers.

Downloads

Download data is not yet available.

References

Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, and Shashank Sudarshan.

Keyword searching and browsing in databases using BANKS. In ICDE, pages 431–440, 2002.

Varun Kacholia, Shashank Pandit, Soumen Chakrabarti, S Sudarshan, Rushi Desai, and Hrishikesh

Karambelkar. Bidirectional expansion for keyword search on graph databases. In VLDB, pages

–516, 2005.

Bolin Ding, Jeffrey Xu Yu, Shan Wang, Lu Qin, Xiao Zhang, and Xuemin Lin. Finding top-k

min-cost connected trees in databases. In ICDE, pages 836–845, 2007.

Hao He, Haixun Wang, Jun Yang, and Philip S Yu. Blinks: ranked keyword searches on graphs.

In SIGMOD, pages 305–316, 2007.

Konstantin Golenberg, Benny Kimelfeld, and Yehoshua Sagiv. Keyword proximity search in complex data graphs. In SIGMOD, pages 927–940, 2008.

Guoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong Wang, and Lizhu Zhou. Ease: an effective 3-

in-1 keyword search method for unstructured, semi-structured and structured data. In SIGMOD,

pages 903–914, 2008.

Lu Qin, Jeffrey Xu Yu, Lijun Chang, and Yufei Tao. Querying communities in relational databases.

In ICDE, pages 724–735, 2009.

Mehdi Kargar and Aijun An. Keyword search in graphs: Finding r-cliques. VLDB, pages 681–692,

Lin Guo, Feng Shao, Chavdar Botev, and Jayavel Shanmugasundaram. XRANK: ranked keyword

search over XML documents. In SIGMOD, pages 16–27, 2003.

Haofen Wang, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. Q2semantic: A lightweight

keyword interface to semantic search. In ESWC, pages 584–598, 2008.

Thanh Tran, Haofen Wang, Sebastian Rudolph, and Philipp Cimiano. Top-k exploration of query

candidates for efficient keyword search on graph-shaped (RDF) data. ICDE, pages 405–416, 2009.

Shady Elbassuoni, Maya Ramanath, Ralf Schenkel, and Gerhard Weikum. Searching RDF graphs

with sparql and keywords. IEEE Data Eng. Bull., pages 16–24, 2010.

Ziyang Liu and Yi Chen. Processing keyword search on XML: a survey. WWW, pages 671–707,

Ananya Dass, Cem Aksoy, Aggeliki Dimitriou, and Dimitri Theodoratos. Exploiting semantic

result clustering to support keyword search on linked data. In WISE, pages 448–463, 2014.

Ananya Dass, Aggeliki Dimitriou, Cem Aksoy, and Dimitri Theodoratos. Incorporating cohesiveness

into keyword search on linked data. In WISE, pages 47–62. Springer, 2015.

Ananya Dass and Dimitri Theodoratos. Trading off popularity for diversity in the results sets of

keyword queries on linked data. In ICWE 2017, page 18, 2017.

Raghav Kaushik, Philip Bohannon, Jeffrey F. Naughton, and Henry F. Korth. Covering indexes

for branching path queries. In SIGMOD Conference, pages 133–144, 2002.

Roy Goldman and Jennifer Widom. Dataguides: Enabling query formulation and optimization in

semistructured databases. In VLDB, pages 436–445, 1997.

Haizhou Fu, Sidan Gao, and Kemafor Anyanwu. Disambiguating keyword queries on RDF

databases using ”Deep” segmentation. In ICSC, pages 236–243, 2010.

Kaifeng Xu, Junquan Chen, Haofen Wang, and Yong Yu. Hybrid graph based keyword query

interpretation on RDF. In ISWC, 2010.

Bhavana Bharat Dalvi, Meghana Kshirsagar, and S. Sudarshan. Keyword search on external

memory data graphs. PVLDB, 1(1):1189–1204, 2008.

Timos K Sellis. Multiple-query optimization. ACM Transactions on Database Systems (TODS),

(1):23–52, 1988.

Prasan Roy, Srinivasan Seshadri, S Sudarshan, and Siddhesh Bhobe. Efficient and extensible

algorithms for multi query optimization. In ACM SIGMOD Record, volume 29, pages 249–260.

ACM, 2000.

Wangchao Le, Songyun Duan, Anastasios Kementsietsidis, Feifei Li, and Min Wang. Rewriting

queries on SPARQL views. In WWW, pages 655–664, 2011.

Wangchao Le, Anastasios Kementsietsidis, Songyun Duan, and Feifei Li. Scalable multi-query

optimization for sparql. In Data Engineering (ICDE), 2012 IEEE 28th International Conference

on, pages 666–677. IEEE, 2012.

Alon Y Levy, Alberto O Mendelzon, and Yehoshua Sagiv. Answering queries using views. In Proceedings

of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database

systems, pages 95–104. ACM, 1995.

Timos Sellis and Subrata Ghosh. On the multiple-query optimization problem. IEEE Transactions

on Knowledge & Data Engineering, (2):262–266, 1990.

Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. Factorizing YAGO: scalable machine

learning for linked data. In WWW, pages 271–280, 2012.

Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. Modern Information Retrieval - the concepts and technology behind search, Second edition. Pearson Education Ltd., Harlow, England,

Alan Agresti. Analysis of ordinal categorical data, volume 656. John Wiley & Sons, 2010.

Kalervo J¨arvelin and Jaana Kek¨al¨ainen. Cumulated gain-based evaluation of IR techniques. ACM

Trans. Inf. Syst., 20(4):422–446, 2002.

Maurice G Kendall. A new measure of rank correlation. Biometrika, pages 81–93, 1938.

Shady Elbassuoni and Roi Blanco. Keyword search over RDF graphs. In CIKM, pages 237–242,

Wangchao Le, Feifei Li, Anastasios Kementsietsidis, and Songyun Duan. Scalable keyword search

on large RDF data. IEEE Trans. Knowl. Data Eng., 26(11):2774–2788, 2014.

Mengxia Jiang, Yueguo Chen, Jinchuan Chen, and Xiaoyong Du. Interactive predicate suggestion

for keyword search on RDF graphs. In ADMA (2), pages 96–109, 2011.

Xiping Liu, Changxuan Wan, and Lei Chen. Returning clustered results for keyword search on

XML documents. IEEE Trans. Knowl. Data Eng., pages 1811–1825, 2011.

Cem Aksoy, Ananya Dass, Dimitri Theodoratos, and Xiaoying Wu. Clustering query results to

support keyword search on tree data. In WAIM, pages 1–12, 2014.

Lingbo Kong, R´emi Gilleron, and Aur´elien Lemay Mostrare. Retrieving meaningful relaxed tightest

fragments for XML keyword search. In EDBT, pages 815–826, 2009.

Tali Brodianskiy and Sara Cohen. Self-correcting queries for xml. In CIKM, pages 11–20, 2007.

Sihem Amer-Yahia, SungRan Cho, and Divesh Srivastava. Tree pattern relaxation. In EDBT,

pages 496–513, 2002.

Ziyang Liu and Yi Cher. Reasoning and identifying relevant matches for xml keyword search.

Proceedings of the VLDB Endowment, 1(1):921–932, 2008.

Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, and Yannis

Velegrakis. A probabilistic optimization framework for the empty-answer problem. Proceedings of

the VLDB Endowment, 6(14):1762–1773, 2013.

Shady Elbassuoni, Maya Ramanath, and Gerhard Weikum. Query relaxation for entityrelationship

search. In ESWC, pages 62–76, 2011.

Shady Elbassuoni, Maya Ramanath, Ralf Schenkel, Marcin Sydow, and Gerhard Weikum.

Language-model-based ranking for queries on rdf-graphs. In CIKM, pages 977–986, 2009.

Ananya Dass, Cem Aksoy, Aggeliki Dimitriou, and Dimitri Theodoratos. Keyword pattern graph

relaxation for selective result space expansion on linked data. In ICWE, pages 287–306, 2015.

Downloads

Published

2017-03-01

Issue

Section

Articles