SPARQL Optimization Using Re-ordering Joining Patterns with Surrogate Key Concept and Subset Patterns

Authors

  • Rupal Gupta 1) USIC&T, Guru Gobind Singh Indraprastha University, Dwarka, Delhi, 110078, India 2) College of Computing Sciences and IT, Teerthanker Mahaveer University, Moradabad, Uttar Pradesh, 244001, India
  • Sanjay Kumar Malik USIC&T, Guru Gobind Singh Indraprastha University, Dwarka, Delhi, 110078, India

DOI:

https://doi.org/10.13052/jwe1540-9589.2334

Keywords:

SPARQL, RDF, optimization, indexing, reordering, meta-heuristics, triple patterns

Abstract

Semantic web data resides on the web in the form of knowledge graphs known as RDF graphs, and searching the web has always been a crucial task. RDF data on the semantic web is retrieved with the SPARQL query language, which is based on triple patterns and joins. Optimizing SPARQL queries has been a long-standing concern because of the large number of triple patterns associated with RDF data. Although researchers have put considerable effort into SPARQL query optimization, the diversity of the approaches makes the topic difficult to understand from scratch. This paper analyses various optimization techniques for SPARQL queries used on the semantic web to process knowledge graphs, including join-based, heuristic-based, rule-based, and indexing-based approaches. It is intended to help researchers in this domain grasp the core concepts of SPARQL execution and the optimization approaches used for query processing, which may also be useful in other domains such as linked open data and information retrieval. The paper also proposes an optimization algorithm, HSOA (hybrid SPARQL optimization algorithm), which combines features of index-based, cost-based, and triple-reordering-based optimization approaches. The proposed hybrid algorithm is designed specifically for n-triple RDF data and incorporates subset patterns and the surrogate key concept. The results produced by the proposed algorithm are encouraging, and the algorithm has been tested and compared using benchmark datasets and SPARQL queries such as LUBM, BSBM, and SP2Bench.
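
To illustrate two of the ideas named in the abstract, the following minimal Python sketch encodes RDF terms as integer surrogate keys and reorders triple patterns so that the most selective pattern is evaluated first. It is not the authors' HSOA, whose details are not given here. The function names, the sample triples, and the choice to estimate cost by counting matching triples are all assumptions made for illustration, and the sketch ignores which variables the patterns share.

```python
from itertools import count


def build_surrogate_keys(triples):
    """Assign each distinct RDF term a small integer (its surrogate key)."""
    ids = {}
    next_id = count(1)
    encoded = []
    for triple in triples:
        row = []
        for term in triple:
            if term not in ids:
                ids[term] = next(next_id)
            row.append(ids[term])
        encoded.append(tuple(row))
    return ids, encoded


def is_variable(term):
    """SPARQL variables are written with a leading '?'."""
    return term.startswith("?")


def estimated_matches(pattern, ids, encoded):
    """Count encoded triples that agree with the pattern's constant terms."""
    keys = []
    for term in pattern:
        if is_variable(term):
            keys.append(None)  # a variable matches any key
        else:
            key = ids.get(term)
            if key is None:
                return 0  # a constant absent from the data matches nothing
            keys.append(key)
    return sum(
        all(k is None or k == v for k, v in zip(keys, triple))
        for triple in encoded
    )


def reorder_patterns(patterns, ids, encoded):
    """Order patterns by ascending match count (hypothetical cost model)."""
    return sorted(patterns, key=lambda p: estimated_matches(p, ids, encoded))


# Hypothetical sample data, not taken from LUBM, BSBM, or SP2Bench.
triples = [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:carol", "rdf:type", "ex:Professor"),
    ("ex:alice", "ex:advisor", "ex:carol"),
    ("ex:bob", "ex:takesCourse", "ex:course1"),
    ("ex:alice", "ex:takesCourse", "ex:course1"),
]
patterns = [
    ("?x", "rdf:type", "ex:Student"),
    ("?x", "ex:takesCourse", "?c"),
    ("?x", "ex:advisor", "ex:carol"),
]

ids, encoded = build_surrogate_keys(triples)
ordered = reorder_patterns(patterns, ids, encoded)
```

Comparing integers instead of full IRIs keeps the matching step cheap, and placing the pattern with the fewest matches first tends to shrink intermediate join results. A practical optimizer would also take shared variables and index statistics into account.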

Author Biographies

Rupal Gupta, 1) USIC&T, Guru Gobind Singh Indraprastha University, Dwarka, Delhi, 110078, India 2) College of Computing Sciences and IT, Teerthanker Mahaveer University, Moradabad, Uttar Pradesh, 244001, India

Rupal Gupta is a Research Scholar at USIC&T, Guru Gobind Singh Indraprastha University, Delhi, India. His areas of interest are semantic web, SPARQL query processing and optimization, big data, and data mining. He received his master's degrees, an MCA from UPTU, Lucknow, and an M.Tech (IT) from USIT, GGSIPU, New Delhi. He has published papers in various conferences and in peer-reviewed journals indexed in SCOPUS and Web of Science. He is currently working as an Assistant Professor at Teerthanker Mahaveer University, Moradabad, and has more than 16 years of teaching experience.

Sanjay Kumar Malik, USIC&T, Guru Gobind Singh Indraprastha University, Dwarka, Delhi, 110078, India

Sanjay Kumar Malik completed his Ph.D. in the area of Semantic Web from USIC&T, GGSIP University, Delhi. He is currently working as a Professor in the University School of Information, Communication and Technology, GGSIP University. He has more than 20 years of industry and academic experience in India and abroad (Dubai and USA). His areas of research interest are semantic web and web technologies. He has several research papers published in reputed international conferences (India/abroad) and journals. He has been session chair for several international IEEE/Springer conferences and was honored with the third best researcher award in 2011 by GGSIP University for his research contributions.

Published

2024-05-25

How to Cite

Gupta, R., & Malik, S. K. (2024). SPARQL Optimization Using Re-ordering Joining Patterns with Surrogate Key Concept and Subset Patterns. Journal of Web Engineering, 23(03), 393–430. https://doi.org/10.13052/jwe1540-9589.2334

Section

Articles