Efficient Retrieval of Data Using Semantic Search Engine Based on NLP and RDF

Authors

  • Usha Yadav, National Institute of Fashion Technology, Jodhpur, India, https://orcid.org/0000-0001-9332-7552
  • Neelam Duhan, J.C. Bose University of Science & Technology, YMCA, Faridabad, India

DOI:

https://doi.org/10.13052/jwe1540-9589.2084

Keywords:

Domain Ontology, Semantic Search Engine, SPARQL, Natural Language Processing, RDF

Abstract

With the evolution of Web 3.0, traditional Web 2.0 search algorithms will become obsolete and underperform in retrieving precise and accurate information from the growing semantic web. It is reasonable to presume that common users do not understand the ontology used in a knowledge base or the SPARQL query language. Providing easy access to this enormous knowledge base for users at all levels is therefore challenging, because users' ability to formulate structured queries such as SPARQL varies widely. This paper proposes a semantic-web-based search methodology that converts a user's natural language query into a SPARQL query, which can then be directed to a knowledge base built on a domain ontology. Each query word is mapped to the relevant concepts or relations in the ontology, and a score is assigned to each mapping to find the best possible mapping for query generation. The mappings with the highest scores are taken into consideration, along with interrogative or other functions, to formulate the user query as a SPARQL query. If no search result is retrieved from the knowledge base, then instead of returning null to the user, the query is directed to Web 3.0. The top “k” documents are converted into RDF format using the Text2Onto tool, and a corpus of semantically structured web documents is built. Alongside this, a semantic crawl agent is used to obtain <Subject-Predicate-Object> sets from the semantic wiki. A Term Frequency Matrix and a Co-occurrence Matrix are computed on the corpus, followed by singular value decomposition (SVD), to find the results relevant to the user query. The evaluation results show that the proposed system is efficient in terms of execution time, precision, recall and F-measure.
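
The query-word mapping and SPARQL formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology, the `ex:` prefix, the stopword and interrogative lists, the 0.7 threshold, and the use of `difflib` string similarity as the mapping score are all assumptions made here for illustration, since the abstract does not specify them.

```python
from difflib import SequenceMatcher

# Hypothetical toy ontology (label -> kind, IRI); not taken from the paper.
ONTOLOGY = {
    "author": ("relation", "ex:hasAuthor"),
    "book": ("concept", "ex:Book"),
    "publisher": ("relation", "ex:hasPublisher"),
}

# Assumed word lists, chosen only to make the example work.
STOPWORDS = {"the", "of", "a", "an", "is", "are", "this"}
INTERROGATIVES = {"who", "what", "which", "where", "when"}


def score(word, label):
    """String similarity in [0, 1]; a stand-in for the paper's mapping score."""
    return SequenceMatcher(None, word.lower(), label.lower()).ratio()


def best_mapping(word, threshold=0.7):
    """Return (label, kind, iri, score) for the highest-scoring label, or None."""
    label = max(ONTOLOGY, key=lambda candidate: score(word, candidate))
    best = score(word, label)
    if best < threshold:
        return None
    kind, iri = ONTOLOGY[label]
    return label, kind, iri, best


def to_sparql(question):
    """Build a SPARQL query from one concept and one relation, or return None."""
    words = [w.strip("?.,").lower() for w in question.split()]
    has_interrogative = any(w in INTERROGATIVES for w in words)
    skip = STOPWORDS | INTERROGATIVES
    content = [w for w in words if w and w not in skip]

    # Keep only the words that map to some ontology element.
    mappings = [m for m in map(best_mapping, content) if m is not None]
    concepts = [m for m in mappings if m[1] == "concept"]
    relations = [m for m in mappings if m[1] == "relation"]
    if not concepts or not relations:
        return None  # nothing usable; a caller could fall back to web search

    # The interrogative decides the projection in this simplified sketch.
    select = "SELECT DISTINCT ?answer" if has_interrogative else "SELECT ?s ?answer"
    patterns = [
        f"?s a {concepts[0][2]} .",
        f"?s {relations[0][2]} ?answer .",
    ]
    return (
        "PREFIX ex: <http://example.org/>\n"
        + select
        + " WHERE {\n  "
        + "\n  ".join(patterns)
        + "\n}"
    )


if __name__ == "__main__":
    print(to_sparql("Who is the author of this book?"))
```

In this sketch a question whose words map to no ontology element yields `None`, which is the point at which a system like the one described could redirect the query to the web instead of returning an empty result.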

Author Biographies

Usha Yadav, National Institute of Fashion Technology, Jodhpur, India

Usha Yadav is presently working as an Assistant Professor at the National Institute of Fashion Technology, Jodhpur, India, and has more than 7 years of working experience. She is also pursuing a Ph.D. at J. C. Bose University of Science and Technology, YMCA, Faridabad, India. She received her B.E. in Information Technology in 2009 and her M.Tech. in Computer Engineering in 2011. She has published more than 11 research papers in reputed journals and conferences indexed in SCIE, SCOPUS, etc. Her areas of interest are the semantic web, information retrieval, AR/VR, Artificial Intelligence and the Internet of Things.

Neelam Duhan, J.C. Bose University of Science & Technology, YMCA, Faridabad, India

Neelam Duhan has 17 years of academic work experience and is currently working as an Associate Professor in the Computer Engineering Department at J. C. Bose University of Science and Technology, YMCA, Faridabad. She received her B.Tech. in Computer Science and Engineering, M.Tech. in Computer Engineering and Ph.D. in Computer Engineering in 2002, 2005 and 2011, respectively. She has successfully guided three Ph.D. scholars and is currently guiding four more in the areas of machine learning, semantic web and social networks. She has guided more than 30 M.Tech. dissertations. She has published more than 75 research papers in reputed journals and conferences. Her areas of interest are databases, data analytics, information retrieval and web mining.

Published

2021-11-19

Section

Articles