SPARQL Generation with an NMT-based Approach


  • Jia-Huei Lin National Chung Hsing University, Taichung, Taiwan (R.O.C.)
  • Eric Jui-Lin Lu National Chung Hsing University, Taichung, Taiwan (R.O.C.)



SPARQL Generation, Neural Machine Translation, Question Answering, Transformer


SPARQL is a powerful query language that is widely used in natural language question answering (QA) systems. With advances in deep neural networks, Neural Machine Translation (NMT) models have in recent years been employed to translate natural language questions directly into SPARQL queries. In this paper, we propose an NMT-based approach that uses the Transformer model to generate SPARQL queries. The Transformer model is chosen for its relatively high efficiency and effectiveness. We design a format that encodes a SPARQL query as a simple sequence in which only the RDF triples are retained. The main purpose of this step is to shorten the sequences and reduce the complexity of the target language. Moreover, we employ entity type tags to further resolve mistranslation problems. The proposed approach is evaluated on three open-domain question answering datasets (QALD-7, QALD-8, and LC-QuAD) in terms of BLEU score and accuracy, and obtains BLEU scores of 83.49%, 90.13%, and 76.32%, respectively, which considerably outperform all known studies.
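The idea of keeping only the RDF triples can be illustrated with a minimal sketch. This is not the authors' encoding format, which the abstract does not specify. The prefix table, the `sep` separator token, and the function names `shorten` and `encode_triples` are all assumptions made for illustration, and the sketch handles only simple graph patterns whose triples are separated by " . ".

```python
import re

# Hypothetical prefix table (an assumption, not taken from the paper).
PREFIXES = {
    "http://dbpedia.org/resource/": "dbr:",
    "http://dbpedia.org/ontology/": "dbo:",
}


def shorten(term: str) -> str:
    """Replace a <full URI> with a prefixed name when the prefix is known."""
    if term.startswith("<") and term.endswith(">"):
        uri = term[1:-1]
        for base, prefix in PREFIXES.items():
            if uri.startswith(base):
                return prefix + uri[len(base):]
    return term


def encode_triples(query: str) -> str:
    """Keep only the triple patterns found between the outermost braces.

    Everything outside the braces (SELECT/ASK clause, modifiers) is dropped,
    so this illustrative encoding is lossy. The "sep" token is hypothetical.
    """
    match = re.search(r"\{(.*)\}", query, flags=re.DOTALL)
    if match is None:
        raise ValueError("no graph pattern found")
    body = match.group(1).strip()
    triples = []
    # Split on a period that is preceded by whitespace, so dots inside
    # URIs such as "dbpedia.org" are left alone.
    for chunk in re.split(r"\s+\.(?:\s+|$)", body):
        terms = chunk.split()
        if len(terms) == 3:  # subject, predicate, object
            triples.append(" ".join(shorten(t) for t in terms))
    return " sep ".join(triples)


query = (
    "SELECT DISTINCT ?uri WHERE { "
    "?uri <http://dbpedia.org/ontology/author> "
    "<http://dbpedia.org/resource/Jane_Austen> . }"
)
encoded = encode_triples(query)
print(encoded)  # ?uri dbo:author dbr:Jane_Austen
```

The shortened sequence contains far fewer tokens than the full query, which is the kind of reduction in target-language complexity the abstract describes; restoring a complete query from such a sequence would need additional information that this sketch does not model.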



Author Biographies

Jia-Huei Lin, National Chung Hsing University, Taichung, Taiwan (R.O.C.)

Jia-Huei Lin received her bachelor’s degree in computer science and information engineering from National Taiwan Normal University in 2019. She is currently pursuing a master’s degree in Management Information at National Chung Hsing University. Her research covers natural language processing, question answering systems, and neural machine translation.

Eric Jui-Lin Lu, National Chung Hsing University, Taichung, Taiwan (R.O.C.)

Eric Jui-Lin Lu received his B.A. degree from National Chiao-Tung University, Tsin-Chu, in 1982, his MSBA degree from San Francisco State University, San Francisco, in 1990, and his Ph.D. degree in computer science from Missouri University of Science and Technology (formerly University of Missouri-Rolla), Missouri, in 1996. He is currently a professor in the Department of Management Information Systems, National Chung Hsing University. His research interests include machine learning, natural language question answering, and the semantic web.








SPECIAL ISSUE: Intelligent Edge Computing