A Semantic Similarity Measure for Scholarly Document Based on the Study of n-gram


  • Yannick-Ulrich Tchantchou Samen Department of Mathematics and Computer Science, Faculty of Science, University of Maroua, P.O. Box 814, Maroua, Cameroon




Semantic Similarity, n-gram, Natural Language Processing, Scholarly Document, Similarity Measure


The performance of information retrieval systems depends closely on how accurately similarity measures determine the similarity between documents, or between a query and a document. This paper addresses similarity measures for scholarly documents and proposes a semantic similarity measure that exploits both the metadata contained in scientific articles and the important n-grams identified in them. To evaluate the accuracy of the proposed measure, a dataset of articles is built, together with similarity values for those articles estimated manually by human experts. Experiments on this dataset using Pearson correlation show that the similarity values obtained with the proposed measure are very close to those estimated by the human experts.
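
As a minimal sketch of the evaluation step described above, the following Python code computes the Pearson correlation between expert-estimated similarity values and the values returned by a similarity measure. The function name pearson and the two score lists are hypothetical and chosen for illustration only; they are not the paper's code, dataset, or results.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical similarity values for four article pairs (illustration only,
# not taken from the paper).
expert_scores = [0.9, 0.2, 0.6, 0.4]
measure_scores = [0.8, 0.3, 0.7, 0.35]

r = pearson(expert_scores, measure_scores)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 would indicate that the measure ranks and scales article pairs much as the experts do; a coefficient near 0 would indicate no linear agreement.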


Author Biography

Yannick-Ulrich Tchantchou Samen, Department of Mathematics and Computer Science, Faculty of Science, University of Maroua, P.O. Box 814, Maroua, Cameroon

Yannick-Ulrich Tchantchou Samen received a BSc in pure mathematics and an MSc in error-correcting codes from the Dept. of Mathematics, Faculty of Science, University of Yaounde 1, Cameroon, in 2011 and 2013, respectively. He received a PhD in Semantic Web from the Institute of Mathematics and Physical Sciences, University of Abomey-Calavi, Benin, in 2017. He has been a Researcher with the Laboratory of Research in Computer science and Applications (LRSIA) since 2017. Since 2021, he has been a Lecturer in Computer Science in the Dept. of Mathematics and Computer Science at the University of Maroua. His current research areas include Semantic Web, Information Filtering, Natural Language Processing, and Web Mining.


