Citation Count Prediction Using Abstracts


  • Takahiro Baba, Kyushu University, 819-0395, Fukuoka, Japan
  • Kensuke Baba, Fujitsu Laboratories, Kawasaki, 211-8588, Japan
  • Daisuke Ikeda, Kyushu University, 819-0395, Fukuoka, Japan


Citation count prediction, Document classification, Text analysis, Machine learning


Researchers are expected to find, from among a large number of publications, previous literature that is related to their research and likely to have scientific impact. This paper addresses the problem of predicting the citation count of a research paper, that is, the number of citations it receives from other papers. Previous literature on this problem claims that the textual data of papers affect the prediction less than data about the authors and publication venues. In contrast, the authors of this paper predicted whether the citation counts of papers were high or low using only the paper abstracts. Additionally, they investigated the effect of the technical terms used in the abstracts on the prediction. They classified abstracts of papers into high and low citation counts and then applied the same classification to abstracts in which the technical terms had been hidden. The results of their experiments indicate that whether a research paper's citation count is high or low can be predicted from its abstract, and that the effective features used in the prediction are related to trends in research topics.
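
The approach described in the abstract (classifying abstracts by high or low citation count, then repeating the classification with technical terms hidden) can be illustrated with a minimal sketch. The abstract does not name the classifier, the features, or the source of the technical terms, so everything here is an assumption: a multinomial naive Bayes model over bag-of-words counts, a hypothetical hand-written term list, a hypothetical placeholder token, and made-up toy sentences standing in for real abstracts.

```python
import math
import re
from collections import Counter

MASK = "<term>"  # hypothetical placeholder that replaces hidden technical terms


def tokenize(text, technical_terms=frozenset()):
    """Lowercase, split into word tokens, and hide technical terms behind MASK."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [MASK if t in technical_terms else t for t in tokens]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # skip words never seen during training
                score += math.log(
                    (self.word_counts[c][t] + 1) / (total_words + len(self.vocab))
                )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def train_classifier(examples, technical_terms=frozenset()):
    """Fit a classifier on (abstract, label) pairs, optionally hiding terms."""
    docs = [tokenize(text, technical_terms) for text, _ in examples]
    labels = [label for _, label in examples]
    return NaiveBayes().fit(docs, labels)


# Toy, made-up examples for illustration only; not data from the paper.
TOY_TRAIN = [
    ("deep learning improves protein structure prediction", "high"),
    ("genome sequencing reveals novel regulatory mechanism", "high"),
    ("a note on a local survey of soil samples", "low"),
    ("brief report on a minor revision of a catalogue", "low"),
]
TECHNICAL_TERMS = frozenset({"protein", "genome", "sequencing"})  # hypothetical list

plain_model = train_classifier(TOY_TRAIN)
masked_model = train_classifier(TOY_TRAIN, TECHNICAL_TERMS)

query = "novel learning mechanism"
print(plain_model.predict(tokenize(query)))
print(masked_model.predict(tokenize(query, TECHNICAL_TERMS)))
```

Comparing the accuracy of the plain and masked models on held-out abstracts is one way to gauge how much the prediction depends on technical terms; the toy data above is far too small to support any such conclusion.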



Author Biographies

Takahiro Baba, Kyushu University, 819-0395, Fukuoka, Japan

Takahiro Baba received the BSc and MSc degrees from Kyushu University in 2004 and 2011, respectively. From 2011 to 2016 he was an employee of Lafla Inc. Currently, he is a doctoral student at Kyushu University.

Kensuke Baba, Fujitsu Laboratories, Kawasaki, 211-8588, Japan

Kensuke Baba received the BSc, MSc, and DSc degrees from Kyushu University in 1996, 1998, and 2002. From 2002 to 2003 he was a Research Fellow and from 2003 to 2009 an Assistant Professor in the Faculty of Information Science and Electrical Engineering, Kyushu University. From 2009 to 2015 he was an Associate Professor in the library of Kyushu University. Currently, he is a Research Fellow in the Artificial Intelligence Laboratory, Fujitsu Laboratories. His research interests include data mining, natural language processing, and machine learning. Dr. Baba is a member of the IEEE and IPSJ.

Daisuke Ikeda, Kyushu University, 819-0395, Fukuoka, Japan

Daisuke Ikeda received his BSc, MSc, and DSc degrees in science from Kyushu University in 1994, 1996, and 2004, respectively. He is currently an Associate Professor in the Department of Informatics, Kyushu University. Formerly, he worked at the Computer Center, Kyushu University, and the Kyushu University Library. His research interests include data analysis, such as data mining and machine learning, and data infrastructure, such as databases and information retrieval.

