Citation Count Prediction Using Abstracts
Keywords:
Citation count prediction, Document classification, Text analysis, Machine learning
Abstract
Researchers are expected to find, from among a large number of publications, previous literature that is related to their research and is likely to have scientific impact. This paper addresses the problem of predicting the citation count of a research paper, that is, the number of citations that the paper receives from other papers. Previous literature on this problem claims that the textual data of papers have little effect on the prediction compared with data about the authors and publication venues. In contrast, the authors of this paper predicted the citation counts of papers using only the paper abstracts. They also investigated how the technical terms used in the abstracts affect the prediction: they classified abstracts of papers with high and low citation counts, then applied the classification to abstracts modified by hiding their technical terms. The results of their experiments indicate that whether a research paper's citation count is high or low can be predicted from its abstract, and that the features effective for the prediction are related to trends in research topics.
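The procedure outlined in the abstract (classify abstracts as high or low citation count, then repeat with technical terms hidden) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the classifier, so a multinomial naive Bayes model stands in for it; the `MASK` token, the function names, the technical-term set, and the four toy abstracts are all hypothetical and are not the authors' data or method.

```python
import math
import re
from collections import Counter

MASK = "<term>"  # hypothetical placeholder that replaces a hidden technical term


def tokenize(text, technical_terms=frozenset()):
    """Split text into lowercase words, replacing technical terms with MASK."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [MASK if t in technical_terms else t for t in tokens]


def train(abstracts, labels, technical_terms=frozenset()):
    """Count words per class for a multinomial naive Bayes classifier."""
    word_counts = {}
    for text, label in zip(abstracts, labels):
        counts = word_counts.setdefault(label, Counter())
        counts.update(tokenize(text, technical_terms))
    vocab = set().union(*word_counts.values())
    return {
        "word_counts": word_counts,
        "doc_counts": Counter(labels),
        "vocab": vocab,
        "terms": technical_terms,
    }


def predict(model, text):
    """Return the class with the highest Laplace-smoothed log-probability."""
    tokens = tokenize(text, model["terms"])
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, counts in model["word_counts"].items():
        score = math.log(model["doc_counts"][label] / total_docs)
        class_total = sum(counts.values())
        for t in tokens:
            score += math.log((counts[t] + 1) / (class_total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up abstracts and labels, used only to exercise the code.
abstracts = [
    "novel deep learning method improves prediction",
    "deep learning achieves strong prediction results",
    "survey of manual cataloguing practice",
    "notes on manual cataloguing in libraries",
]
labels = ["high", "high", "low", "low"]

# Classifier trained on the original text.
model = train(abstracts, labels)

# Classifier trained with a hypothetical set of technical terms hidden.
masked_model = train(abstracts, labels, frozenset({"deep", "learning", "cataloguing"}))

print(predict(model, "deep learning prediction"))
print(predict(masked_model, "manual cataloguing notes"))
```

Comparing the accuracy of the two models on held-out abstracts would be one way to gauge how much of the signal comes from technical terms rather than from the remaining vocabulary, which is the kind of question the abstract raises.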