Pre-trained Model-based Software Defect Prediction for Edge-cloud Systems

Authors

  • Sunjae Kwon, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
  • Sungu Lee, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
  • Duksan Ryu, Jeonbuk National University, Jeonju, Republic of Korea
  • Jongmoon Baik, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

DOI:

https://doi.org/10.13052/jwe1540-9589.2223

Keywords:

Just-in-time defect prediction, pre-trained model, edge-cloud system

Abstract

Edge-cloud computing is a distributed computing infrastructure that brings computation and data storage closer to clients, reducing latency. As interest in edge-cloud systems grows, research on testing these systems is also active. However, as with traditional systems, testing resources are always limited. We therefore propose a function-level just-in-time (JIT) software defect prediction (SDP) model, based on a pre-trained model, that addresses this limitation by directing the limited testing resources to defect-prone functions. The pre-trained model is a transformer-based deep learning model trained on a large corpus of code snippets; once fine-tuned, it estimates the defect proneness of the functions changed in each commit. We evaluate three popular pre-trained models (CodeBERT, GraphCodeBERT, and UniXcoder) on edge-cloud systems in within-project defect prediction (WPDP) and cross-project defect prediction (CPDP) environments. To the best of our knowledge, this is the first attempt to analyse the performance of SDP models based on these three pre-trained models for edge-cloud systems. The results show that UniXcoder performed best among the three in the WPDP environment. However, they also show that additional research is needed before the SDP models can be applied in the CPDP environment.
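
The prioritization idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that a fine-tuned model has already produced a defect-proneness score for each changed function in a commit, and it simply spends a fixed testing budget on the highest-scoring functions first. The names `ChangedFunction`, `defect_prob`, `test_cost`, and `prioritize` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChangedFunction:
    """A function touched by a commit (hypothetical record, for illustration)."""
    name: str           # function identifier
    defect_prob: float  # assumed output of a fine-tuned pre-trained model, in [0, 1]
    test_cost: float    # assumed testing effort, in arbitrary units


def prioritize(functions: List[ChangedFunction], budget: float) -> List[ChangedFunction]:
    """Greedily pick functions to test, most defect-prone first, within the budget."""
    ranked = sorted(functions, key=lambda f: f.defect_prob, reverse=True)
    selected: List[ChangedFunction] = []
    spent = 0.0
    for func in ranked:
        # Skip a function that no longer fits; cheaper ones further down may still fit.
        if spent + func.test_cost <= budget:
            selected.append(func)
            spent += func.test_cost
    return selected


if __name__ == "__main__":
    commit = [
        ChangedFunction("parse_request", 0.9, 2.0),
        ChangedFunction("log_metrics", 0.2, 1.0),
        ChangedFunction("sync_cache", 0.7, 3.0),
    ]
    for func in prioritize(commit, budget=3.0):
        print(func.name, func.defect_prob)
```

A greedy ranking by score is only one possible policy; ranking by score per unit of testing cost would be an equally reasonable choice when costs differ widely between functions.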

Author Biographies

Sunjae Kwon, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

Sunjae Kwon received a bachelor’s degree in electrical engineering from Korea Military Academy in 2009 and a master’s degree in computer engineering from Maharishi Markandeshwar University in 2015. He is a doctoral student in software engineering at KAIST. His research areas include software analytics based on AI, software defect prediction, mining software repositories, and software reliability engineering.

Sungu Lee, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

Sungu Lee received a bachelor’s degree in mathematics from KAIST in 2021 and a master’s degree in software engineering from KAIST in 2022. He is a doctoral student in software engineering at KAIST. His research areas include software analytics based on AI, software defect prediction, mining software repositories, and software reliability engineering.

Duksan Ryu, Jeonbuk National University, Jeonju, Republic of Korea

Duksan Ryu earned a bachelor’s degree in computer science from Hanyang University in 1999 and a dual master’s degree in software engineering from KAIST and Carnegie Mellon University in 2012. He received his Ph.D. degree from the School of Computing at KAIST in 2016. His research areas include software analytics based on AI, software defect prediction, mining software repositories, and software reliability engineering. He is currently an associate professor in the software engineering department at Jeonbuk National University.

Jongmoon Baik, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

Jongmoon Baik received his B.S. degree in computer science and statistics from Chosun University in 1993. He received his M.S. and Ph.D. degrees in computer science from the University of Southern California in 1996 and 2000, respectively. He worked as a principal research scientist at the Software and Systems Engineering Research Laboratory, Motorola Labs, where he was responsible for leading many software quality improvement initiatives. His research activities and interests focus on software Six Sigma, software reliability and safety, and software process improvement. He is currently an associate professor in the School of Computing at Korea Advanced Institute of Science and Technology (KAIST). He is a member of the IEEE.

Published

2023-06-21

How to Cite

Kwon, S., Lee, S., Ryu, D., & Baik, J. (2023). Pre-trained Model-based Software Defect Prediction for Edge-cloud Systems. Journal of Web Engineering, 22(02), 255–278. https://doi.org/10.13052/jwe1540-9589.2223

Section

BECS 2022