ISSN: 2245-4578 (Online Version) ISSN: 2245-1439 (Print Version)
An Intelligent Penetration Strategy for Power System Networks Using Reinforcement Learning

Keywords

Penetration testing
Reinforcement learning
Cybersecurity
DQN algorithm

How to Cite

[1]
M. Li, N. Yang, X. Yang, X. Jin, L. Yin, and J. Xu, “An Intelligent Penetration Strategy for Power System Networks Using Reinforcement Learning”, JCSANDM, vol. 14, no. 05, pp. 1221–1244, Dec. 2025.

Abstract

Cybersecurity is vital for modern power systems, which are increasingly exposed to sophisticated cyber threats. Penetration testing is an effective method for identifying system vulnerabilities by simulating real-world attacks. However, traditional approaches depend heavily on expert knowledge and manual effort, resulting in high labor and time costs. To address this, we propose an autonomous penetration testing framework tailored for power system networks. The problem is modeled as a Markov Decision Process (MDP) and solved using an enhanced deep reinforcement learning algorithm. Specifically, we introduce SPIND-DQL, which integrates NoisyNet, Dueling Architecture, Prioritized Experience Replay (PER), an Intrinsic Curiosity Module (ICM), and Soft Q-Learning to improve exploration efficiency and reduce trial-and-error during training. Experiments conducted in Microsoft’s CyberBattleSim, adapted to reflect power system network environments, show that SPIND-DQL achieves up to 40% faster convergence and compromises 25% more assets compared to baseline DQN variants and strong baselines such as Rainbow DQN. Our ablation studies confirm the significant contribution of each component, particularly ICM and Soft Q-Learning, in discovering complex attack paths. These results highlight SPIND-DQL’s potential as a practical and intelligent tool for power system cybersecurity assessment.
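To make the abstract's combination of Soft Q-Learning and a curiosity bonus concrete, the following is a minimal sketch of how such a one-step learning target could be formed. It is an illustration under assumptions, not the paper's actual SPIND-DQL implementation: the function names (`soft_value`, `soft_q_target`) and the temperature/discount values are hypothetical, and the ICM bonus is taken here as a precomputed scalar rather than the module's learned prediction error.

```python
import math

def soft_value(q_values, alpha=0.1):
    # Soft (log-sum-exp) state value used by Soft Q-Learning:
    #   V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    # As alpha -> 0 this recovers the hard max of standard Q-learning.
    m = max(q / alpha for q in q_values)  # shift for numerical stability
    return alpha * (m + math.log(sum(math.exp(q / alpha - m) for q in q_values)))

def soft_q_target(reward, intrinsic_bonus, next_q_values, gamma=0.99,
                  alpha=0.1, done=False):
    # One-step bootstrapped target: the ICM-style curiosity bonus is added
    # to the extrinsic reward, and the soft value of the next state
    # replaces the usual max over next actions.
    bootstrap = 0.0 if done else gamma * soft_value(next_q_values, alpha)
    return reward + intrinsic_bonus + bootstrap
```

In a full agent, the TD error between this target and the current Q estimate would also set the sample's priority for PER; that coupling is omitted here for brevity.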

https://doi.org/10.13052/jcsm2245-1439.1458


Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2025 Journal of Cyber Security and Mobility
