Lightweight Probabilistic RL for Web-app Compatible Large-scale OHT Path Optimization

Authors

  • OkHwan Bae, Division of Computer Engineering, Hoseo University, Republic of Korea
  • Chung-Pyo Hong, Division of Computer Engineering, Hoseo University, Republic of Korea

DOI:

https://doi.org/10.13052/jwe1540-9589.2545

Keywords:

Path Optimization, Reinforcement Learning, Proximal Policy Optimization, Web-App Compatible Routing

Abstract

Modern smart-factory environments increasingly require real-time remote operation and lightweight cloud-based control, so routing intelligence for overhead hoist transport (OHT) systems must be web-app compatible and deployable at scale without high-end local infrastructure. To meet these demands and overcome the limitations of static algorithms in large-scale OHT systems, this study proposes a multi-agent reinforcement learning model based on proximal policy optimization (PPO) whose state space accounts for chain blockage probability. The key metric, “movement success probability,” integrates the states of preceding agents to predict chain-reaction congestion, enabling agents to proactively select stable detours. To improve scalability in high-density environments, the model stabilizes learning through lightweight policy initialization rather than large-scale training from scratch. The decentralized structure also minimizes central computational overhead, which suits web-app deployment and enables real-time monitoring across distributed environments.
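The abstract does not give the formula for the movement success probability, so the following is only an illustrative sketch of the idea, not the authors' definition. It assumes that each preceding agent blocks independently with a known probability, that the success probability is the product of the non-blocking probabilities, and that a route is scored by distance divided by that probability. The function names, the independence assumption, and the scoring rule are all hypothetical.

```python
from math import prod


def movement_success_probability(block_probs):
    """Probability that no preceding agent blocks the path.

    Assumption (not from the paper): blockages are independent, so the
    chance of moving freely is the product of (1 - p) over the chain.
    """
    for p in block_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod((1.0 - p for p in block_probs), start=1.0)


def pick_route(routes):
    """Return the name of the route with the lowest hypothetical score.

    routes maps a route name to (distance_mm, [block probability of each
    preceding agent]). The score distance / success probability is an
    illustrative choice: a short route behind a likely blockage chain
    can lose to a longer but clearer detour.
    """
    def score(item):
        distance_mm, block_probs = item[1]
        p_success = movement_success_probability(block_probs)
        return float("inf") if p_success == 0.0 else distance_mm / p_success

    return min(routes.items(), key=score)[0]


routes = {
    "direct": (1000, [0.5, 0.5]),  # short, but two agents ahead may block
    "detour": (1500, [0.1]),       # longer, with one low-risk agent ahead
}
best = pick_route(routes)
print(best)
```

Under these assumed numbers the direct route scores 1000 / 0.25 = 4000 and the detour scores 1500 / 0.9, so the sketch prefers the detour, mirroring the proactive detour selection the abstract describes.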

In a simulation with 1,333 nodes and 100 OHTs, the proposed model achieved an average task completion distance of 166,809 mm, a 4.1% improvement over the rule-based Floyd–Warshall method (173,940 mm). In worst-case scenarios, where congestion drove the rule-based method to 321,753 mm, the proposed model held at 176,268 mm, a 45.2% reduction that demonstrates greater operational stability.
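The reported percentages can be checked directly from the distances quoted above. This short sketch assumes each improvement is measured relative to the rule-based baseline; the helper name `percent_reduction` is introduced here for illustration only.

```python
def percent_reduction(baseline_mm, proposed_mm):
    """Reduction of the proposed distance relative to the baseline, in percent."""
    return 100.0 * (baseline_mm - proposed_mm) / baseline_mm


# Distances in mm, taken from the abstract.
avg_gain = percent_reduction(173_940, 166_809)    # average case
worst_gain = percent_reduction(321_753, 176_268)  # worst case

print(f"average: {avg_gain:.1f}%  worst case: {worst_gain:.1f}%")
```

Rounded to one decimal place, the two values reproduce the 4.1% and 45.2% figures stated in the abstract.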

Author Biographies

OkHwan Bae, Division of Computer Engineering, Hoseo University, Republic of Korea

OkHwan Bae received his master’s degree in computer engineering from Hoseo University in 2025. His research interests include computer vision, deep learning, and reinforcement learning.

Chung-Pyo Hong, Division of Computer Engineering, Hoseo University, Republic of Korea

Chung-Pyo Hong received his B.Sc. and M.Sc. degrees in computer science from Yonsei University, Seoul, Korea, in 2004 and 2006, respectively. In 2012, he received his Ph.D. degree in computer science from Yonsei University, Seoul, Korea. He is currently an associate professor of computer engineering at Hoseo University, Asan, Korea. His research interests include machine learning, explainable AI, and data science.

Published

2026-05-24

How to Cite

Bae, O., & Hong, C.-P. (2026). Lightweight Probabilistic RL for Web-app Compatible Large-scale OHT Path Optimization. Journal of Web Engineering, 25(04), 583–598. https://doi.org/10.13052/jwe1540-9589.2545

Section

ECTI