Lightweight Probabilistic RL for Web-app Compatible Large-scale OHT Path Optimization
DOI: https://doi.org/10.13052/jwe1540-9589.2545
Keywords: Path Optimization, Reinforcement Learning, Proximal Policy Optimization, Web-App Compatible Routing
Abstract
As modern smart-factory environments increasingly require real-time remote operation and lightweight cloud-based control, routing intelligence for overhead hoist transport (OHT) systems must be fully web-app compatible, supporting scalable deployment without reliance on high-end local infrastructure. To address these demands and the limitations of static routing algorithms in large-scale OHT systems, this study proposes a multi-agent reinforcement learning model based on proximal policy optimization (PPO), whose state space accounts for the probability of chain blockage. The key metric, “movement success probability,” integrates the states of preceding agents to predict chain-reaction congestion, enabling agents to proactively select stable detours. To improve scalability in high-density environments, the model stabilizes learning through lightweight policy initialization rather than large-scale training from scratch. Moreover, its decentralized structure minimizes central computational overhead, which suits web-app deployment and enables real-time monitoring across distributed environments.
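The abstract does not give the formula for movement success probability. The sketch below shows one hypothetical way a chain-blockage term could be computed for OHTs queued on a single track segment: each vehicle has an assumed probability of halting on its own, and a vehicle too close to the one ahead can move only if that one moves, so a single stop propagates backward through the queue. The function name and the parameters `stop_probs`, `gaps_mm`, and `min_gap_mm` are illustrative assumptions, not the authors' definitions.

```python
from typing import List


def movement_success_probabilities(
    stop_probs: List[float], gaps_mm: List[float], min_gap_mm: float
) -> List[float]:
    """Hypothetical chain-blockage model for OHTs queued on one lane.

    Assumptions (not from the paper):
    - Agents are ordered front to back; index 0 is the leading OHT.
    - stop_probs[i] is the probability that OHT i halts on its own
      (e.g. while loading/unloading) during the next step.
    - gaps_mm[i] is the distance from OHT i to the OHT directly ahead
      (ignored for i == 0).
    - An OHT whose gap is below min_gap_mm moves only if the OHT ahead
      moves, so one stop cascades backward as a chain reaction.
    """
    if len(stop_probs) != len(gaps_mm):
        raise ValueError("stop_probs and gaps_mm must have the same length")
    probs: List[float] = []
    for i, q in enumerate(stop_probs):
        own = 1.0 - q  # probability this OHT is not halted by itself
        if i == 0 or gaps_mm[i] >= min_gap_mm:
            probs.append(own)  # enough clearance: predecessor does not matter
        else:
            probs.append(own * probs[i - 1])  # blocked whenever predecessor is
    return probs
```

For example, with `stop_probs=[0.5, 0.25, 0.0]`, `gaps_mm=[0.0, 500.0, 3000.0]`, and `min_gap_mm=1000.0`, the second vehicle inherits the leader's risk (0.75 × 0.5 = 0.375) while the third, having a wide gap, stays at 1.0. Under these assumptions, such values would be one input to each agent's state, in the spirit of the chain-blockage term the abstract describes.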
In a simulation with 1,333 nodes and 100 OHTs, the proposed model achieved an average task completion distance of 166,809 mm, 4.1% shorter than that of the rule-based Floyd–Warshall method (173,940 mm). Notably, in worst-case scenarios where congestion drove the rule-based method up to 321,753 mm, the proposed model held at 176,268 mm, a 45.2% reduction that indicates greater operational stability.
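The baseline is described only as rule-based Floyd–Warshall. As a reference point, the sketch below is the textbook Floyd–Warshall all-pairs shortest-path algorithm with next-hop reconstruction, plus a helper that reproduces the percentages quoted above from the reported distances. The four-node rail graph and the helper names (`floyd_warshall`, `route`, `relative_reduction`) are hypothetical; the paper's own edge weights and dispatch rules are not given in the abstract. Because the table is computed once from track lengths alone, routes taken from it do not react to where other OHTs currently are, which is the kind of static behavior the proposed model targets.

```python
import math
from typing import List, Optional, Tuple

INF = math.inf


def floyd_warshall(
    n: int, edges: List[Tuple[int, int, float]]
) -> Tuple[List[List[float]], List[List[Optional[int]]]]:
    """All-pairs shortest paths on a directed rail graph.

    edges: (u, v, length_mm) for each one-way track segment.
    Returns (dist, nxt): nxt[u][v] is the next node on a shortest u->v
    path, or None if v is unreachable from u.
    """
    dist = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    nxt: List[List[Optional[int]]] = [
        [i if i == j else None for j in range(n)] for i in range(n)
    ]
    for u, v, w in edges:
        if w < dist[u][v]:  # keep the shortest of any parallel segments
            dist[u][v] = w
            nxt[u][v] = v
    for k in range(n):  # allow node k as an intermediate stop
        for i in range(n):
            if dist[i][k] == INF:
                continue
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def route(nxt: List[List[Optional[int]]], u: int, v: int) -> List[int]:
    """Rebuild the node sequence of a shortest u->v path ([] if none)."""
    if nxt[u][v] is None:
        return []
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is shorter than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline
```

On a hypothetical loop 0→1→2→3→0 built from 1,000 mm segments, plus a 5,000 mm direct segment 0→3, `route(nxt, 0, 3)` returns `[0, 1, 2, 3]` at 3,000 mm. `relative_reduction(173940, 166809)` rounds to 4.1 and `relative_reduction(321753, 176268)` rounds to 45.2, matching the reported figures.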

