End-to-end emergency response protocol for tunnel accidents augmentation with reinforcement learning

ur Rehman, Hafiz Muhammad Raza; Gul, M. Junaid; Younas, Rabbiya; Jhandir, Muhammad Zeeshan; Álvarez, Roberto Marcelo (roberto.alvarez@uneatlantico.es); Miró Vera, Yini Airet (yini.miro@uneatlantico.es); Ashraf, Imran (2026) End-to-end emergency response protocol for tunnel accidents augmentation with reinforcement learning. Scientific Reports. ISSN 2045-2322

s41598-026-37191-w_reference.pdf
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Abstract

Autonomous unmanned aerial vehicles (UAVs) offer cost-effective and flexible solutions for a wide range of real-world applications, particularly in hazardous and time-critical environments. Their ability to navigate autonomously, communicate rapidly, and avoid collisions makes UAVs well suited for emergency response scenarios. However, real-time path planning in dynamic and unpredictable environments remains a major challenge, especially in confined tunnel infrastructures where accidents may trigger fires, smoke propagation, debris, and rapid environmental changes. In such conditions, conventional preplanned or model-based navigation approaches often fail due to limited visibility, narrow passages, and the absence of reliable localization signals. To address these challenges, this work proposes an end-to-end emergency response framework for tunnel accidents based on Multi-Agent Reinforcement Learning (MARL). Each UAV operates as an independent learning agent using an Independent Q-Learning paradigm, enabling real-time decision-making under limited computational resources. To mitigate premature convergence and local optima during exploration, Grey Wolf Optimization (GWO) is integrated as a policy-guidance mechanism within the reinforcement learning (RL) framework. A customized reward function is designed to prioritize victim discovery, penalize unsafe behavior, and explicitly discourage redundant exploration among agents. The proposed approach is evaluated using a frontier-based exploration simulator under both single-agent and multi-agent settings with multiple goals. Extensive simulation results demonstrate that the proposed framework achieves faster goal discovery, improved map coverage, and reduced rescue time compared to state-of-the-art GWO-based exploration and random search algorithms. These results highlight the effectiveness of lightweight MARL-based coordination for autonomous UAV-assisted tunnel emergency response.
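The Independent Q-Learning setup described in the abstract, with a reward that rewards victim discovery, penalizes unsafe states, and discourages redundant exploration, could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the grid size, reward magnitudes, hazard locations, and hyperparameters are all illustrative assumptions, and the GWO policy-guidance step is omitted for brevity.

```python
import random
from collections import defaultdict

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def shaped_reward(cell, goal, hazards, visited):
    """Hedged stand-in for the paper's customized reward: victim discovery
    is strongly rewarded, hazards penalized, revisits mildly penalized."""
    if cell == goal:
        return 100.0   # victim discovered
    if cell in hazards:
        return -50.0   # unsafe behavior (fire/debris cell)
    if cell in visited:
        return -1.0    # redundant exploration across all agents
    return -0.1        # small step cost to encourage fast rescue

def train(grid_w=6, grid_h=6, goal=(5, 5), hazards=frozenset({(2, 2), (3, 4)}),
          n_agents=2, episodes=500, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    # Independent Q-Learning: one Q-table per agent, no shared parameters.
    Q = [defaultdict(float) for _ in range(n_agents)]
    for _ in range(episodes):
        pos = [(0, 0)] * n_agents
        visited = set()  # shared visit log drives the redundancy penalty
        for _ in range(200):
            for i in range(n_agents):
                s = pos[i]
                if s == goal:
                    continue
                # epsilon-greedy action selection
                if rng.random() < eps:
                    a = rng.randrange(4)
                else:
                    a = max(range(4), key=lambda k: Q[i][(s, k)])
                dx, dy = ACTIONS[a]
                s2 = (min(max(s[0] + dx, 0), grid_w - 1),
                      min(max(s[1] + dy, 0), grid_h - 1))
                r = shaped_reward(s2, goal, hazards, visited)
                visited.add(s2)
                # standard Q-learning update
                best_next = max(Q[i][(s2, k)] for k in range(4))
                Q[i][(s, a)] += alpha * (r + gamma * best_next - Q[i][(s, a)])
                pos[i] = s2
            if all(p == goal for p in pos):
                break
    return Q
```

In the paper, GWO would additionally steer exploration away from premature convergence; here plain epsilon-greedy takes its place, so this sketch only shows the independent-learner and reward-shaping structure.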

Item Type: Article
Uncontrolled Keywords: Robotic systems; drones; multi-agent systems; path finding; reinforcement learning; tunnel hazards; unmanned aerial vehicles
Subjects: Subjects > Engineering
Divisions: Europe University of Atlantic > Research > Scientific Production
Ibero-american International University > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
University of La Romana > Research > Scientific Production
Depositing User: Sr Bibliotecario
Date Deposited: 04 Feb 2026 09:00
Last Modified: 04 Feb 2026 09:00
URI: http://repositorio.funiber.org/id/eprint/27154
