Paper, International Journal (SCI-level): Unpacking Performance Variability in Deep Reinforcement Learning: The Role of Observation Space Divergence
- Journal type: International journal (SCI-level)
- Publication date: 2025-07
- Authors: Sooyoung Jang, Ahyun Lee
- Journal: Applied Sciences
- Country of publication: International
- Paper language: Foreign language (English)
- Total number of authors: 2
- Research field: Engineering > Computer Science
Abstract
Deep Reinforcement Learning (DRL) algorithms often exhibit significant performance variability across different training runs, even with identical settings. This paper investigates the hypothesis that a key contributor to this variability is the divergence in the observation spaces explored by individual learning agents. We conducted an empirical study using Proximal Policy Optimization (PPO) agents trained on eight Atari environments. We analyzed the collected agent trajectories by qualitatively visualizing and quantitatively measuring the divergence in their explored observation spaces. Furthermore, we cross-evaluated the learned actor and value networks, measuring the average absolute TD-error, the RMSE of value estimates, and the KL divergence between policies to assess their functional similarity. We also conducted experiments where agents were trained from identical network initializations to isolate the source of this divergence. Our findings reveal a strong correlation: environments with low performance variance (e.g., Freeway) showed high similarity in explored observation spaces and learned networks across agents. Conversely, environments with high performance variability (e.g., Boxing, Qbert) demonstrated significant divergence in both explored states and network functionalities. This pattern persisted even when agents started with identical network weights. These results suggest that differences in experiential trajectories, driven by the stochasticity of agent–environment interactions, lead to specialized agent policies and value functions, thereby contributing substantially to the observed inconsistencies in DRL performance.
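
The abstract names three cross-evaluation metrics: average absolute TD-error, RMSE of value estimates, and KL divergence between policies. The following is a minimal NumPy sketch of how such metrics could be computed over a batch of transitions collected by one agent and evaluated with another agent's networks; it is not the authors' code, and all function names, array shapes, and the discount value are illustrative assumptions.

```python
# Illustrative sketch (not the paper's implementation) of the three
# cross-evaluation metrics described in the abstract, computed over a
# batch of transitions. Array names and shapes are assumptions.
import numpy as np

def avg_abs_td_error(rewards, values, next_values, dones, gamma=0.99):
    """Average absolute one-step TD-error of a value network on a trajectory batch."""
    targets = rewards + gamma * next_values * (1.0 - dones)
    return np.mean(np.abs(targets - values))

def value_rmse(values_a, values_b):
    """RMSE between two agents' value estimates on the same observations."""
    return np.sqrt(np.mean((values_a - values_b) ** 2))

def mean_policy_kl(probs_a, probs_b, eps=1e-8):
    """Mean KL(pi_a || pi_b) between two agents' discrete action distributions,
    evaluated per state and averaged over the batch."""
    probs_a = np.clip(probs_a, eps, 1.0)
    probs_b = np.clip(probs_b, eps, 1.0)
    return np.mean(np.sum(probs_a * (np.log(probs_a) - np.log(probs_b)), axis=-1))
```

In this reading, agent A supplies the observations (its explored states), while values and action probabilities come from evaluating both agents' networks on that shared batch, so the metrics capture how differently the two agents' learned functions behave on the same states.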

