RT Journal Article
T1 Probabilistic policy reuse for safe reinforcement learning
A1 García Polo, Francisco Javier
A1 Fernández Rebollo, Fernando
AB This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous, continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuous, monotonically increasing risk function that identifies the probability of ending in failure from a given state. This risk function is defined in terms of how far the state is from the region of the state space known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of actual learned knowledge, the exploration of new actions, and the request for teacher advice in parts of the state space considered dangerous. Specifically, the π-reuse exploration strategy is used. Through experiments on the helicopter hover task and a business management problem, we show that the π-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
PB ACM
SN 1556-4665
YR 2019
FD 2019-03-28
LK https://hdl.handle.net/10016/29354
UL https://hdl.handle.net/10016/29354
LA eng
NO This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO). Javier García is partially supported by the Comunidad de Madrid (Spain) funds under the project 2016-T2/TIC-1712.
DS e-Archivo
RD 17 Jul. 2024