Publication: Probabilistic policy reuse for safe reinforcement learning
dc.affiliation.dpto | UC3M. Departamento de Informática | es |
dc.affiliation.grupoinv | UC3M. Grupo de Investigación: Planificación y Aprendizaje | es |
dc.contributor.author | García Polo, Francisco Javier | |
dc.contributor.author | Fernández Rebollo, Fernando | |
dc.contributor.funder | Ministerio de Economía y Competitividad (España) | es |
dc.contributor.funder | Comunidad de Madrid | es |
dc.date.accessioned | 2019-12-13T14:57:10Z | |
dc.date.available | 2019-12-13T14:57:10Z | |
dc.date.issued | 2019-03-28 | |
dc.description.abstract | This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous reinforcement learning problems with continuous state and action spaces, in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuously increasing monotonic risk function that estimates the probability of ending up in failure from a given state. This risk function is defined in terms of how far a state is from the state space known by the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of the knowledge learned so far, the exploration of new actions, and the request of teacher advice in parts of the state space considered dangerous. Specifically, the π-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the π-reuse exploration strategy can be used to completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved. | es |
dc.description.sponsorship | This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO). Javier García is partially supported by the Comunidad de Madrid (Spain) funds under the project 2016-T2/TIC-1712. | es |
dc.identifier.bibliographicCitation | García, J. and Fernández, F. Probabilistic Policy Reuse for Safe Reinforcement Learning. ACM Transactions on Autonomous and Adaptive Systems, 13(3), (2019) | es |
dc.identifier.doi | https://doi.org/10.1145/3310090 | |
dc.identifier.issn | 1556-4665 | |
dc.identifier.publicationissue | 3 | |
dc.identifier.publicationtitle | ACM Transactions on Autonomous and Adaptive Systems | en |
dc.identifier.publicationvolume | 13 | |
dc.identifier.uri | https://hdl.handle.net/10016/29354 | |
dc.identifier.uxxi | AR/0000024064 | |
dc.language.iso | eng | es |
dc.publisher | ACM | es |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/730086 | es |
dc.relation.projectID | Gobierno de España. TIN2015-65686-C5-1-R | es |
dc.relation.projectID | Comunidad de Madrid. 2016-T2/TIC-1712. | es |
dc.rights | © ACM, 2019 | es |
dc.rights.accessRights | open access | es |
dc.subject.eciencia | Computer Science | es |
dc.subject.other | Reinforcement learning | en |
dc.subject.other | Case-based reasoning | en |
dc.subject.other | Software agents | en |
dc.title | Probabilistic policy reuse for safe reinforcement learning | en |
dc.type | research article | * |
dc.type.hasVersion | AM | * |
dspace.entity.type | Publication |
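The abstract describes two mechanisms: a risk function that grows with a state's distance from the state space already known to the agent, and π-reuse exploration that mixes teacher advice, exploration, and exploitation. The Python sketch below illustrates one way these could fit together. It is not the authors' implementation: the logistic form of the risk function, the parameters `theta`, `k`, `epsilon`, and `sigma`, the nearest-neighbour case base, and the rule that the probability of reusing the teacher equals the risk of the current state are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = List[float]
Case = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (state, action) pair


def risk(state: State, known_states: Sequence[State],
         theta: float = 1.0, k: float = 5.0) -> float:
    """Hypothetical continuous, monotonically increasing risk function.

    Assumption: a logistic curve over the Euclidean distance d from `state`
    to the nearest known state, centred at threshold `theta` with steepness
    `k` (> 0). The risk tends to 1 far outside the known space and is small
    (about 0.0067 with the defaults) at a known state.
    """
    if not known_states:
        return 1.0  # nothing is known yet, so treat every state as risky
    d = min(math.dist(state, s) for s in known_states)
    z = max(-700.0, min(700.0, -k * (d - theta)))  # avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(z))


def select_action(state: State, case_base: Sequence[Case],
                  teacher: Callable[[State], Action], rng: random.Random,
                  epsilon: float = 0.1, sigma: float = 0.1,
                  theta: float = 1.0, k: float = 5.0) -> Tuple[Action, str]:
    """pi-reuse style choice between teacher advice, exploration, exploitation.

    Assumption: the probability psi of reusing the teacher's policy equals the
    risk of the current state. Otherwise the agent takes the action of the
    nearest stored case, perturbed with Gaussian noise with probability epsilon
    (exploration in a continuous action space).
    """
    psi = risk(state, [s for s, _ in case_base], theta, k)
    if rng.random() < psi:
        return list(teacher(state)), "teacher"
    # Reaching this point implies psi < 1, so case_base is non-empty.
    _, action = min(case_base, key=lambda case: math.dist(state, case[0]))
    if rng.random() < epsilon:
        return [a + rng.gauss(0.0, sigma) for a in action], "explore"
    return list(action), "exploit"


if __name__ == "__main__":
    rng = random.Random(0)
    case_base = [((0.0, 0.0), (0.5,)), ((1.0, 0.0), (-0.2,))]
    teacher = lambda s: [0.0]  # hypothetical safe baseline controller
    known = [s for s, _ in case_base]
    print(round(risk((0.0, 0.0), known), 4))  # 0.0067
    print(round(risk((5.0, 5.0), known), 4))  # 1.0
    print(select_action((5.0, 5.0), case_base, teacher, rng)[1])  # teacher
```

A smooth logistic curve is used here instead of a hard distance threshold so that the risk stays continuous and monotonic, as the abstract requires; the probability of deferring to the teacher therefore rises gradually as the agent moves away from familiar states.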
Files
Original bundle
- Name: probabilistic_ACM_2019_ps.pdf
- Size: 693.82 KB
- Format: Adobe Portable Document Format