Publication:
Probabilistic policy reuse for safe reinforcement learning

dc.affiliation.dpto: UC3M. Departamento de Informática [es]
dc.affiliation.grupoinv: UC3M. Grupo de Investigación: Planificación y Aprendizaje [es]
dc.contributor.author: García Polo, Francisco Javier
dc.contributor.author: Fernández Rebollo, Fernando
dc.contributor.funder: Ministerio de Economía y Competitividad (España) [es]
dc.contributor.funder: Comunidad de Madrid [es]
dc.date.accessioned: 2019-12-13T14:57:10Z
dc.date.available: 2019-12-13T14:57:10Z
dc.date.issued: 2019-03-28
dc.description.abstract: This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous and continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuously increasing monotonic risk function that allows for the identification of the probability to end up in failure from a given state. Such a risk function is defined in terms of how far such a state is from the state space known by the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of actual learned knowledge, the exploration of new actions, and the request of teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can be used to completely avoid the visit to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved. [es]
dc.description.sponsorship: This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO). Javier García is partially supported by the Comunidad de Madrid (Spain) funds under the project 2016-T2/TIC-1712. [es]
dc.identifier.bibliographicCitation: García, J. and Fernández, F. Probabilistic Policy Reuse for Safe Reinforcement Learning. ACM Transactions on Autonomous and Adaptive Systems, 13(3), (2019) [es]
dc.identifier.doi: https://doi.org/10.1145/3310090
dc.identifier.issn: 1556-4665
dc.identifier.publicationissue: 3
dc.identifier.publicationtitle: ACM Transactions on Autonomous and Adaptive Systems [en]
dc.identifier.publicationvolume: 13
dc.identifier.uri: https://hdl.handle.net/10016/29354
dc.identifier.uxxi: AR/0000024064
dc.language.iso: eng [es]
dc.publisher: ACM [es]
dc.relation.projectID: info:eu-repo/grantAgreement/EC/H2020/730086 [es]
dc.relation.projectID: Gobierno de España. TIN2015-65686-C5-1-R [es]
dc.relation.projectID: Comunidad de Madrid. 2016-T2/TIC-1712. [es]
dc.rights: © ACM, 2019 [es]
dc.rights.accessRights: open access [es]
dc.subject.eciencia: Informática [es]
dc.subject.other: Reinforcement learning [en]
dc.subject.other: Case-based reasoning [en]
dc.subject.other: Software agents [en]
dc.title: Probabilistic policy reuse for safe reinforcement learning [en]
dc.type: research article [*]
dc.type.hasVersion: AM [*]
dspace.entity.type: Publication
Files
Original bundle
Name: probabilistic_ACM_2019_ps.pdf
Size: 693.82 KB
Format: Adobe Portable Document Format