Probabilistic policy reuse for safe reinforcement learning

García Polo, Francisco Javier; Fernández Rebollo, Fernando

dc.affiliation.dpto	UC3M. Departamento de Informática	es
dc.affiliation.grupoinv	UC3M. Grupo de Investigación: Planificación y Aprendizaje	es
dc.contributor.author	García Polo, Francisco Javier
dc.contributor.author	Fernández Rebollo, Fernando
dc.contributor.funder	Ministerio de Economía y Competitividad (España)	es
dc.contributor.funder	Comunidad de Madrid	es
dc.date.accessioned	2019-12-13T14:57:10Z
dc.date.available	2019-12-13T14:57:10Z
dc.date.issued	2019-03-28
dc.description.abstract	This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous and continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuously increasing monotonic risk function that allows for the identification of the probability to end up in failure from a given state. Such a risk function is defined in terms of how far such a state is from the state space known by the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of actual learned knowledge, the exploration of new actions, and the request of teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can be used to completely avoid the visit to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.	es
dc.description.sponsorship	This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO). Javier García is partially supported by the Comunidad de Madrid (Spain) funds under the project 2016-T2/TIC-1712.	es
dc.identifier.bibliographicCitation	García, J. and Fernández, F. Probabilistic Policy Reuse for Safe Reinforcement Learning. ACM Transactions on Autonomous and Adaptive Systems, 13(3), (2019)	es
dc.identifier.doi	https://doi.org/10.1145/3310090
dc.identifier.issn	1556-4665
dc.identifier.publicationissue	3
dc.identifier.publicationtitle	ACM Transactions on Autonomous and Adaptive Systems	en
dc.identifier.publicationvolume	13
dc.identifier.uri	https://hdl.handle.net/10016/29354
dc.identifier.uxxi	AR/0000024064
dc.language.iso	eng	es
dc.publisher	ACM	es
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/730086	es
dc.relation.projectID	Gobierno de España. TIN2015-65686-C5-1-R	es
dc.relation.projectID	Comunidad de Madrid. 2016-T2/TIC-1712.	es
dc.rights	© ACM, 2019	es
dc.rights.accessRights	open access	es
dc.subject.eciencia	Informática	es
dc.subject.other	Reinforcement learning	en
dc.subject.other	Case-based reasoning	en
dc.subject.other	Software agents	en
dc.title	Probabilistic policy reuse for safe reinforcement learning	en
dc.type	research article	*
dc.type.hasVersion	AM	*
dspace.entity.type	Publication

Files

Original bundle

Name: probabilistic_ACM_2019_ps.pdf
Size: 693.82 KB
Format: Adobe Portable Document Format

Collections

DI - PLG - Artículos de Revistas
OpenAIRE: Open Access Infrastructure for Research in Europe
