DSpace e-Archivo

Please use this identifier to cite or link to this item: http://hdl.handle.net/10016/6822

Files in This Item:
two_fernandez_IJIS_2008_ps.pdf (1.8 MB, Adobe PDF)
Title: Two steps reinforcement learning
Author(s): Fernández, Fernando; Borrajo, Daniel
Publisher: Wiley Periodicals
Issued date: Jan-2008
Citation: International Journal of Intelligent Systems, February 2008, vol. 23, n. 2, p. 213-245
URI: http://hdl.handle.net/10016/6822
ISSN: 0884-8173 (Print); 1098-111X (Online)
DOI: http://dx.doi.org/10.1002/int.20255
Abstract: When reinforcement learning is applied in domains with very large or continuous state spaces, the experience the learning agent obtains by interacting with the environment must be generalized. Generalization methods are usually based on approximating the value functions used to compute the action policy, and this is tackled in two different ways: either by approximating the value functions with a supervised learning method, or by discretizing the environment so that a tabular representation of the value functions can be used. In this work, we propose an algorithm that combines both approaches to exploit the benefits of each, achieving higher performance. The approach is based on two learning phases. In the first phase, a learner is used as a supervised function approximator, using a machine learning technique that also outputs a discretization of the state space, as nearest prototype classifiers and decision trees do. In the second phase, the discretization computed in the first phase is used to obtain a tabular representation of the value function computed in that phase, which allows that value function approximation to be tuned. Experiments in different domains show that executing both learning phases improves on the results obtained by executing only the first one. The results take into account both the resources used and the performance of the learned behavior.
Sponsor: This research was partially conducted while the first author was visiting Carnegie Mellon University from the Universidad Carlos III de Madrid, supported by a generous grant from the Spanish Ministry of Education and Fulbright. Both authors were partially sponsored by the Spanish MEC project TIN2005-08945-C06-05 and the regional CAM-UC3M project number CCG06-UC3M/TIC-0831.
Review: Peer reviewed
Publisher version: http://dx.doi.org/10.1002/int.20255
Rights: © Wiley Periodicals
Appears in Collections: DI - PLG - Artículos de Revistas

Items in E-Archivo are protected by copyright, with all rights reserved, unless otherwise indicated.

 
