DI - PLG - Artículos de Revistas

Recent Submissions

  • Publication
    Using automated planning for traffic signals control
    (University of Zagreb, 2016-08-30) Gulic, Matija; Olivares, Ricardo; Borrajo Millán, Daniel; Ministerio de Ciencia e Innovación (España)
    Solving traffic congestion is a high-priority issue in many big cities. Traditional traffic control systems are mainly based on pre-programmed, reactive and local techniques. This paper presents an autonomic system that uses automated planning techniques instead. These techniques are easy to configure and modify, and can reason about the future implications of actions that change the default traffic light behaviour. The implemented system includes some autonomic properties, since it monitors the current traffic state, detects whether the system's performance is degrading, sets up new goals to be achieved by the planner, triggers the planner that generates plans with control actions, and executes the selected courses of action. Results obtained in several simulation scenarios, based on both artificial and real-world data, show that the proposed system can efficiently solve traffic congestion.
  • Publication
    Using Pre-Computed Knowledge for Goal Allocation in Multi-Agent Planning
    (Springer, 2019-05-15) Luis Mingueza, Nerea; Pereira, Tiago; Fernández Arregui, Susana; Moreira, Antonio; Borrajo Millán, Daniel; Veloso, Manuela; Ministerio de Economía y Competitividad (España); Agencia Estatal de Investigación (España)
    Many real-world robotic scenarios require task planning to decide the courses of action to be executed by (possibly heterogeneous) robots. A classical centralized planning approach has to find a solution inside a search space that contains every possible combination of robots and goals. This leads to inefficient solutions that do not scale well. Multi-Agent Planning (MAP) provides a new way to solve these kinds of tasks efficiently. Previous works on MAP have proposed factorizing the problem to decrease the planning effort, i.e., dividing the goals among the agents (robots). However, these techniques do not scale when the number of agents and goals grows. Also, in most real-world scenarios with big maps, some goals might not be reachable by every robot, so working out which robot can achieve which goal has an associated computational cost. In this paper we propose a combination of robotics and planning techniques to lighten and speed up the computation of the goal assignment process. We use Actuation Maps (AMs). Given a map, AMs can determine the regions each agent can actuate on. Thus, specific information can be extracted to know which goals can be tackled by each agent, as well as to cheaply estimate the cost of using each agent to achieve every goal. Experiments show that when information extracted from AMs is provided to a multi-agent planning algorithm, the goal assignment is significantly faster, speeding up the planning process considerably. Experiments also show that this approach greatly outperforms classical centralized planning.
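As a rough illustration of the goal-allocation idea in the abstract above, the following Python sketch assigns each goal to the agent with the lowest estimated cost and skips agents that cannot reach it. The `assign_goals` function, the cost table and the greedy rule are all assumptions made for illustration; the abstract does not say how Actuation Maps are computed or which allocation strategy the paper uses.

```python
import math


def assign_goals(cost):
    """Greedily assign each goal to the cheapest agent that can reach it.

    cost[agent][goal] is a hypothetical pre-computed cost estimate;
    math.inf (or a missing entry) marks a goal the agent cannot reach.
    """
    assignment = {}
    goals = {g for per_agent in cost.values() for g in per_agent}
    for goal in sorted(goals):
        best_agent, best_cost = None, math.inf
        for agent in sorted(cost):
            c = cost[agent].get(goal, math.inf)
            if c < best_cost:
                best_agent, best_cost = agent, c
        # Goals that no agent can reach are left unassigned.
        if best_agent is not None:
            assignment.setdefault(best_agent, []).append(goal)
    return assignment


if __name__ == "__main__":
    example = {
        "robot1": {"g1": 2.0, "g2": math.inf},
        "robot2": {"g1": 5.0, "g2": 3.0},
    }
    print(assign_goals(example))
```

Because reachability and cost are looked up rather than discovered by search, the assignment step in this sketch is linear in the number of agent-goal pairs, which is the kind of saving the abstract attributes to pre-computed knowledge.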
  • Publication
    Plan merging by reuse for multi-agent planning
    (Springer, 2019-01-24) Luis Mingueza, Nerea; Fernández Arregui, Susana; Borrajo Millán, Daniel; Ministerio de Economía y Competitividad (España); Agencia Estatal de Investigación (España)
    Multi-Agent Planning deals with the task of generating a plan for/by a set of agents that jointly solve a planning problem. One of the biggest challenges is how to handle interactions arising from agents' actions. The first contribution of the paper is Plan Merging by Reuse, pmr, an algorithm that automatically adjusts its behaviour to the level of interaction. Given a multi-agent planning task, pmr assigns goals to specific agents. The chosen agents solve their individual planning tasks and the resulting plans are merged. Since merged plans are not always valid, pmr performs planning by reuse to generate a valid plan. The second contribution of the paper is rrpt-plan, a stochastic plan-reuse planner that combines plan reuse, standard search and sampling. We have performed extensive sets of experiments in order to analyze the performance of pmr in relation to state of the art multi-agent planning techniques.
  • Publication
    A taxonomy for similarity metrics between Markov decision processes
    (Springer, 2022-11) García Polo, Francisco Javier; Visús, Álvaro; Fernández Rebollo, Fernando; Comunidad de Madrid; Universidad Carlos III de Madrid
    Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer learning. Transfer is based on the idea of reusing the knowledge acquired in the learning of a set of source tasks to a new learning process in a target task, assuming that the target and source tasks are close enough. In recent years, transfer learning has succeeded in making reinforcement learning (RL) algorithms more efficient (e.g., by reducing the number of samples needed to achieve (near-)optimal performance). Transfer in RL is based on the core concept of similarity: whenever the tasks are similar, the transferred knowledge can be reused to solve the target task and significantly improve the learning performance. Therefore, the selection of good metrics to measure these similarities is a critical aspect when building transfer RL algorithms, especially when this knowledge is transferred from simulation to the real world. In the literature, there are many metrics to measure the similarity between MDPs, hence, many definitions of similarity or its complement distance have been considered. In this paper, we propose a categorization of these metrics and analyze the definitions of similarity proposed so far, taking into account such categorization. We also follow this taxonomy to survey the existing literature, as well as suggesting future directions for the construction of new metrics.
  • Publication
    Efficient approaches for multi-agent planning
    (Springer, 2019-02-01) Borrajo Millán, Daniel; Fernández Arregui, Susana
    Multi-agent planning (MAP) deals with planning systems that reason on long-term goals by multiple collaborative agents which want to maintain privacy on their knowledge. Recently, new MAP techniques have been devised to provide efficient solutions. Most approaches expand distributed searches using modified planners, where agents exchange public information. They present two drawbacks: they are planner-dependent, and they incur a high communication cost. Instead, we present two algorithms whose search processes are monolithic (no communication during individual planning) and in which MAP tasks are compiled such that they are planner-independent (no programming effort needed when replacing the base planner). Our two approaches first assign each public goal to a subset of agents. In the first, distributed approach, agents iteratively solve problems by receiving plans, goals and states from previous agents. After generating new plans by reusing previous agents' plans, they share the new plans and some obfuscated private information with the following agents. In the second, centralized approach, agents generate an obfuscated version of their problems to protect privacy and then submit it to an agent that performs centralized planning. The resulting approaches are efficient, outperforming other state-of-the-art approaches.
  • Publication
    Selecting goals in oversubscription planning using relaxed plans
    (Elsevier, 2021-02-01) García Olaya, Ángel; De la Rosa Turbides, Tomás Eduardo; Borrajo Millán, Daniel; European Commission; Agencia Estatal de Investigación (España)
    Planning deals with the task of finding an ordered set of actions that achieves some goals from an initial state. In many real-world applications it is unfeasible to find a plan achieving all goals due to limitations in the available resources. A common case consists of having a bound on a given cost measure that is less than the optimal cost needed to achieve all goals. Oversubscription planning (OSP) is the field of Automated Planning dealing with such kinds of problems. Usually, OSP generates plans that achieve only a subset of the goals set. In this paper we present a new technique to a priori select goals in no-hard-goals satisficing OSP by searching in the space of subsets of goals. A key property of the proposed approach is that it is planner-independent once the goals have been selected; it creates a new non-OSP problem that can be solved using off-the-shelf planners. Extensive experimental results show that the proposed approach outperforms state-of-the-art OSP techniques in several domains of the International Planning Competition.
  • Publication
    A framework for user adaptation and profiling for social robotics in rehabilitation
    (MDPI, 2020-08-25) Martín, Alejandro; Pulido Pascual, José Carlos; González Dorado, José Carlos; García Olaya, Ángel; Suárez Mejías, Cristina; Agencia Estatal de Investigación (España)
    Physical rehabilitation therapies for children present a challenge, and their success (the improvement of the patient’s condition) depends on many factors, such as the patient’s attitude and motivation, the correct execution of the exercises prescribed by the specialist, or the patient’s progressive recovery during the therapy. To increase the benefits of these therapies, social humanoid robots with a friendly appearance are a promising tool, not only to boost the interaction with the pediatric patient but also to assist physicians in their work. To achieve both goals, it is essential to monitor the patient’s condition in detail and to generate user profile models that enhance the feedback to both the system and the specialist. This paper describes how the project NAOTherapist (a robotic architecture for rehabilitation with social robots) has been upgraded to include a monitoring system able to generate user profile models through interaction with the patient, performing user-adapted therapies. Furthermore, the system has been improved by integrating a machine learning algorithm that recognizes the pose adopted by the patient and by adding a clinical report generation system based on the QUEST metric.
  • Publication
    An Automated Planning Model for HRI: Use Cases on Social Assistive Robotics
    (MDPI, 2020-11-02) Fuentetaja Pizán, Raquel; García Olaya, Ángel; García Polo, Francisco Javier; González Dorado, José Carlos; Fernández Rebollo, Fernando; Comunidad de Madrid; European Commission; Ministerio de Ciencia e Innovación (España)
    Using Automated Planning for the high-level control of robotic architectures is becoming very popular, mainly thanks to its capability to define the tasks to perform in a declarative way. However, classical planning tasks, even in their basic standard Planning Domain Definition Language (PDDL) format, are still very hard to formalize for non-expert engineers when the use case to model is complex. Human-Robot Interaction (HRI) is one of those complex environments. This manuscript describes the rationale followed to design a planning model able to control social autonomous robots interacting with humans. It is the result of the authors’ experience in modeling use cases for Social Assistive Robotics (SAR) in two areas related to healthcare: Comprehensive Geriatric Assessment (CGA) and non-contact rehabilitation therapies for patients with physical impairments. In this work, a general definition of these two use cases in a single planning domain is proposed, which favors the management and integration with the software robotic architecture, as well as the addition of new use cases. Results show that the model is able to capture all the relevant aspects of the human-robot interaction in those scenarios, allowing the robot to autonomously perform the tasks by using a standard planning-execution architecture.
  • Publication
    The IBaCoP planning system: instance-based configured portfolios
    (AI Access Foundation, 2016-08-01) Cenamor Guijarro, Isabel Rosario; Rosa Turbides, Tomás Eduardo de la; Fernández Rebollo, Fernando; Ministerio de Economía y Competitividad (España)
    Sequential planning portfolios are very powerful in exploiting the complementary strength of different automated planners. The main challenge of a portfolio planner is to define which base planners to run, to assign the running time for each planner and to decide in what order they should be carried out to optimize a planning metric. Portfolio configurations are usually derived empirically from training benchmarks and remain fixed for an evaluation phase. In this work, we create a per-instance configurable portfolio, which is able to adapt itself to every planning task. The proposed system pre-selects a group of candidate planners using a Pareto-dominance filtering approach and then it decides which planners to include and the time assigned according to predictive models. These models estimate whether a base planner will be able to solve the given problem and, if so, how long it will take. We define different portfolio strategies to combine the knowledge generated by the models. The experimental evaluation shows that the resulting portfolios provide an improvement when compared with non-informed strategies. One of the proposed portfolios was the winner of the Sequential Satisficing Track of the International Planning Competition held in 2014.
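The two steps named in the abstract above, Pareto-dominance filtering and time assignment driven by predictive models, can be sketched in Python as follows. This is only a simplified illustration under stated assumptions: the `Candidate` fields stand in for model outputs, the same two quantities are reused for both the filter and the time allocation, and the proportional time split is a hypothetical rule, not the strategy used by IBaCoP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    p_solve: float      # hypothetical model output: chance the planner solves the task
    est_seconds: float  # hypothetical model output: predicted solving time in seconds


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both criteria and better on at least one."""
    no_worse = a.p_solve >= b.p_solve and a.est_seconds <= b.est_seconds
    strictly_better = a.p_solve > b.p_solve or a.est_seconds < b.est_seconds
    return no_worse and strictly_better


def pareto_filter(cands):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def configure_portfolio(cands, budget_seconds, k=3):
    """Pick up to k non-dominated planners and split the time budget among them."""
    front = pareto_filter(cands)
    chosen = sorted(front, key=lambda c: c.p_solve, reverse=True)[:k]
    total = sum(c.est_seconds for c in chosen)
    if total == 0:
        # Degenerate case: no time estimates, so share the budget equally.
        return [(c.name, budget_seconds / len(chosen)) for c in chosen]
    # Assumed rule: time slices proportional to the predicted solving time.
    return [(c.name, budget_seconds * c.est_seconds / total) for c in chosen]
```

The result is an ordered list of `(planner name, seconds)` pairs, which is one plausible shape for a sequential portfolio configuration computed per planning task.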
  • Publication
    Modeling, evaluation, and scale on artificial pedestrians: a literature review
    (ACM, 2017-09-01) Martínez Gil, Francisco; Lozano, Miguel; García Fernández, Ignacio; Fernández Rebollo, Fernando; Ministerio de Economía y Competitividad (España)
    Modeling pedestrian dynamics and their implementation in a computer are challenging and important issues in the knowledge areas of transportation and computer simulation. The aim of this article is to provide a bibliographic outlook so that the reader may have quick access to the most relevant works related to this problem. We have used three main axes to organize the article's contents: pedestrian models, validation techniques, and multiscale approaches. The backbone of this work is the classification of existing pedestrian models; we have organized the works in the literature under five categories, according to the techniques used for implementing the operational level in each pedestrian model. Then the main existing validation methods, oriented to evaluate the behavioral quality of the simulation systems, are reviewed. Furthermore, we review the key issues that arise when facing multiscale pedestrian modeling, where we first focus on the behavioral scale (combinations of micro and macro pedestrian models) and second on the scale size (from individuals to crowds). The article begins by introducing the main characteristics of walking dynamics and its analysis tools and concludes with a discussion about the contributions that different knowledge fields can make in the near future to this exciting area.
  • Publication
    Emergent behaviors and scalability for multi-agent reinforcement learning-based pedestrian models
    (Elsevier, 2017-05-01) Martínez Gil, Francisco; Lozano, Miguel; Fernández Rebollo, Fernando; Ministerio de Economía y Competitividad (España)
    This paper analyzes the emergent behaviors of pedestrian groups that learn through the multiagent reinforcement learning model developed in our group. Five scenarios studied in the pedestrian model literature, with different levels of complexity, were simulated in order to analyze the robustness and the scalability of the model. Firstly, a reduced group of agents must learn by interaction with the environment in each scenario. In this phase, each agent learns its own kinematic controller, which will drive it at simulation time. Secondly, the number of simulated agents is increased, in each scenario where agents have previously learnt, to test the appearance of emergent macroscopic behaviors without additional learning. This strategy allows us to evaluate the robustness, consistency and quality of the learned behaviors. For this purpose, several tools from pedestrian dynamics, such as fundamental diagrams and density maps, are used. The results reveal that the developed model is capable of simulating human-like micro and macro pedestrian behaviors for the simulation scenarios studied, including those where the number of pedestrians has been scaled by one order of magnitude with respect to the situation learned.
  • Publication
    A three-layer planning architecture for the autonomous control of rehabilitation therapies based on social robots
    (Elsevier, 2017-06-01) González Dorado, José Carlos; Pulido Pascual, José Carlos; Fernández Rebollo, Fernando; Ministerio de Economía y Competitividad (España)
    This manuscript focuses on the description of a novel cognitive architecture called NAOTherapist, which provides a social robot with enough autonomy to carry out a non-contact upper limb rehabilitation therapy for patients with physical impairments, such as cerebral palsy and obstetric brachial plexus palsy. NAOTherapist comprises three levels of Automated Planning. In the high-level planning, the physician establishes the parameters of the therapy such as the scheduling of the sessions, the therapeutic objectives to be achieved and certain constraints based on the medical records of the patient. This information is used to establish a customized therapy plan. The objective of the medium-level planning is to execute and monitor every previously planned session with the humanoid robot. Finally, the low-level planning involves the execution of path-planning actions by the robot to carry out different low-level instructions such as performing poses. The technical evaluation shows an accurate definition and monitoring of the therapies and sessions and a fluent interaction with the robot. This automated process is expected to save time for the professionals while guaranteeing the medical criteria.
  • Publication
    Evaluating the child-robot interaction of the NAOTherapist platform in pediatric rehabilitation
    (Springer, 2017-06-01) Pulido Pascual, José Carlos; González Dorado, José Carlos; Suárez Mejías, Cristina; Bandera, Antonio; Bustos, Pablo; Fernández Rebollo, Fernando
    NAOTherapist is a cognitive robotic architecture whose main goal is to develop non-contact upper-limb rehabilitation sessions autonomously with a social robot for patients with physical impairments. In order to achieve a fluent interaction and an active engagement with the patients, the system should be able to adapt by itself in accordance with the perceived environment. In this paper, we describe the interaction mechanisms that are necessary to supervise and help the patient to carry out the prescribed exercises correctly. We also provide an evaluation focused on the child-robot interaction of the robotic platform with a large number of schoolchildren and the experience of a first contact with three pediatric rehabilitation patients. The results presented are obtained through questionnaires, video analysis and system logs, and have proven to be consistent with the hypotheses proposed in this work.
  • Publication
    On-line case-based policy learning for automated planning in probabilistic environments
    (World Scientific Publishing Company, 2018-05-01) Martínez Muñoz, Moises; García Polo, Francisco Javier; Fernández Rebollo, Fernando; European Commission; Ministerio de Economía y Competitividad (España)
    Many robotic control architectures perform a continuous cycle of sensing, reasoning and acting, where that reasoning can be carried out in a reactive or deliberative form. Reactive methods are fast and provide the robot with high interaction and response capabilities. Deliberative reasoning is particularly suitable in robotic systems because it employs some form of forward projection (reasoning in depth about goals, pre-conditions, resources and timing constraints) and provides the robot with reasonable responses in situations unforeseen by the designer. However, this reasoning, typically conducted using Artificial Intelligence techniques like Automated Planning (AP), is not effective for controlling autonomous agents which operate in complex and dynamic environments. Deliberative planning, although feasible in stable situations, takes too long in unexpected or changing situations which require re-planning. Therefore, planning cannot be done on-line in many complex robotic problems, where quick responses are frequently required. In this paper, we propose an alternative approach based on case-based policy learning which integrates deliberative reasoning through AP and reactive response time through reactive planning policies. The method is based on learning planning knowledge from actual experiences to obtain a case-based policy. The contribution of this paper is twofold. First, it is shown that the learned case-based policy produces reasonable and timely responses in complex environments. Second, it is also shown how one case-based policy that solves a particular problem can be reused to solve a similar but more complex problem in a transfer learning scope.
  • Publication
    Developing a robot-guided interactive simon game for physical and cognitive training
    (World Scientific Publishing Company, 2019-02-01) Turp, Misra; González, José Carlos; Pulido Pascual, José Carlos; Fernández Rebollo, Fernando; Ministerio de Economía y Competitividad (España)
    Wrapping cognitive or physical rehabilitation in a game greatly increases the patients' commitment to their treatment. Especially with children, keeping them motivated is a very time-consuming task, so therapists are demanding tools to help them with it. NAOTherapist is a generic robotic architecture that uses Automated Planning techniques to autonomously drive non-contact upper-limb rehabilitation sessions for children with a humanoid NAO robot. Our aim is to develop more robotic games for this platform to enrich its variability and possibilities of interaction. The goal of this work is to present our first attempt to develop a different, more complex game that reuses the previous architecture. We contribute the design description of a novel robotic Simon game that employs upper-limb poses instead of colors and could qualify as cognitive and physical training. Statistics of evaluation tests with 14 adults and 56 children are displayed and the outcomes are analyzed in terms of human-robot interaction (HRI) quality. The results demonstrate the application-domain generalization capabilities of the NAOTherapist architecture and give an insight to further analyze the therapeutic benefits of the newly developed Simon game.
  • Publication
    Reinforcement learning for pricing strategy optimization in the insurance industry
    (Elsevier, 2019-04-01) Krasheninnikova, Elena; García Polo, Francisco Javier; Losada Maestre, Roberto; Fernández Rebollo, Fernando; Comunidad de Madrid; Ministerio de Economía y Competitividad (España)
    Pricing is a fundamental problem in the banking sector, and is closely related to a number of financial products such as credit scoring or insurance. In the insurance industry an important question arises, namely: how can insurance renewal prices be adjusted? Such an adjustment has two conflicting objectives. On the one hand, insurers are forced to retain existing customers, while on the other hand insurers are also forced to increase revenue. Intuitively, one might assume that revenue increases by offering high renewal prices; however, this might also cause many customers to terminate their contracts. Conversely, low renewal prices help retain most existing customers, but could negatively affect revenue. Therefore, adjusting renewal prices is a non-trivial problem for the insurance industry. In this paper, we propose a novel model of the renewal price adjustment problem as a sequential decision problem and, consequently, as a Markov decision process (MDP). In particular, this study analyzes two different strategies to carry out this adjustment. The first maximizes revenue and analyzes the effect of this maximization on customer retention, while the second maximizes revenue subject to the client retention level not falling below a given threshold. The former case is related to MDPs with a single criterion to be optimized. The latter case is related to Constrained MDPs (CMDPs) with two criteria, where the first one is related to optimization, while the second is subject to a constraint. This paper also contributes the solution of these models by means of a model-free Reinforcement Learning algorithm. Results have been reported using real data from the insurance division of BBVA, one of the largest Spanish companies in the banking sector.
  • Publication
    Probabilistic policy reuse for safe reinforcement learning
    (ACM, 2019-03-28) García Polo, Francisco Javier; Fernández Rebollo, Fernando; Ministerio de Economía y Competitividad (España); Comunidad de Madrid
    This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous and continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuously increasing monotonic risk function that allows for the identification of the probability to end up in failure from a given state. Such a risk function is defined in terms of how far such a state is from the state space known by the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of actual learned knowledge, the exploration of new actions, and the request of teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can be used to completely avoid the visit to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
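To make the notion of a distance-based risk function in the abstract above more concrete, here is a minimal Python sketch. It assumes a Euclidean state space, measures how far a state lies from the nearest state the agent already knows, and maps that distance through a logistic curve so that risk increases monotonically with distance. The logistic shape, the `threshold` and `steepness` parameters, and the `choose_source` helper are assumptions for illustration only; they do not reproduce the paper's risk function or its pi-reuse strategy.

```python
import math
import random


def distance_to_known(state, known_states):
    """Euclidean distance from a state to the closest known state."""
    return min(math.dist(state, k) for k in known_states)


def risk(state, known_states, threshold=1.0, steepness=5.0):
    """Monotonically increasing risk in (0, 1], driven by distance to known space.

    threshold and steepness are hypothetical tuning parameters.
    """
    d = distance_to_known(state, known_states)
    return 1.0 / (1.0 + math.exp(-steepness * (d - threshold)))


def choose_source(state, known_states, rng):
    """Ask the teacher with probability equal to the estimated risk."""
    return "teacher" if rng.random() < risk(state, known_states) else "learner"


if __name__ == "__main__":
    known = [(0.0, 0.0), (1.0, 0.0)]
    rng = random.Random(0)
    for s in [(0.0, 0.0), (1.0, 1.5), (50.0, 50.0)]:
        print(s, round(risk(s, known), 4), choose_source(s, known, rng))
```

In this sketch the agent relies on its own policy near familiar states and defers to teacher advice as it moves away from them, which mirrors the balance between exploitation, exploration and teacher requests described in the abstract.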
  • Publication
    A case-based approach to heuristic planning
    (Springer, 2013-01) Rosa Turbides, Tomás Eduardo de la; García Olaya, Ángel; Borrajo Millán, Daniel
    Most of the great success of heuristic search as an approach to AI Planning is due to the right design of domain-independent heuristics. Although many heuristic planners perform reasonably well, the computational cost of computing the heuristic function in every search node is very high, causing the planner to scale poorly when increasing the size of the planning tasks. To tackle this problem, planners can incorporate additional domain-dependent heuristics in order to improve their performance. Learning-based planners try to automatically acquire these domain-dependent heuristics using previously solved problems. In this work, we present a case-based reasoning approach that learns abstracted state transitions that serve as domain control knowledge for improving the planning process. The recommendations from the retrieved cases are used as guidance for pruning or ordering nodes in different heuristic search algorithms applied to planning tasks. We show that the CBR guidance is appropriate for a considerable number of planning benchmarks.
  • Publication
    Integrating Planning, Execution, and Learning to Improve Plan Execution
    (Wiley, 2013-02) Jiménez Celorrio, Sergio; Fernández Rebollo, Fernando; Borrajo Millán, Daniel
    Algorithms for planning under uncertainty require accurate action models that explicitly capture the uncertainty of the environment. Unfortunately, obtaining these models is usually complex. In environments with uncertainty, actions may produce countless outcomes and hence, specifying them and their probability is a hard task. As a consequence, when implementing agents with planning capabilities, practitioners frequently opt for architectures that interleave classical planning and execution monitoring following a replanning-on-failure paradigm. Though this approach is more practical, it may produce fragile plans that need continuous replanning episodes or, even worse, that result in execution dead-ends. In this paper, we propose a new architecture to relieve these shortcomings. The architecture is based on the integration of a relational learning component and the traditional planning and execution monitoring components. The new component allows the architecture to learn probabilistic rules of the success of actions from the execution of plans and to automatically upgrade the planning model with these rules. The upgraded models can be used by any classical planner that handles metric functions or, alternatively, by any probabilistic planner. This architecture proposal is designed to integrate off-the-shelf interchangeable planning and learning components so it can profit from the latest advances in both fields without modifying the architecture.
  • Publication
    Asistente Robótico Social Interactivo para Terapias de Rehabilitación Motriz con Pacientes de Pediatría (Interactive Social Robotic Assistant for Motor Rehabilitation Therapies with Pediatric Patients)
    (Comité Español de Automática, 2015-03) Calderita, Luis Vicente; Bustos, Pablo; Fernández, Fernando; Viciana, R.; Bandera, Antonio; Suárez Mejías, Cristina
    The goal of motor rehabilitation therapies is the recovery of damaged areas through the repetition of certain motor activities. In this scheme, the patient's recovery depends directly on adherence to the treatment, so conventional therapies, with their intensive and prolonged rehabilitation sessions, often demotivate patients, who then do not always meet the established objectives. Moreover, the correct execution of these therapies in hospitals and other medical centers requires significant and sustained dedication and effort from medical professionals, which in turn entails a significant cost for healthcare institutions. In this application domain, this article describes the development of a novel motor rehabilitation therapy centered on a socially interactive robot, which becomes a source of motivation as well as an assistant for carrying out personalized rehabilitation therapies. The experience has also been the seed for the design and implementation of a novel control architecture, RoboCog, which has given the robot the perceptual and cognitive capabilities that allow it to exhibit socially developed and proactive behavior. Verification tests carried out on the different elements of the architecture show that they work correctly and integrate properly with the rest of the architecture. In addition, the therapy has been successfully evaluated in individual sessions with pediatric patients with obstetric brachial plexus palsy (OBPP), a pathology caused by an injury acquired at birth that affects the motor mobility of the upper limbs but not the patients' intellectual and communicative capabilities.