Español English Contacte con nosotros http://www.uc3m.es/portal/page/portal/biblioteca
DSpace e-Archivo

Archivo Abierto Institucional de la Universidad Carlos III de Madrid > Investigación > Departamentos > Departamento de Estadística > DES - Working Papers. Statistics and Econometrics. WS >

Please use this identifier to cite or link to this item: http://hdl.handle.net/10016/794

Google™ Scholar. Others By: Niño-Mora, José
Files in This Item:
ws074109.pdf339 kBAdobe PDFformato pdf
Title: Two-stage index computation for bandits with switching penalties I : switching costs
Author(s): Niño-Mora, José
Publisher: Universidad Carlos III de Madrid. Departamento de Estadística
Issued date: May-2007
URI: http://hdl.handle.net/10016/794
Abstract: This paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index". They proposed to jointly compute both as the Gittins index of a bandit having 2n states — when the original bandit has n states — which results in an eight-fold increase in O(n ) arithmetic operations relative to those to compute the continuation index alone. This paper presents a more efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most n +O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of instances.
Serie / Nº.: UC3M Working papers. Statistics and Econometrics
07-09
Keywords: Dynamic programming
Markov
Finite state
Bandits
Switching costs
Index policy
Whittle index
Hysteresis
Work-reward analysis
PCL-indexability
Analysis of algorithms
Appears in Collections:DES - Working Papers. Statistics and Econometrics. WS

Refworks Export

SFX Query

This item is licensed under a Creative Commons License
Creative Commons

Items in E-Archivo are protected by copyright, with all rights reserved, unless otherwise indicated.

 

Valid XHTML 1.0! © Universidad Carlos III de Madrid - Software DSpace - Terms of use - Feedback