Two-stage index computation for bandits with switching penalties I : switching costs

e-Archivo Repository

Show simple item record Niño Mora, José
dc.contributor.editor Universidad Carlos III de Madrid. Departamento de Estadística 2007-05-14T07:59:23Z 2007-05-14T07:59:23Z 2007-05
dc.description.abstract This paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index". They proposed to jointly compute both as the Gittins index of a bandit having 2n states — when the original bandit has n states — which results in an eight-fold increase in O(n^3) arithmetic operations relative to those to compute the continuation index alone. This paper presents a more efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most n^2+O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of instances.
dc.format.extent 347136 bytes
dc.format.mimetype application/pdf
dc.language.iso eng
dc.relation.ispartofseries UC3M Working papers. Statistics and Econometrics
dc.relation.ispartofseries 07-09
dc.rights Atribución-NoComercial-SinDerivadas 3.0 España
dc.subject.other Dynamic programming
dc.subject.other Markov
dc.subject.other Finite state
dc.subject.other Bandits
dc.subject.other Switching costs
dc.subject.other Index policy
dc.subject.other Whittle index
dc.subject.other Hysteresis
dc.subject.other Work-reward analysis
dc.subject.other PCL-indexability
dc.subject.other Analysis of algorithms
dc.title Two-stage index computation for bandits with switching penalties I : switching costs
dc.type workingPaper
dc.subject.eciencia Estadística
dc.rights.accessRights openAccess
dc.identifier.repec ws074109
dc.affiliation.dpto UC3M. Departamento de Estadística
 Find Full text

Files in this item

*Click on file's image for preview. (Embargoed files's preview is not supported)

The following license files are associated with this item:

This item appears in the following Collection(s)

Show simple item record