Two-stage index computation for bandits with switching penalties I : switching costs

Niño Mora, José


dc.affiliation.dpto	UC3M. Departamento de Estadística	es
dc.contributor.author	Niño Mora, José
dc.contributor.editor	Universidad Carlos III de Madrid. Departamento de Estadística	en
dc.date.accessioned	2007-05-14T07:59:23Z
dc.date.available	2007-05-14T07:59:23Z
dc.date.issued	2007-05
dc.description.abstract	This paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index". They proposed computing both jointly as the Gittins index of a bandit with 2n states, when the original bandit has n states. Since (2n)^3 = 8n^3, this multiplies by eight the O(n^3) arithmetic-operation count needed to compute the continuation index alone. This paper presents a more efficient, decoupled computation method. In a first stage it computes the continuation index; in a second stage it computes the switching index an order of magnitude faster, in at most n^2+O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, and it deploys work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains over the benchmark Gittins index policy across a wide range of instances.	en
dc.format.extent	347136 bytes
dc.format.mimetype	application/pdf
dc.identifier.repec	ws074109
dc.identifier.uri	https://hdl.handle.net/10016/794
dc.language.iso	eng	en
dc.relation.ispartofseries	UC3M Working papers. Statistics and Econometrics	en
dc.relation.ispartofseries	07-09	en
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject.eciencia	Statistics
dc.subject.other	Dynamic programming	en
dc.subject.other	Markov	en
dc.subject.other	Finite state	en
dc.subject.other	Bandits	en
dc.subject.other	Switching costs	en
dc.subject.other	Index policy	en
dc.subject.other	Whittle index	en
dc.subject.other	Hysteresis	en
dc.subject.other	Work-reward analysis	en
dc.subject.other	PCL-indexability	en
dc.subject.other	Analysis of algorithms	en
dc.title	Two-stage index computation for bandits with switching penalties I : switching costs	en
dc.type	working paper	*
dspace.entity.type	Publication