Publication:
Two-stage index computation for bandits with switching penalties I : switching costs

dc.affiliation.dptoUC3M. Departamento de Estadísticaes
dc.contributor.authorNiño Mora, José
dc.contributor.editorUniversidad Carlos III de Madrid. Departamento de Estadísticaen
dc.date.accessioned2007-05-14T07:59:23Z
dc.date.available2007-05-14T07:59:23Z
dc.date.issued2007-05
dc.description.abstractThis paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index". They proposed to jointly compute both as the Gittins index of a bandit having 2n states — when the original bandit has n states — which results in an eight-fold increase in O(n^3) arithmetic operations relative to those to compute the continuation index alone. This paper presents a more efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most n^2+O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of instances.en
dc.format.extent347136 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.repecws074109
dc.identifier.urihttps://hdl.handle.net/10016/794
dc.language.isoengen
dc.relation.ispartofseriesUC3M Working papers. Statistics and Econometricsen
dc.relation.ispartofseries07-09en
dc.rightsAtribución-NoComercial-SinDerivadas 3.0 España
dc.rights.accessRightsopen access
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject.ecienciaEstadística
dc.subject.otherDynamic programmingen
dc.subject.otherMarkoven
dc.subject.otherFinite stateen
dc.subject.otherBanditsen
dc.subject.otherSwitching costsen
dc.subject.otherIndex policyen
dc.subject.otherWhittle indexen
dc.subject.otherHysteresisen
dc.subject.otherWork-reward analysisen
dc.subject.otherPCL-indexabilityen
dc.subject.otherAnalysis of algorithmsen
dc.titleTwo-stage index computation for bandits with switching penalties I : switching costsen
dc.typeworking paper*
dspace.entity.typePublication
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
ws074109.pdf
Size:
339 KB
Format:
Adobe Portable Document Format