Two-stage index computation for bandits with switching penalties I : switching costs

Niño Mora, José

Identifiers

URI: https://hdl.handle.net/10016/794

Files

ws074109.pdf (339 KB)

Publication date

2007-05

Authors

Niño Mora, José

Abstract

This paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index". They proposed computing both jointly as the Gittins index of a bandit with 2n states, where n is the number of states of the original bandit. Because the operation count grows as O(n^3), doubling the number of states results in an eight-fold increase in arithmetic operations relative to computing the continuation index alone. This paper presents a more efficient, decoupled computation method: a first stage computes the continuation index, and a second stage then computes the switching index an order of magnitude faster, in at most n^2+O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of instances.
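To make the "continuation index" concrete, the following is a minimal Python sketch of how the Gittins index of a small finite-state discounted bandit could be computed. It is not the two-stage algorithm of the paper, which the abstract does not spell out. It uses a largest-index-first recursion of the kind commonly attributed to Varaiya, Walrand and Buyukkoc, implemented naively with a fresh linear solve at each step, so it performs more work than the O(n^3) count mentioned above. The function name `gittins_indices`, the transition matrix `P`, the reward vector `r` and the discount factor `beta` are illustrative assumptions, not notation taken from the paper.

```python
import numpy as np


def gittins_indices(P, r, beta):
    """Gittins index of each state of a finite-state discounted bandit.

    Illustrative sketch only (assumed interface, not the paper's method).
    P: (n, n) transition matrix, r: (n,) active rewards, 0 <= beta < 1.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    index = np.empty(n)

    # The state with the largest one-step reward has index equal to that reward.
    first = int(np.argmax(r))
    index[first] = r[first]
    ranked = [first]                      # states ranked so far, best first
    remaining = set(range(n)) - {first}

    while remaining:
        # Discounted reward R and discounted time W, from each ranked state,
        # of the policy "keep playing while the state stays in `ranked`".
        A = np.eye(len(ranked)) - beta * P[np.ix_(ranked, ranked)]
        R_ranked = np.linalg.solve(A, r[ranked])
        W_ranked = np.linalg.solve(A, np.ones(len(ranked)))

        # For each unranked state: play once, then continue while in `ranked`.
        best_state, best_ratio = None, -np.inf
        for i in sorted(remaining):
            R_i = r[i] + beta * P[i, ranked] @ R_ranked
            W_i = 1.0 + beta * P[i, ranked] @ W_ranked
            if R_i / W_i > best_ratio:
                best_state, best_ratio = i, R_i / W_i

        # The unranked state with the largest reward/time ratio comes next.
        index[best_state] = best_ratio
        ranked.append(best_state)
        remaining.remove(best_state)

    return index


# Two-state example: state 0 pays 0 and moves to state 1, which pays 1 forever.
P_example = [[0.0, 1.0], [0.0, 1.0]]
r_example = [0.0, 1.0]
print(gittins_indices(P_example, r_example, beta=0.5))  # indices 0.5 and 1.0
```

In the two-state example, playing state 0 once and then staying in state 1 earns a discounted reward of 1 over a discounted time of 2, which gives the index 0.5 for state 0. Applying such a routine to a bandit with 2n states, as in the joint computation described above, is what drives the eight-fold increase in operations that the decoupled method avoids.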

Keywords

Dynamic programming, Markov, Finite state, Bandits, Switching costs, Index policy, Whittle index, Hysteresis, Work-reward analysis, PCL-indexability, Analysis of algorithms

Collections

DES - Working Papers. Statistics and Econometrics. WS
