

KL-UCB-Based Policy for Budgeted Multi-Armed Bandits with Stochastic Action Costs
Ryo WATANABE, Junpei KOMIYAMA, Atsuyoshi NAKAMURA, Mineichi KUDO
Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences
Vol.E100-A
No.11
pp.2470-2486
Publication Date: 2017/11/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E100.A.2470
Type of Manuscript: PAPER
Category: Mathematical Systems Science
Keyword: budgeted multi-armed bandits, asymptotically optimal policy, regret analysis
Summary:
We study the budgeted multi-armed bandit problem with stochastic action costs. In this problem, a player not only receives a reward but also pays a cost for an action of his/her choice. The goal of the player is to maximize the cumulative reward he/she receives before the total cost exceeds the budget. In the classical multi-armed bandit problem, a policy called KL-UCB is known to perform well. We propose KL-UCB-SC, an extension of this policy for the budgeted bandit problem. We prove that KL-UCB-SC is asymptotically optimal for the case of Bernoulli costs and rewards. To the best of our knowledge, this is the first result that shows asymptotic optimality in the study of the budgeted bandit problem. In fact, our regret upper bound is at least four times better than that of BTS, the best known upper bound for the budgeted bandit problem. Moreover, an empirical simulation we conducted shows that the performance of a tuned variant of KL-UCB-SC is comparable to that of state-of-the-art policies such as PD-BwK and BTS.
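The setting in the abstract can be illustrated with a small simulation. The sketch below is not the authors' KL-UCB-SC policy; it is a generic budgeted-bandit loop with a KL-UCB-style optimistic index, choosing the arm with the highest ratio of a reward upper confidence bound to a cost lower confidence bound. All function names and the particular index construction are illustrative assumptions.

```python
import math
import random

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped away
    # from the boundary to avoid log(0).
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, n, t):
    # KL-UCB upper confidence bound: the largest q >= mean such that
    # n * KL(mean, q) <= log(t), found by bisection.
    if n == 0:
        return 1.0
    target = math.log(max(t, 2)) / n
    lo, hi = mean, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def budgeted_bandit(reward_means, cost_means, budget, rng):
    # Play Bernoulli arms until the cumulative cost exceeds the budget.
    # Arm choice: optimistic reward/cost rate (reward UCB over cost LCB);
    # the cost LCB is taken as 1 minus the KL-UCB index of (1 - cost mean).
    k = len(reward_means)
    counts = [0] * k
    rew_sums = [0.0] * k
    cost_sums = [0.0] * k
    total_reward, total_cost, t = 0.0, 0.0, 0
    while total_cost <= budget:
        t += 1
        if t <= k:
            arm = t - 1  # initial round-robin over all arms
        else:
            def ratio(i):
                r_ucb = kl_ucb_index(rew_sums[i] / counts[i], counts[i], t)
                c_lcb = 1.0 - kl_ucb_index(1.0 - cost_sums[i] / counts[i],
                                           counts[i], t)
                return r_ucb / max(c_lcb, 1e-6)
            arm = max(range(k), key=ratio)
        r = 1.0 if rng.random() < reward_means[arm] else 0.0
        c = 1.0 if rng.random() < cost_means[arm] else 0.0
        counts[arm] += 1
        rew_sums[arm] += r
        cost_sums[arm] += c
        total_reward += r
        total_cost += c
    return total_reward
```

For example, `budgeted_bandit([0.8, 0.2], [0.3, 0.7], 100.0, random.Random(0))` runs until roughly 100 units of cost are spent and returns the cumulative reward; a good policy should concentrate its pulls on the arm with high reward mean and low cost mean.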

