4.7 Article

Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems

Journal

IEEE TRANSACTIONS ON AUTOMATIC CONTROL
Volume 54, Issue 6, Pages 1243-1253

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAC.2009.2019797

Keywords

Adaptive control; sequential decision procedures; stochastic approximation

We consider a class of multi-armed bandit problems where the set of available actions can be mapped to a convex, compact region of R^d, sometimes referred to as the continuum-armed bandit problem. The paper establishes bounds on the efficiency of any arm-selection procedure under certain conditions on the class of possible underlying reward functions. Both finite-time lower bounds on the growth rate of the regret and asymptotic upper bounds on the rates of convergence of the selected control values to the optimum are derived. We explicitly characterize the dependence of these convergence rates on the minimal rate of variation of the mean reward function in a neighborhood of the optimal control. The bounds can be used to demonstrate the asymptotic optimality of the Kiefer-Wolfowitz method of stochastic approximation with regard to a large class of possible mean reward functions.
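
For readers unfamiliar with the Kiefer-Wolfowitz method named in the abstract, the following is a minimal one-dimensional sketch of the general idea: estimate the slope of the mean reward from two noisy evaluations and take a shrinking step uphill. It is an illustration only and is not taken from the paper. The function names, the quadratic reward with optimum at 0.3, the noise level, and the gain sequences a_n = a/n and c_n = c/n^(1/4) are all assumptions chosen for the example.

```python
import random


def kiefer_wolfowitz(sample_reward, x0, lo, hi, n_steps, a=1.0, c=0.5, seed=0):
    """Illustrative 1-D Kiefer-Wolfowitz ascent on the interval [lo, hi].

    sample_reward(x, rng) must return a noisy observation of the mean
    reward at arm x. The gains a/n and c/n**0.25 are one classical
    choice, assumed here for illustration.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        a_n = a / n            # step size, shrinks to zero
        c_n = c / n ** 0.25    # probe half-width, shrinks more slowly
        # Probe points are clipped so that only admissible arms are played.
        x_plus = min(hi, x + c_n)
        x_minus = max(lo, x - c_n)
        y_plus = sample_reward(x_plus, rng)
        y_minus = sample_reward(x_minus, rng)
        # Finite-difference slope over the actual (possibly clipped) gap.
        slope = (y_plus - y_minus) / (x_plus - x_minus)
        # Move uphill and project back onto [lo, hi].
        x = min(hi, max(lo, x + a_n * slope))
    return x


def noisy_quadratic(x, rng):
    """Hypothetical reward: mean -(x - 0.3)^2 plus Gaussian noise."""
    return -(x - 0.3) ** 2 + rng.gauss(0.0, 0.1)


x_hat = kiefer_wolfowitz(noisy_quadratic, x0=0.9, lo=0.0, hi=1.0, n_steps=5000)
print(f"estimated optimal arm: {x_hat:.3f}")
```

In this toy setting the iterate should settle close to 0.3. How fast it does so depends on the curvature of the mean reward near the optimum, which is the kind of dependence the abstract says the paper characterizes.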

