Article

On State Aggregation to Approximate Complex Value Functions in Large-Scale Markov Decision Processes

Journal

IEEE TRANSACTIONS ON AUTOMATIC CONTROL
Volume 56, Issue 2, Pages 333-344

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAC.2010.2052697

Keywords

Descriptive complexity; discrete event dynamic systems; Markov decision processes (MDPs); ordinal optimization; state aggregation

Funding

  1. National Natural Science Foundation of China [60704008, 60736027, 90924001]
  2. Specialized Research Fund for the Doctoral Program of Higher Education [20070003110]
  3. Programme of Introducing Talents of Discipline to Universities [B06002]

Abstract

Many small electronic devices, such as cell phones and wireless sensors, have limited memory, computing power, and battery capacity. The pervasive application of these devices in industry, the military, and daily life calls for simple policies that are easy to implement and can be executed in a reasonably short time. The Markov decision process (MDP) provides a general framework for such policy optimization problems under complexity constraints. In many cases, we can first use a powerful computer to find the optimal (or a good) policy and its value function, and then approximate the value function by a simple one. This approximation usually relies on heuristics or experience, because the relationship between the complexity of a function and the approximation error is not clear in general. In this paper, we assume the optimal value function is known (or a reasonably good estimate is available) and consider how to approximate such a complex value function. Because state aggregation is widely applied in large-scale MDPs, we focus on piecewise-constant approximate value functions and use the number of aggregated states to measure the complexity of a value function. We quantify how the complexity of a value function affects the approximation error. When the optimal value function is known exactly, we develop an algorithm that finds the best simple state aggregation in polynomial time. When only estimates of the optimal value function are available, we apply ordinal optimization to find good simple state aggregations with high probability. The algorithms are demonstrated on a node-activation policy optimization problem in a wireless sensor network. We hope this work sheds some light on how to find simple policies with good performance.
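To make the aggregation idea concrete, below is a minimal Python sketch of one common way to build a piecewise-constant approximation when the value function is known: sort states by value, then bisect on the error tolerance to find the coarsest grouping that fits within a given number of aggregated states. This is an illustrative assumption of the general technique, not the paper's algorithm; the function name aggregate_states and the sup-norm error criterion are choices made here for the example.

```python
import numpy as np

def aggregate_states(values, max_groups):
    """Partition states into at most `max_groups` groups for a
    piecewise-constant approximation, bisecting on the smallest
    achievable sup-norm error. Illustrative sketch only."""
    order = np.argsort(values)            # states sorted by value
    v = np.asarray(values, dtype=float)[order]

    def groups_needed(eps):
        # Greedy sweep: open a new group whenever the value range
        # within the current group would exceed 2 * eps (a group of
        # range 2*eps has sup-norm error eps at its midpoint).
        count, start = 1, v[0]
        for x in v[1:]:
            if x - start > 2 * eps:
                count += 1
                start = x
        return count

    lo, hi = 0.0, (v[-1] - v[0]) / 2      # hi is always feasible
    for _ in range(50):                   # bisection on the tolerance
        mid = (lo + hi) / 2
        if groups_needed(mid) <= max_groups:
            hi = mid
        else:
            lo = mid

    # Build the aggregation labels at the feasible tolerance `hi`.
    labels = np.empty(len(v), dtype=int)
    g, start = 0, v[0]
    for i, x in enumerate(v):
        if x - start > 2 * hi:
            g, start = g + 1, x
        labels[i] = g
    out = np.empty(len(v), dtype=int)
    out[order] = labels                   # map back to original state order
    return out
```

Grouping states that are adjacent in value is the natural choice here, since the sup-norm error of a piecewise-constant approximation within a group is half the group's value range; the bisection then trades the number of groups (the complexity measure above) against that error, which mirrors the complexity-versus-error trade-off the paper quantifies.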
