The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

MATHEMATICS OF OPERATIONS RESEARCH (2011)

Journal

MATHEMATICS OF OPERATIONS RESEARCH

Volume 36, Issue 4, Pages 593-603

Publisher

INFORMS

DOI: 10.1287/moor.1110.0516

Keywords

simplex method; policy-iteration method; Markov decision problem; linear programming; dynamic programming; strongly polynomial time

Categories

Operations Research & Management Science; Mathematics, Applied

Funding

NSF [GOALI 0800151]
AFOSR [FA9550-09-1-0306]

Abstract

We prove that the classic policy-iteration method [Howard, R. A. 1960. Dynamic Programming and Markov Processes. MIT, Cambridge] and the original simplex method with the most-negative-reduced-cost pivoting rule of Dantzig are strongly polynomial-time algorithms for solving the Markov decision problem (MDP) with a fixed discount rate. Furthermore, the computational complexity of the policy-iteration and simplex methods is superior to that of the only known strongly polynomial-time interior-point algorithm [Ye, Y. 2005. A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30(3) 733-749] for solving this problem. The result is surprising because the simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming problem [Klee, V., G. J. Minty. 1972. How good is the simplex method? Technical report. O. Shisha, ed. Inequalities III. Academic Press, New York], the simplex method with the smallest index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates [Melekopoglou, M., A. Condon. 1994. On the complexity of the policy improvement algorithm for Markov decision processes. INFORMS J. Comput. 6(2) 188-192], and the policy-iteration method was recently shown to be exponential for solving undiscounted MDPs under the average cost criterion. We also extend the result to solving MDPs with transient substochastic transition matrices whose spectral radii are uniformly below one.
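For readers unfamiliar with the policy-iteration method named in the abstract, here is a minimal sketch of Howard-style policy iteration on a discounted, cost-minimizing MDP. It is an illustration under stated assumptions, not the paper's own code: the data layout (`mdp[i]` as a list of `(cost, transition_probabilities)` pairs), the function names, and the two-state example are all hypothetical. Exact rational arithmetic is used so that the stopping test is not affected by rounding. The simplex method with Dantzig's rule would differ in the improvement step: it would switch only the single state-action pair with the most negative reduced cost, whereas this sketch switches every state that has an improving action.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination (exact when entries are Fractions)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [x * inv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def evaluate(mdp, policy, gamma):
    """Policy evaluation: solve (I - gamma * P_pi) v = c_pi for the value vector v."""
    n = len(mdp)
    A = [
        [(Fraction(1) if i == j else Fraction(0)) - gamma * mdp[i][policy[i]][1][j]
         for j in range(n)]
        for i in range(n)
    ]
    b = [mdp[i][policy[i]][0] for i in range(n)]
    return solve_linear(A, b)


def policy_iteration(mdp, gamma):
    """Return (policy, values, rounds) for a discounted cost-minimizing MDP."""
    n = len(mdp)
    policy = [0] * n  # assumption: start from action 0 in every state
    rounds = 0
    while True:
        v = evaluate(mdp, policy, gamma)
        rounds += 1
        improved = False
        for i in range(n):
            best_a, best_rc = policy[i], Fraction(0)
            for a, (cost, probs) in enumerate(mdp[i]):
                # Reduced cost of taking action a in state i under the current values.
                rc = cost + gamma * sum(p * x for p, x in zip(probs, v)) - v[i]
                if rc < best_rc:
                    best_a, best_rc = a, rc
            if best_a != policy[i]:
                policy[i] = best_a
                improved = True
        if not improved:
            return policy, v, rounds


# Hypothetical two-state example: each action is (immediate cost, next-state probabilities).
mdp = [
    [(2, [1, 0]), (1, [0, 1])],  # state 0: stay at cost 2, or move to state 1 at cost 1
    [(1, [0, 1]), (0, [1, 0])],  # state 1: stay at cost 1, or move to state 0 at cost 0
]
gamma = Fraction(1, 2)
policy, values, rounds = policy_iteration(mdp, gamma)
print(policy, values, rounds)
```

On this small instance the loop stops once no action has a negative reduced cost, which is the optimality condition for the discounted problem.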

Solving the joint military medical evacuation problem via a random forest approximate dynamic programming approach

Channel A. Rodriguez, Phillip R. Jenkins, Matthew J. Robbins

Summary: This paper focuses on the MEDEVAC dispatching problem in combat operations, considering triage classification errors and the possibility of having blood transfusion kits on board select MEDEVAC units. A Markov decision process model is formulated and approximate dynamic programming techniques are used to develop high-quality policies. Results show that applying this technique can improve life-saving performance by up to 29%. This research is important for the military medical community and can guide future military MEDEVAC operations.

EXPERT SYSTEMS WITH APPLICATIONS (2023)