Article

Point-Based Value Iteration for Finite-Horizon POMDPs

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH (2019)

Volume 65, Pages 307-341

Publisher

AI ACCESS FOUNDATION

DOI: 10.1613/jair.1.11324

Categories

Computer Science, Artificial Intelligence

Funding

Netherlands Organisation for Scientific Research (NWO), as part of the Uncertainty Reduction in Smart Energy Systems (URSES) program

Abstract

Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.
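To make the idea of point-based backups with a finite horizon and no discounting concrete, here is a minimal sketch in plain Python. It is a generic textbook-style illustration, not the algorithm from the paper: it keeps one set of alpha vectors per time step and backs up a fixed, hand-picked belief set. All names (`backup`, `finite_horizon_pbvi`, `value`) and the array layouts `T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]` are assumptions made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def backup(b, next_vectors, T, O, R):
    """One undiscounted point-based backup at belief b.

    Assumed layout (hypothetical, for illustration only):
    T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a), R[a][s] = reward.
    """
    n_states = len(b)
    n_obs = len(O[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(len(T)):
        vec = list(R[a])
        for o in range(n_obs):
            # Project each next-stage vector back through (a, o).
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2]
                     for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in next_vectors
            ]
            # Keep the projection that is best at this belief point.
            g = max(projected, key=lambda p: dot(b, p))
            vec = [x + y for x, y in zip(vec, g)]
        val = dot(b, vec)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec


def finite_horizon_pbvi(beliefs, T, O, R, horizon):
    """Return stages, where stages[t] is the vector set for time step t."""
    n_states = len(beliefs[0])
    gamma = [[0.0] * n_states]  # after the last step no reward remains
    stages = [gamma]
    for _ in range(horizon):
        gamma = [backup(b, gamma, T, O, R) for b in beliefs]
        stages.append(gamma)
    stages.reverse()
    return stages


def value(b, vectors):
    """Value estimate at belief b given a set of alpha vectors."""
    return max(dot(b, alpha) for alpha in vectors)
```

Because the value function depends on the number of remaining steps, the sketch stores a separate vector set for each time step instead of iterating a single set to convergence, which is the usual pattern in the discounted infinite-horizon setting.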

Recommended

Proceedings Paper Automation & Control Systems

Point-based Value Iteration for VAR-POMDPs

Wei Zheng, Hai Lin

Summary: The paper extends the traditional POMDP to VAR-POMDP, discusses solving the issue of temporal correlation among continuous observations, and proposes a double point-based value iteration algorithm to address this problem.

2021 AMERICAN CONTROL CONFERENCE (ACC) (2021)

Article Engineering, Mechanical

A novel Z-function-based completely model-free reinforcement learning method to finite-horizon zero-sum game of nonlinear system

Zhe Chen, Wenqian Xue, Ning Li, Bosen Lian, Frank L. Lewis

Summary: This paper proposes a completely model-free reinforcement learning (RL) method for solving the finite-horizon two-player zero-sum game problem of continuous-time nonlinear systems. By defining a novel Z-function, introducing a model-based RL policy iteration framework, and applying integral RL and iterative learning control techniques, the solution seeking and system dynamics requirement are further simplified, leading to improved algorithm efficiency and reduced model dependency.

NONLINEAR DYNAMICS (2022)

Article Operations Research & Management Science

A modified fixed point iteration method for solving the system of absolute value equations

Dongmei Yu, Cairong Chen, Deren Han

Summary: This article presents a modified fixed point iteration method for solving absolute value equations, which improves efficiency and shows linear convergence under certain conditions. Numerical results demonstrate its superiority compared to the original method in specific cases.

OPTIMIZATION (2022)

Article Automation & Control Systems

Point-Based Value Iteration for VAR-POMDPs

Wei Zheng, Hai Lin

Summary: This paper introduces a VAR-POMDP model which extends the traditional POMDP model and proposes a feasible planning algorithm. The VAR-POMDP model can be solved by approximating the exact value function using a class of piece-wise linear functions within the dynamic programming framework.

IEEE CONTROL SYSTEMS LETTERS (2022)

Article Automation & Control Systems

On the global convergence of relative value iteration for infinite-horizon risk-sensitive control of diffusions

Hassan Hmedi, Ari Arapostathis, Guodong Pang

Summary: In this paper, a multiplicative relative value iteration algorithm (RVI) for infinite-horizon risk-sensitive control of diffusions in R^d is studied. The author proves that the RVI algorithm converges to the solution of the multiplicative HJB equation within a neighborhood of the solution (local convergence) when the diffusion is positive recurrent. Under the assumption of blanket exponential ergodicity, it is also shown that the RVI algorithm converges globally to the solution of the multiplicative HJB equation from any positive initial condition. This paper revisits the problem without assuming blanket conditions, instead assuming a near-monotone running cost and a structural assumption relating the running cost function to the solution of the multiplicative HJB equation. It is shown that this structural assumption implies the existence of a control under which the ground state diffusion is exponentially ergodic, and a global convergence result of the multiplicative VI/RVI algorithms is established, extending the results in Arapostathis and Borkar (2020).

SYSTEMS & CONTROL LETTERS (2023)

Article Mathematics, Applied

Solution of nonlinear boundary value problem by S-iteration

S. Thenmozhi, M. Marudai

Summary: In this paper, a novel approach was introduced to solve the non-linear fourth order boundary value problem using integral operator equation. Through three illustrations, it was demonstrated that S-iteration for contraction operator shows faster convergence than Krasnoselskii-Mann's iteration based on residual or absolute error calculations.

JOURNAL OF APPLIED MATHEMATICS AND COMPUTING (2022)

Article Computer Science, Artificial Intelligence

Value-Based Subgoal Discovery and Path Planning for Reaching Long-Horizon Goals

Shubham Pateria, Budhitama Subagdja, Ah-Hwee Tan, Chai Quek

Summary: This article proposes a novel subgoal graph-based planning method called LSGVP, which addresses the challenge of learning to reach long-horizon goals in spatial traversal tasks for autonomous agents. LSGVP uses a subgoal discovery heuristic based on cumulative reward and automatically prunes the learned subgoal graph to remove erroneous connections. It achieves higher cumulative positive rewards and goal-reaching success rates compared to other subgoal sampling or discovery heuristics.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control

Ziyu Lin, Jingliang Duan, Shengbo Eben Li, Haitong Ma, Jie Li, Jianyu Chen, Bo Cheng, Jun Ma

Summary: The research addresses the challenge of solving the finite-horizon HJB equation, proposes a new algorithm, and validates its effectiveness through simulations.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Mathematics, Applied

Shift-splitting fixed point iteration method for solving generalized absolute value equations

Xu Li, Yi-Xin Li, Yan Dou

Summary: In this paper, we propose a shift-splitting fixed point iteration method (FPI-SS) for solving large sparse generalized absolute value equations (GAVEs). Various convergence conditions of the FPI-SS method are presented. Numerical experiments demonstrate that the FPI-SS method outperforms other methods in terms of computing efficiency.

NUMERICAL ALGORITHMS (2023)

Article Automation & Control Systems

Two-loop reinforcement learning algorithm for finite-horizon optimal control of continuous-time affine nonlinear systems

Zhe Chen, Wenqian Xue, Ning Li, Frank L. Lewis

Summary: This article introduces three novel time-varying policy iteration algorithms for finite-horizon optimal control problem of continuous-time affine nonlinear systems, including model-based and partially model-free methods, and provides analysis on the convergence, stability, and optimality of each algorithm.

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL (2022)

Article Mathematics

Equivalence of Certain Iteration Processes Obtained by Two New Classes of Operators

Mujahid Abbas, Rizwan Anjum, Vasile Berinde

Summary: This paper aims to define two new classes of mappings, demonstrate the existence and iterative approximation of their fixed points, and show the equivalence of Ishikawa, Mann, and Krasnoselskij iteration methods for such mappings. Additionally, applications of these results to solve split feasibility and variational inequality problems are provided.

MATHEMATICS (2021)

Article Mathematics, Applied

An adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems

Heng Zhang

Summary: This paper proposes a novel adaptive dynamic programming (ADP)-based model-free policy iteration (PI) algorithm to solve an infinite-horizon continuous-time linear quadratic stochastic (LQS) optimal control problem, which includes both control and state variables in the diffusion term of system dynamics. By using Ito's lemma and expectations, a relationship among the state trajectory, control input, and matrices to be solved is described. The ADP-based model-free algorithm is then developed to approximate the optimal control from collected data without requiring information about all system coefficient matrices. Convergence analysis is provided under mild conditions, and numerical examples demonstrate the effectiveness of the proposed algorithm.

JOURNAL OF APPLIED MATHEMATICS AND COMPUTING (2023)

Article Computer Science, Information Systems

A periodic iterative learning scheme for finite-iteration tracking of discrete networks based on FlexRay communication protocol

Wenjun Xiong, Daniel W. C. Ho, Shifan Wen

Summary: This study analyzes the finite-iteration tracking of discrete networks by designing a new periodic ILC strategy using the FlexRay communication protocol. The approach reduces communication channel bandwidth load and improves performance. It provides a new method for iterative learning in network control design and shows better performance compared to traditional ILC schemes.

INFORMATION SCIENCES (2021)

Article Mathematics, Applied

A new constraint preconditioner based on the PGSS iteration method for non-Hermitian generalized saddle point problems

Hongyu Wu, Shuhuang Xiang

Summary: A new constraint preconditioner is proposed for non-Hermitian generalized saddle point problems, constructed based on the PGSS iteration method. The invertibility condition and convergence properties of the new preconditioner are analyzed in detail, and its effectiveness is illustrated through numerical experiments.

APPLIED MATHEMATICS AND COMPUTATION (2021)

Article Mathematics, Applied

A constructive method for parabolic equations with opposite orientations arising in optimal control

Stefania Ragni

Summary: An optimal control model governed by parabolic equations is analyzed through the formulation of the optimality system. The uniqueness of the solution is proven, and a constructive approximation method is provided. The convergence of iterative schemes and the use of exponential integrators in PDE-constrained optimization are investigated. Numerical results demonstrate the effectiveness of the proposed approach.

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS (2022)
