Article

On the Feasibility Guarantees of Deep Reinforcement Learning Solutions for Distribution System Operation

Journal

IEEE TRANSACTIONS ON SMART GRID
Volume 14, Issue 2, Pages 954-964

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TSG.2022.3233709

Keywords

Diamonds; Power systems; Optimization; Training; Systems operation; Reinforcement learning; Real-time systems; Distribution systems; deep reinforcement learning; feasibility guarantee; verification; energy storage

This paper guarantees the feasibility of solutions given by a DRL agent trained to operate a distribution system by modifying the exploration process and optimality criterion. A convex feasibility set, called the feasibility diamond, is formed inside the region defined by the power flow constraints, and solutions outside the diamond are projected onto its surface for DRL training. The impact of this method on the feasibility and optimality of DRL solutions is tested on three test distribution systems, showing that near-optimal and reliable operators can be achieved.
Deep reinforcement learning (DRL) has scored unprecedented success in finding near-optimal solutions to high-dimensional stochastic problems, leading to its extensive use in operational research, including the operation of power systems. In practice, however, it has been adopted with extreme caution because standard DRL does not guarantee the satisfaction of operational constraints. In this paper, the feasibility of solutions given by a DRL agent trained to operate a distribution system is guaranteed by modifying the exploration process and optimality criterion of standard DRL. To that end, a convex feasibility set in the form of a multi-dimensional polyhedron, called the feasibility diamond, is first formed inside the region defined by the power flow constraints; it allows the feasibility of DRL solutions to be checked in real time. Solutions outside the feasibility diamond are projected onto the diamond's surface, and the modified action is used for DRL training. Further, the distance of infeasible solutions to their feasible projection is penalized in the DRL reward function. The impact of the proposed method on the feasibility and optimality of DRL solutions is tested on three test distribution systems, indicating that combining the modified exploration process with a soft penalization of infeasibilities works best in achieving near-optimal and reliable DRL-trained operators.
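
The projection-and-penalty idea can be sketched in a few lines of Python. The snippet below is only an illustration under simplifying assumptions: the feasibility diamond is stood in for by an axis-aligned L1 ball of a given radius around a known feasible operating point, and the names project_onto_l1_ball, safe_action_and_penalty, center, radius, and penalty_weight are hypothetical, not taken from the paper; the actual diamond is a problem-specific polyhedron constructed inside the power flow feasible region.

    import numpy as np

    def project_onto_l1_ball(v, radius=1.0):
        # Euclidean projection of v onto {x : ||x||_1 <= radius} (Duchi et al., 2008).
        # In two dimensions this set is literally a diamond; it stands in here for
        # the paper's problem-specific feasibility diamond.
        if np.abs(v).sum() <= radius:
            return v.copy()                      # already feasible: keep the action
        u = np.sort(np.abs(v))[::-1]
        cssv = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > (cssv - radius))[0][-1]
        theta = (cssv[rho] - radius) / (rho + 1.0)
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

    def safe_action_and_penalty(action, center, radius=1.0, penalty_weight=1.0):
        # Project a raw DRL action onto the illustrative diamond around a known
        # feasible operating point `center`, and return a soft penalty equal to
        # the weighted distance between the raw action and its projection.
        action = np.asarray(action, dtype=float)
        projected = center + project_onto_l1_ball(action - center, radius)
        distance = float(np.linalg.norm(action - projected))
        return projected, penalty_weight * distance

    # Hypothetical usage: an actor output (e.g., storage set-points) that violates
    # the diamond is pulled back onto its surface, and the projection distance is
    # subtracted from the environment reward during training.
    raw_action = np.array([0.9, -0.7, 0.4])
    feasible_action, penalty = safe_action_and_penalty(raw_action, center=np.zeros(3))
    shaped_reward = 0.0 - penalty                # 0.0 stands in for the task reward

Actions already inside the diamond are returned unchanged with zero penalty, mirroring the abstract's description that only infeasible solutions are projected and penalized.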
