Article

A deep reinforcement learning framework for continuous intraday market bidding

Journal

MACHINE LEARNING
Volume 110, Issue 9, Pages 2335-2387

Publisher

SPRINGER
DOI: 10.1007/s10994-021-06020-8

Keywords

European continuous intraday markets; Energy storage control; Markov decision process; Deep reinforcement learning; Asynchronous fitted Q iteration


The large-scale integration of variable energy resources is expected to shift a large part of energy exchanges closer to real time, where more accurate forecasts are available. In this context, the short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component of successful renewable energy integration is the use of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market, where exchanges occur through a centralized order book. The goal of the storage device operator is to maximize the profits received over the entire trading horizon, while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous version of the fitted Q iteration algorithm is chosen for solving this problem due to its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a number of benchmark strategies. Finally, the impact of the storage characteristics on the total revenues collected in the intraday market is evaluated.
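The core learning step described in the abstract — fitted Q iteration over a batch of (state, action, reward, next-state) transitions — can be sketched roughly as follows. The toy MDP, its dimensions, the randomly generated "historical" transitions, and the tabular per-(state, action) averaging used as the regression step are all illustrative assumptions for this sketch; they are not the paper's actual market model, state representation, or function approximator (and the sketch is synchronous, unlike the asynchronous variant the paper uses):

```python
import numpy as np

# Minimal (synchronous, tabular) fitted Q iteration sketch.
# All quantities below are illustrative assumptions, not the
# paper's intraday-market model.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.95

# Batch of "historical" transitions (s, a, r, s') -- here random.
N = 2000
S = rng.integers(0, n_states, N)
A = rng.integers(0, n_actions, N)
R = rng.normal(size=N)
S2 = rng.integers(0, n_states, N)

Q = np.zeros((n_states, n_actions))
for _ in range(50):  # FQI iterations
    # Bellman targets built from the current Q estimate.
    targets = R + gamma * Q[S2].max(axis=1)
    # "Regression" step: per-(s, a) mean of the targets
    # (a real implementation would fit e.g. tree ensembles here).
    sums = np.zeros_like(Q)
    counts = np.zeros_like(Q)
    np.add.at(sums, (S, A), targets)
    np.add.at(counts, (S, A), 1)
    Q = np.divide(sums, counts, out=np.zeros_like(Q), where=counts > 0)

# Greedy policy extracted from the learned Q-function.
policy = Q.argmax(axis=1)
```

In a full implementation, the tabular averaging step is replaced by a supervised regressor fitted on (state, action) features, which is what makes the method applicable to the large, continuous state space induced by the order book.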


