May 31, 2021 to June 4, 2021
Europe/Berlin timezone

Robust porous media flow control using Deep Reinforcement Learning

Jun 4, 2021, 11:10 AM
Oral Presentation (MS15) Machine Learning and Big Data in Porous Media


Atish Dixit (PhD student)


With the recent progress in reinforcement learning (RL) research, we investigate whether RL is suitable for solving the optimal well control problem with uncertain reservoir models. In principle, RL algorithms are capable of learning optimal action policies (maps from states to actions) that maximize a numerical reward signal. In the RL formulation of porous media flow control problems, the state is represented by snapshots of the subsurface flow simulation, the action by the valve openings that control flow through the source/sink (i.e., injection/production) wells, and the numerical reward by the total sweep efficiency. Optimal control policies are learned over numerous episodes of simulation trials (referred to as agent-environment interactions in the RL literature).
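To make the state/action/reward mapping concrete, here is a minimal sketch of the agent-environment loop. Everything in it is hypothetical: `ToyWellControlEnv` is a toy surrogate standing in for a reservoir simulator (one injector per region, one permeability value per region, a made-up sweep update), and `run_episode` is an illustrative helper; neither is the authors' environment or code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWellControlEnv:
    """Hypothetical toy surrogate for a reservoir simulator (not a real flow solver).

    Each region is swept by one injector; `perm` plays the role of an uncertain
    permeability value per region, assumed to lie in (0, 1].
    """
    perm: list[float]
    n_steps: int = 10
    rate: float = 0.5
    sat: list[float] = field(default_factory=list)
    t: int = 0

    def reset(self) -> list[float]:
        # State: a "snapshot" of the swept fraction per region, initially unswept.
        self.sat = [0.0] * len(self.perm)
        self.t = 0
        return list(self.sat)

    def sweep_efficiency(self) -> float:
        # Toy sweep efficiency: mean swept fraction over all regions.
        return sum(self.sat) / len(self.sat)

    def step(self, valves: list[float]) -> tuple[list[float], float, bool]:
        # Action: valve openings, normalized so the total injection rate is fixed.
        total = sum(valves) or 1.0
        before = self.sweep_efficiency()
        for i, v in enumerate(valves):
            frac = v / total
            self.sat[i] += self.rate * frac * self.perm[i] * (1.0 - self.sat[i])
        self.t += 1
        # Reward: increase in sweep efficiency during this control step.
        reward = self.sweep_efficiency() - before
        done = self.t >= self.n_steps
        return list(self.sat), reward, done


def run_episode(env: ToyWellControlEnv, policy) -> float:
    # One episode of agent-environment interaction; returns the summed reward.
    state = env.reset()
    episode_return, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        episode_return += reward
    return episode_return
```

Because the per-step reward is the increment in sweep efficiency, the episode return telescopes to the final (total) sweep efficiency, which matches the reward described above. For example, `run_episode(ToyWellControlEnv(perm=[0.2, 0.9, 0.5]), lambda s: [1.0, 1.0, 1.0])` evaluates a fixed, equal valve setting; an RL agent would instead learn `policy` from many such episodes.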

The major challenge in learning an optimal flow control policy for well control is that reservoir simulations often involve uncertain parameters (e.g., permeability fields). To the best of our knowledge, such policies have so far been learned by simply drawing a sample from the parameter uncertainty distribution in each episode of agent-environment interactions. This policy learning process is often very unstable. Furthermore, it requires a very high number of episodes so that the parameter uncertainty domain is thoroughly explored, which is computationally intensive for subsurface reservoir flow problems. Therefore, to cope with these limitations, we investigate whether a robust optimal policy can be learned with just a few samples of the uncertainty distribution.
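The contrast between drawing a new realization every episode and reusing a few realizations can be sketched as two episode-scheduling strategies, plus a robust objective that averages performance over realizations. This is one possible reading under loudly stated assumptions: the log-normal permeability model, the closed-form toy sweep function `sweep_after`, and all function names are hypothetical illustrations, not the authors' setup.

```python
import math
import random
from typing import Iterator

Perm = list[float]


def sample_perm(rng: random.Random, n_regions: int) -> Perm:
    # Hypothetical uncertainty model: i.i.d. log-normal region permeabilities
    # (median 0.3), clipped to (0, 1] so the toy sweep model stays well defined.
    return [min(1.0, rng.lognormvariate(math.log(0.3), 0.5)) for _ in range(n_regions)]


def fresh_samples(rng: random.Random, n_regions: int) -> Iterator[Perm]:
    # Strategy 1: a new realization for every training episode.
    while True:
        yield sample_perm(rng, n_regions)


def fixed_samples(rng: random.Random, n_regions: int, k: int) -> Iterator[Perm]:
    # Strategy 2: draw k realizations once, then cycle through them episode by episode.
    pool = [sample_perm(rng, n_regions) for _ in range(k)]
    while True:
        yield from pool


def sweep_after(valves: list[float], perm: Perm, n_steps: int = 10, rate: float = 0.5) -> float:
    # Toy closed-form sweep efficiency for a constant valve setting (not a flow solver).
    total = sum(valves) or 1.0
    swept = [1.0 - (1.0 - rate * (v / total) * k) ** n_steps for v, k in zip(valves, perm)]
    return sum(swept) / len(swept)


def robust_objective(valves: list[float], perms: list[Perm]) -> float:
    # Sample-average approximation of the expected sweep efficiency over realizations.
    return sum(sweep_after(valves, p) for p in perms) / len(perms)
```

With `fixed_samples(random.Random(0), 3, k=4)` the agent sees the same four realizations repeatedly, so far fewer distinct simulations are needed; whether a policy trained that way generalizes is exactly what `robust_objective` evaluated on held-out realizations from `fresh_samples` would probe.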

We present two test cases, representing two distinct permeability uncertainty distributions, as a proof of concept for our study. Policy-based, model-free RL algorithms, namely PPO (proximal policy optimization) and A2C (advantage actor-critic), are employed to solve the robust optimal control problem for both test cases. The results are benchmarked against optimization results obtained with the differential evolution algorithm.
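For reference, the policy losses that distinguish A2C and PPO fit in a few lines. This plain-Python sketch only evaluates loss values on lists of floats; a real implementation would compute them on tensors in an autodiff framework to obtain gradients. The function names and the clip range `eps=0.2` (a common default) are assumptions, not details taken from the abstract.

```python
import math


def discounted_returns(rewards: list[float], gamma: float = 0.99) -> list[float]:
    # G_t = r_t + gamma * G_{t+1}, computed backwards over one episode.
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def a2c_policy_loss(logps: list[float], returns: list[float], values: list[float]) -> float:
    # A2C: advantage A = G - V(s), with the critic V acting as a baseline.
    adv = [g - v for g, v in zip(returns, values)]
    return -sum(lp * a for lp, a in zip(logps, adv)) / len(adv)


def a2c_value_loss(returns: list[float], values: list[float]) -> float:
    # Critic regression target: mean squared error between G and V(s).
    return sum((g - v) ** 2 for g, v in zip(returns, values)) / len(returns)


def ppo_clip_loss(new_logps: list[float], old_logps: list[float],
                  advantages: list[float], eps: float = 0.2) -> float:
    # PPO: probability ratio r = pi_new / pi_old, clipped to [1 - eps, 1 + eps];
    # taking the minimum makes the objective a pessimistic bound on improvement.
    total = 0.0
    for new, old, a in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * a, clipped * a)
    return -total / len(advantages)
```

The clipping is what keeps PPO updates small when the new policy drifts far from the one that collected the episodes, which is one reason it is often preferred when each episode requires an expensive flow simulation.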

Time Block Preference Time Block B (14:00-17:00 CET)
Student Poster Award Yes, I would like to enter this submission into the student poster award

Primary authors

Atish Dixit (PhD student); Ahmed H. Elsheikh (Heriot-Watt University)
