A unified view of entropy-regularized Markov decision processes


Gergely Neu, Pompeu Fabra University
Tuesday 31 October 2017, 11:00-12:00
Dennis Gabor Seminar Room, 611, level 6, EEE Dept.

If you have a question about this talk, please contact Joan P O'Brien.

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach extends the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result shows that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to reason about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.
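As a rough illustration of what "closely resembling the Bellman optimality equations" can mean, the sketch below assumes the regularized backup takes the familiar log-sum-exp ("soft max") form relative to a reference policy. The abstract does not state this formula, so the form itself, the function names `soft_backup` and `soft_policy`, and the regularization parameter `eta` are assumptions made for illustration only, not the speaker's notation or method.

```python
import math


def soft_backup(q_values, ref_policy, eta):
    """Assumed soft backup: (1/eta) * log sum_a ref_policy[a] * exp(eta * q_values[a])."""
    m = max(q_values)  # shift by the max for numerical stability
    total = sum(p * math.exp(eta * (q - m)) for p, q in zip(ref_policy, q_values))
    return m + math.log(total) / eta


def soft_policy(q_values, ref_policy, eta):
    """Policy proportional to ref_policy[a] * exp(eta * q_values[a])."""
    m = max(q_values)
    weights = [p * math.exp(eta * (q - m)) for p, q in zip(ref_policy, q_values)]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical action values for a single state, with a uniform reference policy.
q = [1.0, 0.5, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]

for eta in (0.1, 1.0, 10.0, 100.0):
    value = soft_backup(q, uniform, eta)
    policy = soft_policy(q, uniform, eta)
    print(f"eta={eta:6.1f}  soft value={value:.4f}  policy={[round(p, 3) for p in policy]}")
```

Under this assumed form, a small `eta` (strong regularization) keeps the value near the reference policy's average of the action values, while a large `eta` (weak regularization) pushes it toward the hard maximum that appears in the unregularized Bellman optimality equations.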

Bio: Gergely Neu is a postdoctoral researcher at Pompeu Fabra University, Barcelona, Spain. He previously worked with the SequeL team at INRIA Lille, France, and the RLAI group at the University of Alberta, Edmonton, Canada. He obtained his PhD in 2013 from the Technical University of Budapest, where his advisors were Andras Gyorgy, Csaba Szepesvari and Laszlo Gyorfi. His main research interests are in machine learning theory, including reinforcement learning and online learning with limited feedback and/or very large action sets.

This talk is part of the Featured talks series.
