
Online Learning in Markov Decision Processes with Changing Reward Sequences


If you have a question about this talk, please contact Patrick Kelly.

We consider online learning in finite stochastic Markovian environments where, in each time step, a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in hindsight in terms of the total reward received. Two variants of the problem are considered: the online stochastic shortest path problem and learning in unichain Markov decision processes. Several low-complexity algorithms are proposed, for both the full-information and the bandit-feedback settings, which achieve almost optimal performance under different assumptions on the Markov transition kernel. In the case of full-information feedback, our results complement existing results, while in the bandit-feedback setting we give the first low-complexity algorithms achieving optimal performance.
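For concreteness, the performance measure described above is usually formalised as a regret against the best stationary policy in hindsight; the notation below is one standard formulation and is not taken from the talk itself:

\[
\widehat{R}_T \;=\; \max_{\pi \in \Pi} \, \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\bigl(x_t^{\pi}, \pi(x_t^{\pi})\bigr)\right] \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t(x_t, a_t)\right],
\]

where \(\Pi\) is the set of stationary policies, \(r_t\) is the reward function chosen by the adversary at step \(t\), \((x_t, a_t)\) is the state-action pair visited by the learner, and \(x_t^{\pi}\) is the state that would be reached by following \(\pi\). Under full-information feedback the learner observes the whole function \(r_t\) after acting; under bandit feedback it observes only the received reward \(r_t(x_t, a_t)\).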

Based on joint work with Travis Dick, Csaba Szepesvari, Gergely Neu, and Andras Antos.

Bio: András György received the M.Sc. (Eng.) degree (with distinction) in technical informatics from the Technical University of Budapest, in 1999, the M.Sc. (Eng.) degree in mathematics and engineering from Queen’s University, Kingston, ON, Canada, in 2001, and the Ph.D. degree in technical informatics from the Budapest University of Technology and Economics in 2003.

He was a Visiting Research Scholar in the Department of Electrical and Computer Engineering, University of California, San Diego, USA, in the spring of 1998. From 2002 to 2011 he was with the Computer and Automation Research Institute of the Hungarian Academy of Sciences, where, from 2006, he was a Senior Researcher and Head of the Machine Learning Research Group. From 2003 to 2004 he was also a NATO Science Fellow in the Department of Mathematics and Statistics, Queen’s University. He also held a part-time research position at GusGus Capital Llc., Budapest, Hungary, from 2006 to 2011. Since 2012 he has been with the Department of Computing Science, University of Alberta, Edmonton, AB, Canada. His research interests include machine learning, statistical learning theory, optimization, and information theory.

Dr. György received the Gyula Farkas prize of the János Bolyai Mathematical Society in 2001 and the Academic Golden Ring of the President of the Republic of Hungary in 2003.

This talk is part of the Featured talks series.
