Imperial College London > Talks@ee.imperial > Andras Gyorgy's list > Online Learning with Gaussian Payoffs and Side Observations

Online Learning with Gaussian Payoffs and Side Observations

Add to your list(s) Download to your calendar using vCal

If you have a question about this talk, please contact Andras Gyorgy.

In many learning problems the best results can only be achieved if the learning agent learns online, from experience while interacting with its environment. At the heart of designing agents for this problem is the so-called exploration-exploitation dilemma, the dilemma faced by a learning agent that has to weigh the benefit of receiving reward against the benefit of receiving information. While in some simple settings, like in the so-called bandit problems, during the last decades we reached a near-complete understanding of how to best resolve this dilemma, the question eludes us in even just slightly more complicated scenarios. In this talk, in addition to covering a bit of the core results known today, I will consider a variant of simple bandit problems: sequential learning with Gaussian payoffs and side observations with unknown means and known variances. In this problem, an agent wants to collect as much reward as possible by choosing actions from a fixed finite set in a sequential fashion. Pulling an action makes the agent incur a random reward, which the agent observes together with observations for all the other actions (that it did not choose). These side-observations share the same mean as the random reward the agent would receive if it chose those respective actions. By changing the relative magnitudes of the variances of the observations and their relation to the length of the learning horizon, this learning setting allows one to continuously interpolate between learning under full information when no exploration is necessary through hard partial monitoring where explicit exploration is necessary to (uninteresting) impossible problems. In this talk, I will describe what we know and what we don’t know about this learning setup. In particular, while doing this, as far as I know, for the first time in the literature, we established tight, non-asymptotic, problem-dependent lower bounds on the regret of any algorithm, the proof technique of which may be of interest on its own.

Based on joint work with Yifan Wu and Andras Gyorgy.

This talk is part of the Andras Gyorgy's list series.

Tell a friend about this talk:

This talk is included in these lists:

Note that ex-directory lists are not shown.


Changes to Talks@imperial | Privacy and Publicity