Large Structured Bandits and Applications

If you have a question about this talk, please contact Professor Peter Cheung.

In bandit optimization problems, a decision maker sequentially selects actions whose average rewards are unknown a priori, with the aim of maximizing her cumulative reward over a finite time horizon. Since the 1930s, these problems have been used extensively in many areas to model and investigate the trade-off between exploitation (selecting actions that gave high rewards in the past) and exploration (playing actions that may yield higher rewards in the future). Most of the bandit literature assumes that rewards are independent across actions and that the set of actions is small. In this talk, we introduce a class of bandit problems (i) with a very large action space (one cannot even sample every action once within the time horizon), and (ii) with rewards that are correlated across actions. We explore possible research directions toward solving this novel class of sequential decision problems, and explain how these problems arise naturally in e-commerce systems (display ads, sponsored search auctions, …) and in the design of radio communication networks.
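To make the exploration/exploitation trade-off concrete in the classical setting the abstract contrasts against (independent rewards, few actions), here is a minimal sketch of the textbook UCB1 index policy. It is a standard algorithm chosen purely for illustration, not the approach presented in the talk; the function name `ucb1`, the Bernoulli reward model, and the example arm means are assumptions. Note that UCB1 starts by pulling every arm once, which is exactly what becomes impossible in the large-action regime (i) described above.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms whose means are hidden from the policy.

    Illustrative sketch only: arm_means and the Bernoulli reward model are
    assumptions. Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm has been pulled
    sums = [0.0] * k      # cumulative reward observed for each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: sample every arm once (infeasible when the
            # action space is larger than the horizon).
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + confidence bonus
            # (exploration); the bonus shrinks as an arm is pulled more.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


if __name__ == "__main__":
    counts, total = ucb1([0.2, 0.5, 0.7], horizon=10_000)
    print(counts, total)
```

With three arms of means 0.2, 0.5 and 0.7, most of the 10,000 pulls go to the best arm, while the worse arms are still sampled occasionally because their confidence bonuses keep growing slowly with t. Since rewards are treated as independent across arms, nothing learned about one arm informs the others; this is the assumption that setting (ii) relaxes.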

This talk is part of the Featured talks series at Talks@ee.imperial, Imperial College London.
