Log in

Imperial users details

Other users details

No account? details

Information on

Finding a talk details

Adding a talk details

Syndicating talks details

Who we are details

Everything else details

London Probability Seminar: Multi-armed Bandits

Add to your list(s) Download to your calendar using vCal

Friday 31 October 2014, 09:00-17:00
EEE Department, Room 407A-407B.

If you have a question about this talk, please contact Patrick Kelly.

This seminar will consist of a two-hour introductory lecture on multi-armed bandits in the morning delivered by Nicolo Cesa-Bianchi (University of Milan) followed by three afternoon research talks on recent results by Aurelien Garivier (The Toulouse Mathematics Institute, CNRS Research Laboratory), Shie Mannor (Israel Institute of Technology) and Alexandre Proutiere (KTH Royal Institute of Technology).

Alexandre Proutiere KTH , The Royal Institute of Technology

Title: Structured Stochastic Bandits

Abstract: This talk surveys recent results for structured stochastic bandit problems. In these problems, the structure captures inherent dependencies between the rewards of the different arms, and should be optimally exploited to speed up the exploration process. We present problem-specific regret lower bounds for problems with various structures (unimodal, Lipschitz and combinatorial), and propose online algorithms that reach or approach these fundamental limits. The results are illustrated using applications ranging from online services to quantum computing.

Shie Mannor Technion, Israel Institute of Technology

Title: Model-free Algorithms for Misspecified and Complex Bandit Models

Abstract: The UCB algorithm and its many variant essentially build models for the reward distribution of the arms. These models typically build on some strict probabilistic assumptions. However, the actual reward does not always satisfy such assumptions. We consider two different algorithmic approaches for the multi-armed bandit problem. The first approach, known as Thompson sampling, employs a pseudo prior over the arms. At each run the algorithm samples from the pseudo prior, finds the optimal action for the sampled parameters, and then updates the pseudo prior accordingly. By “pretending” to be Bayesian, Thompson sampling can harness some heavy machinery from Bayesian inference such as particle filters. We show how Thompson sampling can be used for complex bandit problems where the decision maker has to select a complex action (such as a permeation or a subset) and receives rewards that only reveals some information on the arms values. The second approach considers the empirical rewards an agent observes rather than building a complete model. The algorithm boils down to resampling from the data and choosing the arm with the highest resampled reward. Using this approach, the model may be severely misspecified with little effect on the efficiency of the algorithm.

Aurélien Garivier Professeur, Institut de Mathématiques de Toulouse Directeur du CMI SID : Cursus Master en Ingénierie Statistique et Informatique Décisionnelle

Title: Bandits for Exploration: Best Arm Identification and Discovery with Probabilistic Experts

Abstract: Whereas the achievable limits in terms of regret minimization in simple bandit models are now well known, it is often meaningful to consider slightly different goals and/or slightly different models. In this talk, I first present recent contributions to a better understanding of the complexity of identifying the m-best arms. Generic notions of complexity are introduced and investigated for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings. Then, I present optimal discovery with probabilistic expert advice, a problem arising from security analysis of power systems. While not involving a bandit model, I will show how some basic ideas of the bandit literature still lead to optimal solutions.

Nicolò Cesa-Bianchi University of Milan

Abstract: We will cover the basic results for the simple bandit model with K arms in the stochastic and non stochastic case. We will also present some of the key results for the linear and convex Lipschitz case.

This talk is part of the Featured talks series.

This talk is included in these lists:

Note that ex-directory lists are not shown.

Log in

Information on

London Probability Seminar: Multi-armed Bandits

This talk is included in these lists:

Other lists

Other talks