Solving pure exploration tasks in bandit models and beyond

Tuesday, 25 February 2025, from 14:00 to 15:30

Amphi Ircica – 50 avenue Halley – Haute Borne – Villeneuve d’Ascq


Abstract:

Bandit models are well studied in the machine learning community owing to their many applications in online content optimization. In a multi-armed bandit model, an agent sequentially tries different actions, called arms, each associated with an unknown probability distribution. In a pure exploration task, the agent wants to learn something about these distributions (e.g., which arm has the largest expectation) while querying as few samples as possible from them. I will first present a generic algorithm, called Track and Stop, that attains the minimal sample complexity in an asymptotic regime in which the error probability is small. The high computational cost of Track and Stop led to the development of other types of algorithms that are computationally more appealing while keeping a (near-)optimal sample complexity. In particular, I will advocate the use of the so-called Top Two algorithms. Finally, I will present some results on pure exploration in Markov decision processes, in which the agent's state evolves with the chosen action and the goal is to learn a good policy (a mapping from states to actions).
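To make the setting concrete, here is a minimal Python sketch of a Top Two style best-arm identification loop. It rests on assumptions not stated in the abstract: unit-variance Gaussian rewards, an empirical-best leader, a challenger chosen by a Gaussian transportation cost, and a heuristic stopping threshold. It illustrates the general recipe, not the exact rules presented in the talk.

```python
import numpy as np

def top_two_bai(means, delta=0.05, beta=0.5, rng=None):
    """Sketch of a Top Two best-arm identification loop.

    Assumes unit-variance Gaussian rewards. The leader is the
    empirical best arm, the challenger minimizes a transportation
    cost, and the stopping threshold is a heuristic illustration.
    """
    rng = np.random.default_rng(rng)
    K = len(means)
    counts = np.ones(K)                         # one initial pull per arm
    sums = np.array([rng.normal(m) for m in means])

    while True:
        mu = sums / counts
        leader = int(np.argmax(mu))
        # Gaussian transportation cost between the leader and each other arm
        gaps = mu[leader] - mu
        cost = gaps**2 / (2 * (1 / counts[leader] + 1 / counts))
        cost[leader] = np.inf
        challenger = int(np.argmin(cost))

        # stop when even the closest challenger is well separated
        t = counts.sum()
        threshold = np.log((1 + np.log(t)) / delta)   # heuristic threshold
        if cost[challenger] > threshold:
            return leader, int(t)                     # recommend the leader

        # sample the leader with probability beta, else the challenger
        arm = leader if rng.random() < beta else challenger
        counts[arm] += 1
        sums[arm] += rng.normal(means[arm])

# usage: identify the best of four Gaussian arms
best, samples = top_two_bai([0.2, 0.4, 0.5, 0.8], delta=0.01, rng=0)
print(f"recommended arm {best} after {samples} samples")
```

The parameter beta controls how often the leader is sampled versus the challenger; beta = 0.5 is the common default in the Top Two literature.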

Short Bio

The speaker is a CNRS researcher at the CRIStAL laboratory of Université de Lille and a member of the Inria team Scool. She is interested in statistics and machine learning, with a particular focus on sequential learning.

She was awarded the CNRS Bronze Medal, which recognizes “the first work of researchers who are specialists in their field”.

