Home

Welcome to my professional webpage.

I hold the Chaire de Professeur Junior (Chair of Junior Professor, Competitive Research Chair funded by the French National Research Agency - ANR) at the University of Orléans, France. I am also the recipient of an ANR JCJC (Jeunes Chercheuses et Jeunes Chercheurs) grant for my project TAILOR on privacy-preserving reinforcement learning for healthcare applications.

Before my current position, I was a postdoctoral fellow in the Department of Mathematics and Computer Science at the Eindhoven University of Technology. I obtained my PhD on the topic of sequential learning with partial feedback at INRIA SequeL and Université de Lille under the supervision of Prof. Philippe Preux and Dr. Tanguy Urvoy. I received my master’s degree in Computer Science & Engineering from the Indian Institute of Technology Madras under the supervision of Prof. Balaraman Ravindran.

My Erdős number is 3 thanks to Peter Auer (-> Pál Révész -> Paul Erdős).

My research interests span sequential decision-making and machine learning with ethical considerations such as fairness and privacy.

In sequential decision-making, I seek to understand how an intelligent agent “learns” from its interactions with the environment. Generally, I am interested in devising machine learning algorithms with strong mathematical foundations which can work with non-standard forms of feedback often encountered in real-life scenarios.

Secondly, I am also interested in interdisciplinary research with the aim of incorporating human aspects like fairness and privacy in machine learning.

News

[JCJC Grant Awarded] My project TAILOR has been awarded funding under the JCJC (Jeunes Chercheuses et Jeunes Chercheurs) programme of the French National Research Agency (ANR). TAILOR will develop new approaches for privacy-preserving reinforcement learning, focusing on adaptive (temporal) privacy guarantees that move beyond fixed privacy budgets to better balance patient-data protection and learning utility, with validation on healthcare datasets in collaboration with the University Hospital of Orléans (CHU Orléans). As part of this project, I will soon be recruiting a PhD student, ideally starting between October and December 2026. See the Open Positions page for details. (Jul 2026).
Athir Hamadieh has been selected for the CIFRE PhD position on Multi-Agent Reinforcement Learning for Improving Small Language Models, co-supervised with Lina María Rojas Barahona and Raphaël Féraud from Orange Labs. We are delighted to welcome her to the team, and she will join in October 2026. (Jul 2026).
A paper accepted at the Reinforcement Learning Conference 2026. In this joint work with S. Akash and Jawar Singh from the Indian Institute of Technology (Patna), we provide best-of-both-worlds (stochastic/adversarial) results for multi-dueling bandits, with matching lower bounds. See the pre-print at arxiv:2603.18972. (May 2026).
I will co-supervise a CIFRE PhD thesis with researchers from Orange Labs (Lina María Rojas Barahona and Raphaël Féraud) on the topic of Multi-Agent Reinforcement Learning (MRL) for improving small language models (LMs). The research aims to address the limitations of LMs in reasoning, planning, and grounding, with a focus on applying MRL to complex tasks such as Orange’s technical use-cases. (Apr 2026).
Haiyang Lu and I presented a poster at StatLearn 2026 on reinforcement learning under joint differential privacy, leveraging randomized value functions, exploration, and noise injection with provable regret guarantees. (Apr 2026).
I collaborated with S. Akash and Jawar Singh from the Indian Institute of Technology (Patna) on multi-dueling bandits, developing algorithms with formal regret guarantees. In the Condorcet setting, our algorithm achieves $O(\sqrt{KT})$ pseudo-regret against adversarial preferences and the instance-optimal $O\left(\sum_{i \neq a^\star} \frac{\log T}{\Delta_i}\right)$ pseudo-regret under stochastic preferences, both simultaneously and without prior knowledge of the regime. In the Borda setting, our algorithm achieves $O\left(K^2 \log KT + K \log^2 T + \sum_{i: \Delta_i^{\mathrm{B}} > 0} \frac{K\log KT}{(\Delta_i^{\mathrm{B}})^2}\right)$ regret in stochastic environments and $O\left(K \sqrt{T \log KT} + K^{1/3} T^{2/3} (\log K)^{1/3}\right)$ regret in adversarial environments, again without prior knowledge of the regime. Lower bounds: We prove that the Condorcet guarantees are optimal and the Borda results are near-optimal (within a factor of $K$ of the lower bound). These results provide the first best-of-both-worlds guarantees for multi-dueling bandits, applicable to ranking and recommendation systems.
This work is a continuation of my previous research on adversarial multi-dueling bandits. (Mar 2026).
My PhD student, Haiyang Lu, joined the team in February 2026 and will work on fairness-aware and privacy-preserving reinforcement learning, with validation data and guidance provided by Prof. Dr. Guillaume Beraud from CHU Orléans. (Feb 2026).
In collaboration with Apnolab, I submitted a proposal for the Appel à projets de recherche d’intérêt régional (APR IR 2026) to develop AI algorithms for personalized treatment of obstructive sleep apnea, using patient data collected through Apnolab’s platform to create adaptive “cognitive maps” that model how treatment parameters affect efficacy and patient comfort. The system aims to improve long-term adherence and could be extended to other chronic diseases requiring personalized care. (Dec 2025).
I worked on using causal discovery techniques to better understand clinical data, such as Alzheimer’s and heart failure records, and to evaluate how different methods impact fairness and decision-making in healthcare. I provided guidance and mentorship in this collaboration with Nitish Nagesh and other researchers at the University of California, Irvine. See the preprint at arxiv:2603.15926. (Nov 2025).
I co-organized the Workshop on Responsible Healthcare using Machine Learning (RHCML 2025) at ECML PKDD, which was a success with keynotes from Prof. John D. Piette and Prof. Paul Monsarrat, 10 accepted papers, and a panel discussion with Prof. Jaakko Peltonen and Prof. Josep Domingo-Ferrer. (Sep 2025).
[NEW PhD Position Available] I am seeking a PhD student for a fully funded position on “Fair and Privacy-Preserving Reinforcement Learning for Healthcare Applications” starting November 2025. The position is supported by the French National Research Agency (ANR) and open to international candidates. Application deadline: September 8, 2025. See full details on the Open Positions page. (July 2025).
[Workshop on Responsible Healthcare using Machine Learning (RHCML 2025)] I’m co-organizing the Workshop on Responsible Healthcare using Machine Learning (RHCML 2025) at ECML PKDD in Porto, Portugal, which will take place on September 19, 2025. Extended submission deadline: June 26, 2025. For more details, please see https://rhcml.github.io/. (June 2025).
As the lead organizer and presenter, I delivered a tutorial on advances in fairness-aware reinforcement learning, partnering with Mykola Pechenizkiy and Yingqian Zhang at the International Joint Conference on Artificial Intelligence (IJCAI) 2024. See additional information about the tutorial, including the slides at https://fair-rl.github.io/. The tutorial presented advances in fairness-aware reinforcement learning, encompassing both theoretical results and real-world applications. We covered motivating applications, fairness notions and the technical details of incorporating fairness objectives into reinforcement learning models, analyzed fairness in reinforcement learning as a multi-objective optimization problem, and explored impactful future directions.
A paper accepted at the ICML 2024 Workshop on Models of Human Feedback for AI Alignment. In this paper, we introduced the problem of regret minimization in adversarial multi-dueling bandits. Our work addresses a gap in the literature by considering scenarios where the learner selects multiple arms at each round and observes the identity of the most preferred arm, based on arbitrary preference matrices. We propose a novel algorithm and prove that its expected cumulative regret is upper-bounded by $O((K \log K)^{1/3} T^{2/3})$. We also establish a matching lower bound of $\Omega(K^{1/3} T^{2/3})$. See the paper at arxiv:2406.12475.
I gave a talk at the Conference on Advancing Behavioral Science through AI and Digital Health held in Ann Arbor, MI, USA. The topic was reinforcement learning-driven pain care recommendations. Here are the slides. (May 2024).
Mykola Pechenizkiy, Yingqian Zhang and I will be delivering a tutorial on the advances in fairness-aware reinforcement learning at the International Joint Conference on Artificial Intelligence (IJCAI) 2024. In this tutorial, we will present advances in fairness-aware reinforcement learning encompassing theoretical results as well as real-world applications. We plan to cover motivating applications and technical details of incorporating fairness objectives into the reinforcement learning model, analyze fairness in reinforcement learning as multi-objective optimization, and explore impactful future directions. See you all in Jeju! (Apr 2024).
I have been invited to give a talk at the Conference on Advancing Behavioral Science through AI and Digital Health. I will be speaking on reinforcement learning-driven healthcare recommendations. (Mar 2024).
A pre-print detailing some of our initial work in my ongoing collaboration with Prof. John D. Piette is online - arxiv:2402.19226. Here our focus is on identifying (and rectifying) gender bias in personalized recommendations for pain care. In this initial work, we show that if certain patient information, such as self-reported pain measurements, is not considered in the decision-making process, then the quality of reinforcement learning-driven pain care recommendations for women can be notably inferior to those for men. (Mar 2024).
A paper accepted at the Symposium on Intelligent Data Analysis (IDA 2024). Under my guidance, Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, and Nina Verbeeke collaborated to produce this paper as a culmination of their course project. (Jan 2024).
I have been awarded the UTQ (University Teaching Qualification) i.e. BKO (Basiskwalificatie Onderwijs) certificate. (Oct 2023).
I am going to co-deliver a tutorial on fair reinforcement learning at the 15th Asian Conference on Machine Learning (Sep 2023).
I presented our work about autonomous exploration in reinforcement learning at the 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023. See the slides for the presentation here (Aug 2023).
Two papers accepted at the European Workshop on Reinforcement Learning (EWRL), 2023. The first paper is on sparse-reward deep reinforcement learning with Jiong Li, who is one of my master’s students. The other is on multi-armed bandits where rewards arrive partially and they are observed with different delays. This paper is the result of a course project completed by Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker and Nina Verbeeke. (Jul 2023).
Our joint work with Rosa van Tuijn, Tianqin Lu, Emma Driesse, Koen Franken and Dr. Emilia Barakova has been accepted at the 19th International Conference on Human-Computer Interaction (INTERACT), 2023. In this work, we propose a personalized explainable recommendation device for cardiac rehabilitees (May 2023).
Our joint work with Dr. Peter Auer and Dr. Ronald Ortner has been accepted at the 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023. In this work, we propose a meta-algorithm which can convert any RL algorithm with sublinear regret into an exploration algorithm with suitable guarantees on its sample complexity (Apr 2023).
Our joint work with Dennis Collaris, Joost Jorritsma, Mykola Pechenizkiy and Jack (Jarke) van Wijk has been awarded the Runner-up Frontier Prize at the 21st Symposium on Intelligent Data Analysis (IDA 2023). Click here to see the paper. More information about the work can be found at https://explaining.ml/lemon (Apr 2023).
Our joint work with Rosa van Tuijn, Tianqin Lu, Emma Driesse, Koen Franken and Dr. Emilia Barakova has been accepted as an extended abstract at the second International Conference on Hybrid Human-Artificial Intelligence (HHAI), 2023. In this work, we propose a personalized explainable recommendation device for cardiac rehabilitees. See the paper here (Apr 2023).
Recently, I have been working on curiosity-driven exploration in sparse-reward deep reinforcement learning with one of my master’s students. Here’s a preliminary version of the work – arxiv:2302.10825. In this work, we propose a method called I-Go-Explore that combines the intrinsic curiosity module with the Go-Explore framework to address some of the limitations of the state of the art. (Mar 2023).
Our paper, "LEMON: Alternative Sampling for More Faithful Explanation through Local Surrogate Models", has been accepted at the Symposium on Intelligent Data Analysis (IDA), 2023 (Feb 2023).
My paper on providing local differential privacy for sequential decision making in a changing environment accepted at AAAI Privacy Preserving Artificial Intelligence (PPAI), 2023 (Jan 2023).
I am on the thesis committee for a master’s thesis defense on the topic of multivariate distributional regression techniques at the Eindhoven University of Technology (Jan 2023).
I completed a pedagogical course on Designing Courses & Projects (Dec 2022).
I completed a pedagogical course on Teaching Skills (Dec 2022).
Our paper on posterior sampling for constrained reinforcement learning accepted at the Reinforcement Learning for Real Life Workshop at NeurIPS, 2022 (Dec 2022).
Our paper on batch learning in stochastic linear bandits published at the International Conference on Data Mining (ICDM), 2022 (Dec 2022).
I completed a pedagogical course on Facilitating Learning (Nov 2022).
I am taking pedagogical courses to obtain the University Teaching Qualification (UTQ/BKO), which is regarded as proof of teaching competence in academic settings in the Netherlands (Oct 2022).
We are applying for an NWO grant (Open Technology Programme 2022). Watch this space for job announcements! (Oct 2022).
New paper about posterior sampling for constrained reinforcement learning (Sept 2022).
In the academic year 2022-2023, I am going to supervise 3 MSc students and a group of BSc students from the Honors Academy in addition to the 2 PhD students I am currently supervising (Sept 2022).
In the 1st quartile of the academic year 2022-2023, I am teaching a course on reinforcement learning (Sept 2022).
In the 1st quartile of the academic year 2022-2023, I am co-teaching a course on embodying intelligent behavior in social context.
New paper about batch learning in stochastic linear bandits to appear at ICDM 2022.
Working on an extensive survey (and a tutorial) on fairness-aware reinforcement learning. See the initial version here.
In the 4th quartile of the academic year 2021-2022, I am supervising and evaluating 10 student course projects on reinforcement learning as part of the 2AMC15 Data Intelligence Challenge.
I completed a course on supervision of PhD students.
In the 2nd quartile of the academic year 2021-2022, I am going to be a part of the assessment committee for bachelor projects in data science.

Pratik Gajane