I work on reinforcement learning and reasoning on the Llama team at Meta, based out of Montreal. I am also an Adjunct Professor at McGill University. Previously, I was a staff research scientist on the Google DeepMind team. I finished my PhD at Mila under the guidance of Aaron Courville and Marc Bellemare. Before that, I spent a year on Geoffrey Hinton's amazing team at Google Brain, Toronto. Earlier, I graduated in Computer Science and Engineering from IIT Bombay.
My current research revolves around RL and LLMs, and my prior work has received an outstanding paper award at NeurIPS.

Current PhD Students
- Morgane Moss (Co-supervised with Aaron Courville)
Past Mentees & Student Researchers
- Max Schwarzer (BBF, Now o1 @ OpenAI)
- Yongchao Zhou (DistillSpec, Now @ xAI)
- Arian Hosseini (V-STaR, PhD @ Mila)
- Jesse Farebrother (Stop Regressing, PhD @ McGill)
- Lunjun Zhang (Generative RMs, PhD @ UofT)
- Charline Le Lan (RL Generalization, Now Gemini Flash @ GDM)
- Michael Noukhovitch (Asynchronous RL for LLMs, PhD @ Mila)
- Wenda Xu (Speculative KD, PhD @ UCSB)
- Hritik Bansal (Compute-Optimal STaR / KD / W2S, PhD @ UCLA)
- Josh P Zitovsky (Offline Model Selection, Now @ Amazon)
- Amrith Setlur (Advantage for PRMs, PhD @ UC Berkeley)
- Ghada Sokar (Dormant Neurons, Now @ GDM)
- Siddhant Agarwal (Undergrad Researcher, Now PhD @ UT Austin)