Rishabh Agarwal – Google Brain

I work on reinforcement learning and reasoning in the Llama team at Meta, based out of Montreal. I am also an Adjunct Professor at McGill University. Previously, I was a staff research scientist on the Google DeepMind team. I finished my PhD at Mila under the guidance of Aaron Courville and Marc Bellemare. Before that, I spent a year on Geoffrey Hinton's amazing team at Google Brain, Toronto. Earlier, I graduated in Computer Science and Engineering from IIT Bombay.

My current research revolves around RL and LLMs, and my prior work has received an outstanding paper award at NeurIPS.

Current PhD Students

Morgane Moss (Co-supervised with Aaron Courville)

Past Mentees & Student Researchers

Max Schwarzer (BBF, Now o1 @ OpenAI)
Yongchao Zhou (DistillSpec, Now @ x.AI)
Arian Hosseini (V-STaR, PhD @ Mila)
Jesse Farebrother (Stop Regressing, PhD @ McGill)
Lunjun Zhang (Generative RMs, PhD @ UofT)
Charline Le Lan (RL Generalization, Now Gemini Flash @ GDM)
Michael Noukhovitch (Asynchronous RL for LLMs, PhD @ Mila)
Wenda Xu (Speculative KD, PhD @ UCSB)
Hritik Bansal (Compute-Optimal STaR / KD / W2S, PhD @ UCLA)
Josh P Zitovsky (Offline Model Selection, Now @ Amazon)
Amrith Setlur (Advantage for PRMs, PhD @ UC Berkeley)
Ghada Sokar (Dormant Neurons, Now @ GDM)
Siddhant Agarwal (Undergrad Researcher, Now PhD @ UT Austin)

News

Talk at CVPR 2025 on The Bitter Lesson for RL: Verification as the Key to Reasoning LLMs.
Tutorial on Post-Training Distillation of LLMs at Google. [Podcast @ YouTube]
7 papers accepted at ICLR 2025, including Generative Verifiers, SCoRE, Speculative KD, Async RLHF, and Inference-aware RL for LLMs.
Panelist on the Inference Time LLM Algorithms Tutorial at NeurIPS 2024.
Gave a guest lecture at McGill about RL, Reasoning, and Verifiers.
Many-Shot ICL and Scalable Oversight accepted at NeurIPS 2024.