Reinforcement Learning
- Core contributor to Reasoning Gym, where I built dozens of RL environments and ran the zero-shot, external-benchmark, and curriculum-learning experiments for our NeurIPS publication.
- Wrote several sections of the RLHF Book, where I derived the policy gradient and Bradley-Terry objectives, provided intuitions for PPO's gradient dynamics, and built the foundations of the accompanying code library.
Healthcare and Life Sciences
- Led a team to automate glomerular sclerosis classification from gigapixel kidney biopsies, deployed in a system serving over half of the Organ Procurement Organizations in the US.
- Developed models on a team predicting protein-ligand binding affinity from DNA-Encoded Library (DEL) data for drug discovery, yielding numerous experimentally confirmed binders in the lab.
Continual Learning
- Worked on mitigating catastrophic forgetting in foundation models via continual weight interpolation, demonstrating performance close to the upper bound of jointly training on all data in our NeurIPS workshop publication.
Model Evaluation
- Contributed several datasets to EleutherAI's Evaluation Harness (e.g. Lambada Translations, Paloma, LegalBench) and implemented metric indicators and tests for output-table consistency.
- Built Word Game Bench – an evaluation suite based on Wordle and Connections – which I ran on the new daily puzzles for several months in 2024 (sponsored by OpenRouter).
My work is used by AI labs such as DeepMind [1, 2, 3, 4], Meta [5, 6, 7], NVIDIA [8, 9], and Mila [10, 11, 12]:
- "Reasoning Gym: Reasoning Environments for RL with Verifiable Rewards." Zafir Stojanovski*, Oliver Stanley*, Joe Sharratt*, Richard Jones*, Abdulhakeem Adefioye, Jean Kaddour, Andreas Köpf. NeurIPS 2025 (Spotlight)
- "Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning." Zafir Stojanovski*, Karsten Roth*, Zeynep Akata. Interpolate Workshop @ NeurIPS 2022 (Best Paper Award)