I build systems that autonomously find failure modes in Large Language Models using Reinforcement Learning.
Currently, I am working at KachmanLab on an automated jailbreaking framework. My work focuses on the intersection of theoretical alignment research and high-performance engineering.
- Reinforcement Learning: Implementing verifiable reward frameworks (using `GRPO` and `Verifiers`) to train adversarial agents.
- Inference Optimization: Designing asynchronous generation pipelines using `asyncio` and `vLLM` to maximize throughput across multi-GPU environments.
- Evaluation: Curating adversarial datasets and benchmarking model robustness using `AgentDojo` and custom environments.
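A minimal sketch of what "verifiable rewards" means in this setting: a reward function that checks a completion against ground truth programmatically, plus GRPO's group-relative advantage normalization. The answer-tag format and function names here are illustrative assumptions, not the lab's actual code.

```python
import re


def exact_answer_reward(completion: str, target: str) -> float:
    """Verifiable reward: 1.0 if the completion's tagged answer matches
    the known target, else 0.0. The <answer> tag format is a hypothetical
    convention for this sketch."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == target.strip() else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each sampled completion's reward
    by the mean and std of its own sampling group, so no learned value
    function (critic) is needed."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    if std == 0.0:
        # All completions scored the same: no learning signal this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

The appeal of this setup for adversarial training is that the reward is checkable rather than learned, so the policy cannot game a reward model.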
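The throughput pattern behind the inference bullet can be sketched with plain `asyncio`: fan requests out concurrently while a semaphore caps in-flight load. The `generate` stub below stands in for a real backend call (e.g. a request to a vLLM server); the backend API itself is not shown.

```python
import asyncio


async def generate(prompt: str) -> str:
    """Stand-in for a real generation backend (e.g. a vLLM server request).
    Sleeps briefly to simulate generation latency."""
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"


async def run_batch(prompts: list[str], max_concurrent: int = 8) -> list[str]:
    """Fan out generation requests concurrently, bounded by a semaphore so
    the number of in-flight requests never exceeds max_concurrent."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(prompt: str) -> str:
        async with sem:
            return await generate(prompt)

    # gather preserves input order even though requests finish out of order.
    return await asyncio.gather(*(bounded(p) for p in prompts))
```

With many prompts this overlaps the per-request wait time, which is where the throughput gain over sequential generation comes from; across multiple GPUs the same pattern applies per engine replica.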
- KachmanLab (Current): End-to-end RL training for automated jailbreaking.
- Prime Intellect: Developed RL environments for decentralized training (e.g., Gutenberg literary analysis).
- RoboLearn: Built Bayesian models of perceived controllability in depression and data-analysis pipelines for computational psychiatry, in collaboration with Prof. Dr. Roshan Cools (Donders Institute/RadboudUMC).
Adversarial Robustness • Model Evals • Alignment Faking • Model Organisms • Mechanistic Interpretability • Steganography • Jailbreaking • Multi-Agent Systems.
Wanna talk? Book a 1-on-1 or do it the old-fashioned way: samuelgerrit.nellessen{at}gmail.com


