
Hi, I'm Samuel.

I build systems that autonomously find failure modes in Large Language Models using Reinforcement Learning.

Currently, I am working at KachmanLab on an automated jailbreaking framework. My work focuses on the intersection of theoretical alignment research and high-performance engineering.

Current Focus

  • Reinforcement Learning: Implementing verifiable reward frameworks (using GRPO and Verifiers) to train adversarial agents.
  • Inference Optimization: Designing asynchronous generation pipelines using asyncio and vLLM to maximize throughput across multi-GPU environments.
  • Evaluation: Curating adversarial datasets and benchmarking model robustness using AgentDojo and custom environments.
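
The asynchronous pipeline pattern described above can be sketched roughly as follows. This is a minimal illustration, not the lab's actual code: `generate` is a hypothetical stand-in for a real vLLM async engine call, and `max_in_flight` is an illustrative concurrency cap.

```python
import asyncio

async def generate(prompt: str) -> str:
    # Placeholder for a real model call (e.g. vLLM's async engine).
    await asyncio.sleep(0)
    return f"completion for: {prompt}"

async def run_pipeline(prompts, max_in_flight=8):
    # Cap in-flight requests so the serving backend stays saturated
    # without being flooded.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(prompt):
        async with sem:
            return await generate(prompt)

    # Fire all requests concurrently; gather preserves input order.
    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(run_pipeline([f"prompt {i}" for i in range(4)]))
```

The semaphore-plus-gather shape is the usual way to get bounded concurrency over a batch of prompts while keeping results aligned with their inputs.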

Selected Work

  • KachmanLab (Current): End-to-end RL training for automated jailbreaking.
  • Prime Intellect: Developed RL environments for decentralized training (e.g., Gutenberg literary analysis).
  • RoboLearn: Built Bayesian models of controllability in depression and data-analysis pipelines for computational psychiatry, in collaboration with Prof. Dr. Roshan Cools (Donders Institute / RadboudUMC).

Interests

Adversarial Robustness • Model Evals • Alignment Faking • Model Organisms • Mechanistic Interpretability • Steganography • Jailbreaking • Multi-Agent Systems


Wanna talk? Book a 1-on-1 or do it the old-fashioned way: samuelgerrit.nellessen{at}gmail.com

Pinned

  1. UKGovernmentBEIS/inspect_ai — Inspect: A framework for large language model evaluations (Python)

  2. styx-interchange — Mechanistic interpretability experiments on refusal behavior using activation patching and input gradients (Python)

  3. PrimeIntellect-ai/community-environments — Lightly reviewed collection of community environments (Python)

  4. ARENA-SONAR-MechInterp — Capstone project investigating the interpretability of Text AutoEncoders like SONAR (HTML)