Pokémon Red: Reinforcement Learning Toolkit

A Gymnasium-compatible environment and training pipeline for Pokémon Red, built on PyBoy, Stable-Baselines3, and sb3-contrib.

This repository provides everything needed to train RL agents to play Pokémon Red: an emulator-backed Gymnasium environment with three first-class observation treatments (pixel / symbolic / hybrid), RecurrentPPO training scripts, an event-flag-based reward calculator covering 18 critical-path milestones, live Streamlit monitoring dashboards, configurable alerts (desktop / Slack / email), and an analysis layer with bootstrap confidence intervals via rliable.

Repository at a glance

| Path | What's there |
| --- | --- |
| `pokemon_red_ai/environment/` | Gymnasium env wrapping PyBoy + 3 observation treatments |
| `pokemon_red_ai/training/` | RecurrentPPO trainer, callbacks (W&B, alerts, monitoring) |
| `pokemon_red_ai/analysis/` | Treatment-comparison logic (`comparison.py`) |
| `scripts/train.py` | Primary training entry point |
| `scripts/eval.py` | Deterministic evaluation harness |
| `scripts/analyze.py` | rliable bootstrap analysis → publication-quality figures |
| `scripts/compare.py` | Streamlit dashboard for side-by-side run comparison |
| `scripts/monitor.py` | Streamlit dashboard for live single-run monitoring |
| `scripts/run_pilots.sh` | Launch a multi-treatment / multi-seed run grid |
| `docs/research_playbook.md` | Step-by-step operational guide for long-running experiments |
| `tests/` | 833 unit + integration tests (pytest) |

Quick start

# 1. Install
git clone https://github.com/amcheste/pokemon-red-ai.git
cd pokemon-red-ai
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# 2. Generate save states (one-time; requires a legal Pokémon Red ROM)
python3 scripts/create_save_states.py --rom path/to/PokemonRed.gb

# 3. Smoke test (~5 min, verify the pipeline end-to-end)
python3 scripts/train.py \
    --rom path/to/PokemonRed.gb \
    --save-state states/s0_post_intro.state \
    --observation-type pixel --total-timesteps 50000 --seed 42 \
    --save-dir ./training_output/smoketest

# 4. Run a multi-treatment / multi-seed grid (3 treatments × 3 seeds)
scripts/run_pilots.sh --rom path/to/PokemonRed.gb --parallel 3

# 5. Generate publication-quality figures
python3 scripts/analyze.py --results-dir ./training_output \
    --output-dir ./figures --format pdf --reps 10000

For unattended overnight runs, configure desktop / Slack / email alerts:

cp configs/alerts.example.yaml configs/alerts.yaml   # then enable channels

The full operational playbook (including compute estimates and a parallel-run strategy on Apple Silicon) is in docs/research_playbook.md.
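
To picture what the 3 treatments × 3 seeds grid in step 4 expands to, here is a minimal sketch that builds one `scripts/train.py` command per (treatment, seed) pair. It only reuses the flags shown in the smoke test above. The seed values, the timestep budget, and the output directory layout are assumptions for illustration; `scripts/run_pilots.sh` may choose all of these differently.

```python
import itertools
import shlex

TREATMENTS = ["pixel", "symbolic", "hybrid"]
SEEDS = [0, 1, 2]  # hypothetical seeds; the real grid may use others


def build_commands(rom, save_state, total_timesteps, out_root):
    """Return one shell-quoted train.py command per (treatment, seed) pair."""
    commands = []
    for treatment, seed in itertools.product(TREATMENTS, SEEDS):
        argv = [
            "python3", "scripts/train.py",
            "--rom", rom,
            "--save-state", save_state,
            "--observation-type", treatment,
            "--total-timesteps", str(total_timesteps),
            "--seed", str(seed),
            # Hypothetical layout: one output directory per run.
            "--save-dir", f"{out_root}/{treatment}_seed{seed}",
        ]
        commands.append(shlex.join(argv))
    return commands


commands = build_commands(
    rom="path/to/PokemonRed.gb",
    save_state="states/s0_post_intro.state",
    total_timesteps=1_000_000,  # hypothetical budget
    out_root="./training_output",
)
for line in commands:
    print(line)
```

Keeping one output directory per run makes it straightforward to point `scripts/analyze.py --results-dir` at the common parent afterwards.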

Observation treatments

There are three encoder paths, all feeding into the same LSTM (hidden size 256) and PPO policy / value heads (pi=[256,128], vf=[256,128]). Select one with --observation-type on scripts/train.py.

| Treatment | Observation | Encoder | Params | Feature dim |
| --- | --- | --- | --- | --- |
| `pixel` | 80×72×1 grayscale Game Boy screen | NatureCNN (Mnih et al. 2015), `features_dim=256` | ~564K | 256 |
| `symbolic` | Player position, party stats, 18-flag bit-vector, exploration counters (29 features) | 3-layer MLP `29 → 640 → 640 → 256` | ~594K | 256 |
| `hybrid` | `pixel` ∪ `symbolic` streams | NatureCNN(256) + symbolic MLP(256), concatenated | ~1.16M | 512 |

The pixel and symbolic encoders are sized to within 10% on trainable parameter count to neutralize the encoder-capacity confound when comparing modalities (Henderson et al. 2018; Engstrom et al. 2020; Andrychowicz et al. 2021). Strict per-forward FLOP matching across CNN and MLP architectures would distort encoder design, so FLOP counts are reported transparently rather than matched. Per-condition learning rates are selected from a pre-registered log-uniform grid following Eimer et al. (2023).

Run scripts/check_encoder_capacity.py to print the exact parameter / FLOP table and assert the 10% match constraint (exits non-zero on violation).
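
As a back-of-the-envelope check on the table above, the parameter counts can be reproduced by hand. This sketch is not the repository's script. It assumes the standard NatureCNN layout used by Stable-Baselines3 (32 filters 8×8 stride 4, 64 filters 4×4 stride 2, 64 filters 3×3 stride 1, then a linear layer to `features_dim`), an MLP of plain linear layers with biases, and a gap measured relative to the smaller encoder. The real encoders may include layers not counted here.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1


def nature_cnn_params(height, width, channels, features_dim):
    """Trainable parameters of a NatureCNN-style encoder (weights + biases)."""
    layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]  # (filters, kernel, stride)
    total, in_ch, h, w = 0, channels, height, width
    for out_ch, kernel, stride in layers:
        total += in_ch * kernel * kernel * out_ch + out_ch
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
        in_ch = out_ch
    # Final linear layer from the flattened feature map to features_dim.
    total += in_ch * h * w * features_dim + features_dim
    return total


def mlp_params(sizes):
    """Trainable parameters of a stack of linear layers with biases."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


pixel = nature_cnn_params(height=72, width=80, channels=1, features_dim=256)
symbolic = mlp_params([29, 640, 640, 256])
gap = abs(pixel - symbolic) / min(pixel, symbolic)

print(f"pixel    {pixel:,}")     # pixel    563,616
print(f"symbolic {symbolic:,}")  # symbolic 593,536
print(f"gap      {gap:.1%}")     # gap      5.3%
assert gap < 0.10, "encoders differ by more than 10% in parameter count"
```

Under these assumptions the two counts round to the ~564K and ~594K reported in the table, and their sum (about 1.16M) matches the hybrid row.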

Implementation: pokemon_red_ai/training/models.py; observation construction in pokemon_red_ai/environment/observations.py.
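
For readers wiring up their own trainer, the shared recurrent head described at the top of this section could be expressed as sb3-contrib `policy_kwargs` roughly as follows. This is a sketch, not a copy of what `models.py` does: `lstm_hidden_size` and `net_arch` are real keyword arguments of sb3-contrib's recurrent policies, but whether the repository sets them this way is an assumption, and the per-treatment feature extractors are deliberately left out.

```python
# Sketch of the shared head: LSTM with 256 hidden units, followed by
# separate policy (pi) and value (vf) MLPs of sizes [256, 128].
policy_kwargs = {
    "lstm_hidden_size": 256,
    "net_arch": {"pi": [256, 128], "vf": [256, 128]},
}

# Assumed usage (requires sb3-contrib and an environment instance):
#   model = RecurrentPPO("MultiInputLstmPolicy", env, policy_kwargs=policy_kwargs)
print(policy_kwargs)
```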

The package also ships three legacy observation types (multi_modal, screen_only, minimal) for backward compatibility with earlier scripts.

Reward function

The default events reward strategy uses a configurable set of 18 event flags between Pallet Town and the Boulder Badge. Each flag transition 0 → 1 awards a fixed positive reward exactly once per episode. A small per-step time penalty, a new-map discovery bonus, and a party-faint penalty are also active by default.
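
The once-per-episode rule for flag transitions can be sketched in a few lines. The class name, the reward and penalty values, and the list-of-bits input are all hypothetical and chosen for illustration; the actual strategy lives in pokemon_red_ai/environment/rewards.py and also includes the map-discovery bonus and party-faint penalty, which are omitted here.

```python
class EventFlagReward:
    """Illustrative event-flag reward: pay once per flag per episode."""

    def __init__(self, num_flags=18, flag_reward=1.0, step_penalty=0.001):
        self.num_flags = num_flags
        self.flag_reward = flag_reward    # hypothetical value
        self.step_penalty = step_penalty  # hypothetical value
        self.reset()

    def reset(self):
        """Call at the start of every episode."""
        self.prev = [0] * self.num_flags
        self.rewarded = set()

    def step(self, flags):
        """Return the reward for one step given the current flag bits."""
        reward = -self.step_penalty
        for i, (before, now) in enumerate(zip(self.prev, flags)):
            # Pay only on a 0 -> 1 transition that has not been paid yet.
            if before == 0 and now == 1 and i not in self.rewarded:
                reward += self.flag_reward
                self.rewarded.add(i)
        self.prev = list(flags)
        return reward


calc = EventFlagReward(num_flags=3, flag_reward=1.0, step_penalty=0.0)
print(calc.step([1, 0, 0]))  # first flag set: reward paid
print(calc.step([0, 0, 0]))  # flag cleared: nothing
print(calc.step([1, 1, 0]))  # flag 0 set again (already paid), flag 1 is new
```

Tracking the set of already-paid flags, rather than only the previous bit vector, is what prevents a flag that flickers off and on from being rewarded twice within an episode.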

Four other reward strategies are available (standard / exploration / progress / sparse); see pokemon_red_ai/environment/rewards.py for the full menu and configuration knobs.

The flag list with bit offsets is in pokemon_red_ai/game/event_flags.py.

Statistical analysis

Following Agarwal et al. (2021), "Deep Reinforcement Learning at the Edge of the Statistical Precipice", the analysis tooling reports:

Point estimate: interquartile mean (IQM) over per-seed scores. Robust to outlier seeds in either tail.
Uncertainty: 95% percentile bootstrap with 2,000 resamples.
Pairwise comparison: probability of improvement, Pr[score_A > score_B] via stratified bootstrap.
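
The three quantities above can be illustrated with the standard library alone. This is a simplified single-task sketch, not the repository's rliable-based code: the IQM is computed as a 25% trimmed mean, the interval resamples seeds with replacement, and the probability of improvement is a point estimate that counts ties as half a win. The function names and example scores are made up for illustration.

```python
import random


def iqm(scores):
    """Interquartile mean: drop the lowest and highest 25%, average the rest."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)


def percentile(sorted_values, q):
    """Linearly interpolated percentile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_ci(scores, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the IQM over per-seed scores."""
    rng = random.Random(seed)
    stats = sorted(iqm(rng.choices(scores, k=len(scores))) for _ in range(reps))
    return percentile(stats, alpha / 2), percentile(stats, 1 - alpha / 2)


def prob_improvement(scores_a, scores_b):
    """Estimate Pr[score_A > score_B] over all seed pairs (ties count 0.5)."""
    wins = sum((a > b) + 0.5 * (a == b) for a in scores_a for b in scores_b)
    return wins / (len(scores_a) * len(scores_b))


# Made-up per-seed scores for two treatments.
scores_a = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 30.0]
scores_b = [3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print("IQM A:", iqm(scores_a))
print("95% CI A:", bootstrap_ci(scores_a))
print("Pr[A > B]:", prob_improvement(scores_a, scores_b))
```

Note that with very few seeds the trim removes nothing (for 3 seeds, `int(0.25 * 3)` is 0), so the IQM in this sketch coincides with the plain mean.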

Implemented in scripts/analyze.py (post-hoc figures) and pokemon_red_ai/analysis/comparison.py (reusable backend for the live Streamlit comparison and any notebook work).

Live monitoring

| Tool | Use case |
| --- | --- |
| Weights & Biases (auto-enabled in `train.py`) | Cloud telemetry; per-treatment run grouping; check from any device |
| `streamlit run scripts/monitor.py` | Single-run live dashboard: reward curves, event flags, maps, level / party / money |
| `streamlit run scripts/compare.py` | Multi-run comparison: IQM table, learning-curve overlays with 95% bands, milestone race |
| `pokemon_red_ai.training.alerts` | Desktop / Slack / email alerts on first badge, reward plateau, training crash |

Use as a library

The training pipeline is fully usable outside the bundled scripts:

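from pokemon_red_ai.environment import PokemonRedGymEnv
from sb3_contrib import RecurrentPPO

# Build the emulator-backed environment (requires a legal Pokémon Red ROM).
env = PokemonRedGymEnv(
    rom_path="PokemonRed.gb",
    observation_type="hybrid",   # pixel + symbolic streams
    reward_strategy="events",    # default event-flag reward
    max_episode_steps=15_000,
)
# MultiInputLstmPolicy is sb3-contrib's recurrent policy for dict observation spaces.
model = RecurrentPPO("MultiInputLstmPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)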

Custom reward strategies, observation types, and callback chains are documented in DEVELOPER_GUIDE.md.

Running the tests

./venv/bin/python3 -m pytest                  # full suite (~17s)
./venv/bin/python3 -m pytest tests/unit/      # unit only
./venv/bin/python3 -m pytest -k comparison    # specific module

Acknowledgments

Built on PyBoy (Game Boy emulation), Stable-Baselines3 and sb3-contrib (RL algorithms), Gymnasium (RL interface), and rliable (statistics). Memory addresses verified against the pret/pokered disassembly.

License & ROM

MIT, see LICENSE.

You must own a legal copy of the Pokémon Red ROM. This repository does not distribute, link to, or facilitate acquisition of any copyrighted game data.
