Welcome to the Pokemon Red AI developer guide! This document provides comprehensive information about the codebase architecture, design patterns, and development practices to help you contribute effectively to the project.
- Project Overview
- Architecture Overview
- Core Components
- Design Patterns
- Data Flow
- Testing Strategy
- Development Setup
- Contributing Guidelines
- API Reference
Pokemon Red AI is the research codebase backing a 3-paper cascade (EWRL 2026 → NeurIPS 2026 workshop → TMLR) on observation representations in long-horizon, sparse-reward reinforcement learning. It pairs a Gymnasium-compatible environment over PyBoy with a training pipeline (RecurrentPPO + W&B + alerts), a statistical analysis layer (rliable bootstrap CIs), and live monitoring dashboards (Streamlit).
The codebase is also fully usable as a generic Pokémon Red RL toolkit; most of the research apparatus is opt-in.
- Modularity: Each component has a single responsibility and well-defined interfaces
- Flexibility: Multiple reward strategies, observation types, and training algorithms
- Robustness: Comprehensive error handling and fallback mechanisms
- Testability: Extensive unit and integration tests (833 passing)
- Reproducibility: Pre-registered analysis plan, locked event-flag set, deterministic eval
- Extensibility: Easy to add new features without breaking existing functionality
The project follows a layered architecture pattern:
```mermaid
graph TB
    subgraph "User Interface Layer"
        CLI[CLI Commands]
        API[Python API]
    end

    subgraph "Training Layer"
        Trainer[PokemonTrainer]
        Callbacks[Training Callbacks]
        Models[Model Creation]
    end

    subgraph "Environment Layer"
        GymEnv[PokemonRedGymEnv]
        Rewards[Reward Calculators]
        Observations[Observation Processing]
    end

    subgraph "Game Interface Layer"
        Agent[PokemonRedAgent]
        Controls[Input Controls]
        Memory[Memory Reading]
    end

    subgraph "External Dependencies"
        PyBoy[PyBoy Emulator]
        SB3[Stable-Baselines3]
        ROM[Pokemon Red ROM]
    end

    CLI --> Trainer
    API --> Trainer
    API --> GymEnv
    Trainer --> GymEnv
    Trainer --> Callbacks
    Trainer --> Models
    GymEnv --> Agent
    GymEnv --> Rewards
    GymEnv --> Observations
    Agent --> Controls
    Agent --> Memory
    Agent --> PyBoy
    PyBoy --> ROM
    Trainer --> SB3
```
```text
pokemon_red_ai/                  # Python package: library + CLI
├── analysis/                    # Treatment comparison + plotting
│   ├── comparison.py            # rliable IQM, learning curves, plots
│   └── __init__.py
├── cli/                         # Click-based CLI (`pokemon-ai`)
│   ├── commands.py
│   └── __init__.py
├── environment/                 # RL environment components
│   ├── gym_env.py               # Gymnasium env wrapper
│   ├── rewards.py               # Reward calculation strategies
│   ├── observations.py          # 6 observation treatments
│   └── __init__.py
├── game/                        # PyBoy interface layer
│   ├── agent.py                 # PokemonRedAgent (emulator wrapper)
│   ├── controls.py              # Input + screen-state detection
│   ├── memory.py                # RAM address constants + readers
│   ├── event_flags.py           # 18 pre-registered event flags
│   └── __init__.py
├── training/                    # Training infrastructure
│   ├── trainer.py               # PokemonTrainer orchestration
│   ├── callbacks.py             # SB3 callbacks (W&B, monitoring)
│   ├── alerts.py                # Desktop / Slack / email alerts
│   ├── models.py                # Model + feature extractor factories
│   └── __init__.py
├── utils/                       # Configuration + file helpers
└── __init__.py

scripts/                         # Top-level entry points (not in package)
├── train.py                     # Primary training entry (used by run_pilots.sh)
├── eval.py                      # Deterministic evaluation harness
├── analyze.py                   # rliable analysis → publication figures
├── compare.py                   # Streamlit treatment comparison
├── monitor.py                   # Streamlit single-run dashboard
├── run_pilots.sh                # Launch the canonical 9-pilot grid
├── mirror_paper_to_overleaf.sh  # Mirror paper/ → Overleaf
└── create_save_states.py        # One-time save state generation

paper/                           # Research artifacts
├── analysis_plan.md             # Pre-registered hypotheses + protocol
├── compute_ledger.md            # Per-run compute log
├── main.tex + sections/         # EWRL paper LaTeX source
├── references.bib
├── Makefile
└── figures/                     # Output of scripts/analyze.py

bin/setup-overleaf-project.sh    # One-shot Overleaf provisioning
configs/                         # Sample alert + report configs
docs/research_playbook.md        # Operational runbook
tests/                           # 833 unit + integration tests
```
The lowest layer that directly interfaces with the PyBoy emulator.
The main game interface class that manages PyBoy emulation and game automation.
Key Responsibilities:
- PyBoy initialization and lifecycle management
- Game automation (intro sequence, opening setup)
- Button input handling with compatibility layers
- Screen and game state reading
- Episode management for RL training
Key Methods:
```python
from typing import Any, Dict

class PokemonRedAgent:
    def __init__(self, rom_path: str, show_window: bool = True): ...
    def press_button(self, button: str, hold_frames: int = 10): ...
    def get_comprehensive_state(self) -> Dict[str, Any]: ...
    def run_opening_sequence(self) -> bool: ...
    def reset_game(self) -> bool: ...
    def step(self, action: str) -> Dict[str, Any]: ...
```

Handles Game Boy input controls and screen type detection.
Key Features:
- Multi-version PyBoy compatibility (1.x and 2.x)
- Screen type detection using memory analysis and tilemap inspection
- Smart button pressing with fallback mechanisms
- Automated navigation helpers
Screen Detection Logic:
```mermaid
flowchart TD
    A[Check Memory State] --> B{map_id != 0?}
    B -->|Yes| C[IN_GAME]
    B -->|No| D{menu_state in 1,2,3?}
    D -->|Yes| E[MAIN_MENU]
    D -->|No| F[Analyze Tilemap]
    F --> G{Sprite Density}
    G -->|>100| H[INTRO_ANIMATION]
    G -->|20-100| I{Top Area Content?}
    I -->|Yes| J[TITLE_SCREEN]
    I -->|No| K[INTRO_ANIMATION]
    G -->|<20| L[UNKNOWN]
```
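The decision tree above can be restated as a plain function. This is a hypothetical sketch: `ScreenType`, `classify_screen`, the 2-D `tilemap` input, and the way "sprite density" (count of non-zero tile IDs) and "top area content" (any non-zero tile in the first few rows) are measured are assumptions; only the thresholds and the branch order come from the flowchart.

```python
from enum import Enum
from typing import List


class ScreenType(Enum):
    IN_GAME = "in_game"
    MAIN_MENU = "main_menu"
    TITLE_SCREEN = "title_screen"
    INTRO_ANIMATION = "intro_animation"
    UNKNOWN = "unknown"


def classify_screen(map_id: int, menu_state: int,
                    tilemap: List[List[int]], top_rows: int = 4) -> ScreenType:
    """Hypothetical restatement of the screen-detection flowchart."""
    # Memory checks come first: they are cheap and unambiguous
    if map_id != 0:
        return ScreenType.IN_GAME
    if menu_state in (1, 2, 3):
        return ScreenType.MAIN_MENU
    # Tilemap fallback; "density" = number of non-zero tiles (assumption)
    density = sum(1 for row in tilemap for tile in row if tile != 0)
    if density > 100:
        return ScreenType.INTRO_ANIMATION
    if density >= 20:
        # Assumed: the title logo puts content in the top rows
        top_has_content = any(tile != 0 for row in tilemap[:top_rows] for tile in row)
        return ScreenType.TITLE_SCREEN if top_has_content else ScreenType.INTRO_ANIMATION
    return ScreenType.UNKNOWN
```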
Defines Pokemon Red memory addresses and provides safe memory reading utilities.
Key Features:
- Well-documented memory address constants
- Safe memory reading with error handling
- 8-bit and 16-bit value reading
- Game state calculation utilities
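A minimal sketch of "safe" 8- and 16-bit reads, assuming memory is exposed as an indexable sequence (PyBoy 2.x's `pyboy.memory[address]` can be indexed this way; a plain list stands in here). The helper names, the fallback `default`, and the byte-order flag are illustrative assumptions. Big-endian is the default because Pokémon Red stores some multi-byte values, such as party Pokémon HP, high byte first; check each address's layout before relying on it.

```python
import logging
from typing import Sequence

logger = logging.getLogger(__name__)

_READ_ERRORS = (IndexError, KeyError, TypeError)


def read_u8(memory: Sequence[int], address: int, default: int = 0) -> int:
    """Read one byte; log and return `default` instead of raising."""
    try:
        return memory[address] & 0xFF
    except _READ_ERRORS as exc:
        logger.warning("Memory read failed at 0x%04X: %s", address, exc)
        return default


def read_u16(memory: Sequence[int], address: int,
             big_endian: bool = True, default: int = 0) -> int:
    """Read two consecutive bytes as an unsigned 16-bit value."""
    try:
        first = memory[address] & 0xFF
        second = memory[address + 1] & 0xFF
    except _READ_ERRORS as exc:
        logger.warning("16-bit read failed at 0x%04X: %s", address, exc)
        return default
    # Big-endian: first byte is the high byte; little-endian: the reverse
    return (first << 8) | second if big_endian else (second << 8) | first
```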
Provides the RL environment interface, following the Gymnasium API (the Farama Foundation's maintained successor to OpenAI Gym).
The main RL environment that wraps the game agent.
Key Features:
- Standard Gymnasium interface (`step`, `reset`, `render`)
- Multi-modal observations (screen, position, stats, exploration)
- Configurable reward strategies
- Episode management and termination logic
Environment Lifecycle:
```mermaid
sequenceDiagram
    participant RL as RL Algorithm
    participant Env as PokemonRedGymEnv
    participant Agent as PokemonRedAgent
    participant PyBoy as PyBoy Emulator

    RL->>Env: reset()
    Env->>Agent: reset_game()
    Agent->>PyBoy: Create new instance
    Agent->>Agent: run_opening_sequence()
    Agent-->>Env: Game ready
    Env->>Env: _get_observation()
    Env-->>RL: initial_observation, info

    loop Training Loop
        RL->>Env: step(action)
        Env->>Agent: step(action)
        Agent->>PyBoy: press_button(action)
        Agent->>Agent: update tracking
        Agent-->>Env: game_state
        Env->>Env: _calculate_reward()
        Env->>Env: _check_done()
        Env-->>RL: obs, reward, terminated, truncated, info
    end
```
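The same lifecycle can be reduced to plain Python with a stub agent so it runs without PyBoy. `StubAgent`, `MiniEnv`, the "+1 per new tile" reward, and the grid size are stand-ins invented for this sketch; the real `PokemonRedGymEnv` subclasses `gymnasium.Env` and delegates to `PokemonRedAgent`. What it does show faithfully is the Gymnasium return convention: `reset()` returns `(obs, info)` and `step()` returns a 5-tuple with separate `terminated` and `truncated` flags.

```python
from typing import Any, Dict, Tuple


class StubAgent:
    """Hypothetical stand-in for PokemonRedAgent: walks a 10x10 grid."""
    MOVES = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}

    def reset_game(self) -> bool:
        self.x, self.y = 5, 5
        return True

    def step(self, action: str) -> Dict[str, Any]:
        dx, dy = self.MOVES.get(action, (0, 0))
        self.x = min(max(self.x + dx, 0), 9)
        self.y = min(max(self.y + dy, 0), 9)
        return {"player_x": self.x, "player_y": self.y}


class MiniEnv:
    """Follows the reset/step sequence in the diagram; not a real gymnasium.Env."""
    ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

    def __init__(self, max_steps: int = 50):
        self.agent = StubAgent()
        self.max_steps = max_steps

    def reset(self) -> Tuple[Tuple[int, int], Dict[str, Any]]:
        self.agent.reset_game()
        self.steps = 0
        self.visited = {(self.agent.x, self.agent.y)}
        return self._get_observation(), {}  # (initial_observation, info)

    def step(self, action: int):
        state = self.agent.step(self.ACTIONS[action])  # agent returns game_state
        self.steps += 1
        reward = self._calculate_reward(state)
        terminated = len(self.visited) == 100      # task over: grid fully explored
        truncated = self.steps >= self.max_steps   # time limit reached
        info = {"visited": len(self.visited)}
        return self._get_observation(), reward, terminated, truncated, info

    def _get_observation(self) -> Tuple[int, int]:
        return (self.agent.x, self.agent.y)

    def _calculate_reward(self, state: Dict[str, Any]) -> float:
        pos = (state["player_x"], state["player_y"])
        if pos in self.visited:
            return 0.0
        self.visited.add(pos)
        return 1.0


# Usage: always press RIGHT until the episode ends
env = MiniEnv(max_steps=20)
obs, info = env.reset()
total_reward, terminated, truncated = 0.0, False, False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(3)
    total_reward += reward
```

Keeping `terminated` and `truncated` separate matters for value bootstrapping: a time-limit cut is not a true terminal state.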
Flexible reward calculation system with multiple strategies.
Reward Strategies:
- Standard: Balanced approach for general gameplay
- Exploration: Heavy emphasis on map discovery
- Progress: Focused on story advancement and badges
- Sparse: Rewards only major achievements (intended for advanced RL algorithms that cope with sparse rewards)
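To make the strategy idea concrete, here is a hypothetical exploration-style calculator that pays only for novelty. The class name, the weights (0.01 per new tile, 1.0 per new map, 5.0 per new badge) and the `badges` state key are assumptions for illustration, not the values in `rewards.py`; `map_id`, `player_x` and `player_y` follow the sample state used in the test fixtures.

```python
from typing import Any, Dict, Set, Tuple


class ExplorationRewardSketch:
    """Hypothetical exploration strategy: rewards novelty, not progress."""

    def __init__(self, tile_bonus: float = 0.01, map_bonus: float = 1.0,
                 badge_bonus: float = 5.0):
        self.tile_bonus = tile_bonus
        self.map_bonus = map_bonus
        self.badge_bonus = badge_bonus
        self.reset()

    def reset(self) -> None:
        # Called at episode start so novelty is measured per episode
        self.seen_tiles: Set[Tuple[int, int, int]] = set()
        self.seen_maps: Set[int] = set()
        self.best_badges = 0

    def calculate_reward(self, state: Dict[str, Any]) -> float:
        reward = 0.0
        tile = (state["map_id"], state["player_x"], state["player_y"])
        if tile not in self.seen_tiles:
            self.seen_tiles.add(tile)
            reward += self.tile_bonus
        if state["map_id"] not in self.seen_maps:
            self.seen_maps.add(state["map_id"])
            reward += self.map_bonus
        badges = state.get("badges", 0)  # hypothetical key
        if badges > self.best_badges:
            reward += self.badge_bonus * (badges - self.best_badges)
            self.best_badges = badges
        return reward
```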
Reward Calculator Pattern:
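```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class BaseRewardCalculator(ABC):
    @abstractmethod
    def calculate_reward(self, current_state: Dict[str, Any]) -> float:
        """Return the reward for the current game state."""

    def reset(self) -> None:
        """Clear per-episode tracking; a no-op by default."""
```

Handles game state conversion to RL-compatible observations.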
Observation Types:
- Multi-modal: Screen + position + stats + exploration data
- Screen-only: Only visual information
- Minimal: Compact feature vector for fast training
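For the Minimal observation type, one plausible construction is a short vector of RAM-derived values scaled into [0, 1]. The function name, the feature choice and the normalizers (255 for one-byte coordinates and map IDs, 100 for level, 8 for badges) are assumptions; the `episode_steps` and `max_steps` arguments mirror the signature used for observation processors in `observations.py`.

```python
from typing import Any, Dict, List


def minimal_observation(state: Dict[str, Any], episode_steps: int,
                        max_steps: int) -> List[float]:
    """Hypothetical compact observation with every feature in [0, 1]."""
    features = [
        state["player_x"] / 255.0,              # coordinates fit in one byte
        state["player_y"] / 255.0,
        state["map_id"] / 255.0,
        state.get("player_level", 0) / 100.0,   # levels cap at 100
        state.get("badges", 0) / 8.0,           # hypothetical key; 8 gym badges
        episode_steps / max_steps,              # lets the policy sense the time limit
    ]
    # Clamp in case a value falls outside the assumed range
    return [min(max(f, 0.0), 1.0) for f in features]
```

A fixed scale per feature keeps the input distribution stable across episodes, which tends to help small policy networks train quickly.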
High-level training orchestration and monitoring.
Main training orchestration class that coordinates all components.
Training Flow:
```mermaid
flowchart TD
    A[Initialize Trainer] --> B[Create Environment]
    B --> C[Create RL Model]
    C --> D[Setup Callbacks]
    D --> E[Start Training Loop]
    E --> F{Training Complete?}
    F -->|No| G[Model.learn]
    G --> H[Save Checkpoints]
    H --> E
    F -->|Yes| I[Save Final Model]
    I --> J[Cleanup Resources]
```
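The loop in the flowchart (learn in chunks, checkpoint after each chunk) can be sketched without Stable-Baselines3 by using a stub model. `StubModel`, `train_in_chunks`, the chunk size and the checkpoint names are assumptions. The stub imitates two real SB3 behaviors: `save(path)`, and `learn(total_timesteps=..., reset_num_timesteps=False)`, which treats `total_timesteps` as additional steps and keeps the step counter continuous. SB3's built-in `CheckpointCallback` is a common alternative to chunking.

```python
from pathlib import Path
from typing import List


class StubModel:
    """Hypothetical stand-in for the parts of an SB3 model used below."""

    def __init__(self):
        self.num_timesteps = 0
        self.saved: List[str] = []

    def learn(self, total_timesteps: int, reset_num_timesteps: bool = True):
        if reset_num_timesteps:
            self.num_timesteps = total_timesteps
        else:
            self.num_timesteps += total_timesteps
        return self

    def save(self, path: str) -> None:
        self.saved.append(path)


def train_in_chunks(model, total_timesteps: int, chunk: int, save_dir: str) -> None:
    save_path = Path(save_dir)
    while model.num_timesteps < total_timesteps:          # "Training Complete?"
        steps = min(chunk, total_timesteps - model.num_timesteps)
        model.learn(total_timesteps=steps, reset_num_timesteps=False)
        model.save(str(save_path / f"checkpoint_{model.num_timesteps}"))
    model.save(str(save_path / "final_model"))            # "Save Final Model"
```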
Training monitoring and control callbacks.
Callback Types:
- TrainingCallback: Basic progress tracking and model saving
- EnhancedTrainingCallback: Live plotting and detailed metrics
- EarlyStopping: Automatic training termination
- PerformanceMonitor: System resource monitoring
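`EarlyStopping` is described only as automatic termination. A common rule is patience-based: stop once the best mean episode reward has not improved by at least `min_delta` for `patience` consecutive checks. The class below is a framework-free sketch of that rule with hypothetical names; inside an SB3 callback, `_on_step` would return `not should_stop`, since returning `False` there ends training.

```python
import math


class PatienceEarlyStopping:
    """Hypothetical patience rule over periodic mean-reward checks."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = -math.inf
        self.checks_without_improvement = 0

    def update(self, mean_reward: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if mean_reward > self.best + self.min_delta:
            self.best = mean_reward
            self.checks_without_improvement = 0
        else:
            self.checks_without_improvement += 1
        return self.checks_without_improvement >= self.patience
```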
RL model creation and configuration utilities.
Features:
- Algorithm-specific hyperparameter defaults
- Custom feature extractors for multi-modal observations
- PyTorch neural network architectures
- Hyperparameter optimization support
Rich command-line interface using Click and Rich libraries.
Commands:
- `train`: Train a new model
- `test`: Evaluate a trained model
- `info`: Project information and analysis
- `config`: Configuration management
- `init`: Initialize new projects
- `doctor`: System health checks
The components below are layered on top of the core RL toolkit and power the paper-grade workflow. They're opt-in; none are required to train an agent.
analysis/comparison.py is the shared backend for treatment-level
statistical comparisons. Pure functions, no UI dependency:
```python
from pokemon_red_ai.analysis.comparison import (
    detect_treatment, group_runs_by_treatment,
    learning_curves_with_bands, treatment_summary_table,
    milestone_first_episode, final_performance,
    plot_learning_curves, export_figure,
)
```

- `treatment_summary_table(runs, n_boot=2000)`: IQM with a 95% percentile-bootstrap CI per treatment. Pure-NumPy implementation, mathematically equivalent to rliable's stratified bootstrap but without that runtime dependency.
- `learning_curves_with_bands(runs)`: per-treatment mean ± std curves, aligned to the shortest seed, with optional smoothing.
- `milestone_first_episode(runs)`: for each pre-registered event flag, the median (or min / mean) episode at which each treatment first triggered it.
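The statistic behind `treatment_summary_table` can be sketched in a few lines. This version uses only the standard library rather than NumPy, trims `int(0.25 * n)` scores from each end (the convention of `scipy.stats.trim_mean(x, 0.25)`, which rliable uses for IQM), and reports a 95% percentile-bootstrap interval. The function names, the seeding, and the simple index-based percentile are assumptions, not the project's implementation.

```python
import random
import statistics
from typing import Sequence, Tuple


def iqm(scores: Sequence[float]) -> float:
    """Interquartile mean: drop int(0.25 * n) scores from each end, average the rest."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def bootstrap_iqm_ci(scores: Sequence[float], n_boot: int = 2000,
                     confidence: float = 0.95, seed: int = 0) -> Tuple[float, float, float]:
    """Return (IQM, lower, upper) using a percentile bootstrap over runs."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample runs with replacement and recompute the IQM each time
    boots = sorted(iqm(rng.choices(scores, k=n)) for _ in range(n_boot))
    alpha = (1.0 - confidence) / 2.0
    lower = boots[int(alpha * (n_boot - 1))]
    upper = boots[int((1.0 - alpha) * (n_boot - 1))]
    return iqm(scores), lower, upper
```

With only three seeds per treatment, `int(0.25 * 3) == 0`, so the IQM equals the plain mean; trimming only starts to matter once scores are pooled across more runs.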
scripts/analyze.py produces final paper figures using the same data
flow. See paper/analysis_plan.md §6 for
the locked statistical methodology.
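`learning_curves_with_bands` aligns seeds to the shortest run before averaging. A stdlib-only sketch of that alignment step follows; the function name, the input format (one list of per-checkpoint returns per seed), the population standard deviation, and the trailing moving-average window are assumptions.

```python
import statistics
from typing import List, Sequence, Tuple


def mean_std_band(curves: Sequence[Sequence[float]],
                  smooth: int = 1) -> Tuple[List[float], List[float]]:
    """Per-step mean and std across seeds, truncated to the shortest seed."""
    length = min(len(c) for c in curves)
    aligned = [list(c[:length]) for c in curves]
    if smooth > 1:
        # Trailing moving average over at most `smooth` points
        aligned = [
            [statistics.fmean(c[max(0, i - smooth + 1): i + 1]) for i in range(length)]
            for c in aligned
        ]
    columns = list(zip(*aligned))  # one tuple of seed values per step
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds
```

Truncating rather than padding avoids bands that silently narrow at the end, where fewer seeds would contribute.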
| Callback | Role |
|---|---|
| `TrainingCallback` | Baseline checkpoint + log writer |
| `WandbCallback` | Streams scalars and tables to Weights & Biases |
| `MonitoringCallback` | Extends `WandbCallback` with map heatmaps, event-flag tracking, screen captures, and a `dashboard_state.json` snapshot for the local Streamlit dashboard |
The MONITORED_INFO_KEYS tuple is the contract between the env's
info dict and the SB3 Monitor wrapper's info_keywords argument.
Adding a metric requires updating that tuple in lockstep with the env.
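One way to enforce the lockstep requirement is a unit test that compares the tuple against an `info` dict produced by the env. The tuple contents and the `missing_info_keys` helper below are hypothetical. SB3's `Monitor` wrapper does accept an `info_keywords` tuple and copies those keys from the final step's `info` into each episode record, so a key the env never emits surfaces as a `KeyError` at episode end.

```python
from typing import Any, Iterable, Mapping, Set

# Hypothetical contents; the real tuple lives alongside the callbacks
MONITORED_INFO_KEYS = ("badges", "maps_visited", "event_flags_set")


def missing_info_keys(info: Mapping[str, Any],
                      keys: Iterable[str] = MONITORED_INFO_KEYS) -> Set[str]:
    """Return monitored keys that the env's info dict does not provide."""
    return set(keys) - set(info)


# How the tuple is consumed (requires stable-baselines3):
# from stable_baselines3.common.monitor import Monitor
# env = Monitor(env, info_keywords=MONITORED_INFO_KEYS)
```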
training/alerts.py provides TrainingAlertCallback plus four pluggable channels (LogChannel,
DesktopChannel, SlackChannel, EmailChannel). Triggers cover
first badge / new max badge / new map / new event flag / reward
plateau / checkpoint / crash. Per-key cooldown prevents spam. YAML
config loader at configs/alerts.example.yaml.
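The per-key cooldown can be sketched as a small gate the callback consults before dispatching to its channels. `CooldownGate`, the 300-second default, and the injectable clock (used here to make the logic testable) are assumptions for illustration.

```python
import time
from typing import Callable, Dict


class CooldownGate:
    """Let each alert key through at most once per `cooldown_s` seconds."""

    def __init__(self, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.last_sent: Dict[str, float] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # same trigger fired too recently; suppress it
        self.last_sent[key] = now
        return True
```

Keying by trigger (for example "reward_plateau" vs "new_map") means a noisy trigger cannot starve a rare, important one.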
```python
from pokemon_red_ai.training.alerts import (
    TrainingAlertCallback, DesktopChannel, SlackChannel,
    load_alert_config, channels_from_config,
)

cfg = load_alert_config("configs/alerts.yaml")
callback = TrainingAlertCallback(channels=channels_from_config(cfg))
```

- `scripts/monitor.py`: single-run dashboard. Reads `dashboard_state.json` plus the SB3 `monitor.csv`. Auto-refreshes.
- `scripts/compare.py`: multi-run treatment comparison. Auto-groups runs by treatment from run names and renders learning curves, IQM tables, and a milestone race, with PDF/SVG/PNG export buttons for figures.
Both are pure-Streamlit; the data helpers underneath are unit-tested
in tests/unit/test_monitor_script.py and
tests/unit/analysis/test_comparison.py.
scripts/run_pilots.sh wraps scripts/train.py for the canonical 9-pilot grid (3 treatments
× 3 seeds × 10M steps). Handles consistent save-dir / W&B-run-name
paths, caffeinate on macOS, batched parallelism (--parallel N),
skip-completed detection, and a final summary. See --help for all
options.
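The grid and batching logic can be mirrored in Python for planning or dry runs. The function name, treatment names, run-name format and the "completed" marker file are hypothetical; only the 3 × 3 shape, skip-completed detection and batched parallelism come from the description above.

```python
import itertools
from pathlib import Path
from typing import List, Sequence


def plan_pilot_batches(treatments: Sequence[str], seeds: Sequence[int],
                       save_root: str, parallel: int) -> List[List[str]]:
    """Return batches of run names to launch, skipping finished runs."""
    pending = []
    for treatment, seed in itertools.product(treatments, seeds):
        run_name = f"{treatment}_seed{seed}"  # hypothetical naming scheme
        # Hypothetical completion marker: a final model in the run directory
        if (Path(save_root) / run_name / "final_model.zip").exists():
            continue
        pending.append(run_name)
    # Launch at most `parallel` runs at a time
    return [pending[i:i + parallel] for i in range(0, len(pending), parallel)]
```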
The paper LaTeX in paper/ is canonical. bin/setup-overleaf-project.sh
provisions the Overleaf side from a project ID + token in one
command; scripts/mirror_paper_to_overleaf.sh keeps git → Overleaf
in sync. Both depend on the
overleaf-mcp-server.
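The Strategy pattern is used extensively for reward calculation and observation processing:

```python
from pokemon_red_ai.environment.gym_env import PokemonRedGymEnv
from pokemon_red_ai.environment.rewards import create_reward_calculator

# Reward strategies
reward_calculator = create_reward_calculator("exploration")

# Observation strategies
env = PokemonRedGymEnv("pokemon_red.gb", observation_type="multi_modal")
```

For creating models and components (Factory pattern):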
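```python
import gymnasium as gym

def create_model(algorithm: str, env: gym.Env, **kwargs):
    # Map algorithm names to factory functions defined in training/models.py
    creators = {
        'PPO': create_ppo_model,
        'A2C': create_a2c_model,
        'DQN': create_dqn_model,
    }
    if algorithm not in creators:
        raise ValueError(f"Unknown algorithm: {algorithm!r}")
    return creators[algorithm](env, **kwargs)
```

Training callbacks observe training progress (Observer pattern):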
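```python
from stable_baselines3.common.callbacks import BaseCallback

class TrainingCallback(BaseCallback):
    def _on_step(self) -> bool:
        # Required by BaseCallback; returning False would stop training
        return True

    def _on_rollout_end(self) -> None:
        # React to training events; ep_info_buffer holds recent episode stats
        if self.model.ep_info_buffer:
            self.update_statistics()  # defined elsewhere on this class
```

Base classes define the workflow and subclasses implement the specifics (Template Method pattern):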
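```python
from abc import ABC, abstractmethod

class BaseRewardCalculator(ABC):
    def calculate_reward(self, state: dict) -> float:
        # Template method: the workflow is fixed, the steps come from subclasses
        self.reward_components.clear()  # per-step breakdown, assumed set up in __init__
        reward = self._calculate_base_reward(state)
        reward += self._calculate_bonus_rewards(state)
        return reward

    @abstractmethod
    def _calculate_base_reward(self, state: dict) -> float: ...
    @abstractmethod
    def _calculate_bonus_rewards(self, state: dict) -> float: ...
```

The PyBoy compatibility layer adapts different API versions (Adapter pattern):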
```python
def press_button_basic(pyboy, button: str) -> None:
    name = button.lower()  # e.g. "a", "start", "up"
    try:
        # PyBoy 2.x API takes the button name directly
        pyboy.button_press(name)
    except AttributeError:
        # PyBoy 1.x fallback: translate the name into a WindowEvent constant
        from pyboy.utils import WindowEvent
        prefix = "PRESS_ARROW_" if name in ("up", "down", "left", "right") else "PRESS_BUTTON_"
        pyboy.send_input(getattr(WindowEvent, prefix + name.upper()))
```

```mermaid
flowchart LR
    subgraph "Game World"
        ROM[Pokemon Red ROM]
        PyBoy[PyBoy Emulator]
    end

    subgraph "Game Interface"
        Agent[PokemonRedAgent]
        Memory[Memory Reader]
        Controls[Input Controls]
    end

    subgraph "RL Environment"
        Env[PokemonRedGymEnv]
        Rewards[Reward Calculator]
        Obs[Observation Processor]
    end

    subgraph "Training"
        Model[RL Model]
        Trainer[PokemonTrainer]
        Callbacks[Callbacks]
    end

    ROM --> PyBoy
    PyBoy --> Agent
    Agent --> Memory
    Agent --> Controls
    Agent --> Env
    Env --> Rewards
    Env --> Obs
    Env --> Model
    Model --> Trainer
    Trainer --> Callbacks

    Model --> |Action| Env
    Env --> |State| Model
    Rewards --> |Reward| Model
    Obs --> |Observation| Model
```
- Raw Game State: PyBoy provides screen pixels and memory access
- Parsed State: Agent extracts position, stats, game flags
- RL Observation: Environment converts to ML-friendly format
- Reward Signal: Reward calculator provides learning signal
- Action: RL model outputs button press
- Game Response: PyBoy executes action and updates state
```text
tests/
├── conftest.py                  # Global fixtures and configuration
├── unit/                        # Unit tests for individual components
│   ├── agent/                   # Game interface tests
│   │   ├── conftest.py          # Agent-specific fixtures
│   │   ├── test_agent.py        # PokemonRedAgent tests
│   │   ├── test_controls.py     # Controls and input tests
│   │   └── test_memory.py       # Memory reading tests
│   ├── environment/             # Environment layer tests
│   ├── training/                # Training components tests
│   └── utils/                   # Utility function tests
└── integration/                 # Integration and end-to-end tests
```
Heavy use of mocks for external dependencies:
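```python
import pytest
from unittest.mock import Mock

@pytest.fixture
def mock_pyboy():
    # Stand-in for the PyBoy emulator: no ROM or window needed
    mock = Mock()
    mock.memory = Mock()
    mock.screen = Mock()
    mock.button_press = Mock()
    return mock
```

Reusable game state fixtures: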
```python
@pytest.fixture
def sample_memory_state():
    return {
        'player_x': 5,
        'player_y': 7,
        'map_id': 1,
        'player_level': 15,
        # ... more state
    }
```

Testing multiple scenarios efficiently:
```python
@pytest.mark.parametrize("action", ['A', 'B', 'UP', 'DOWN'])
def test_step_with_all_actions(mock_agent, action):
    state = mock_agent.step(action)
    assert isinstance(state, dict)
```

Benchmark critical paths:
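```python
def test_agent_step_performance(benchmark_runner, agent):
    # `benchmark_runner` and `agent` are fixtures assumed to come from conftest.py
    def step_op():
        agent.step('RIGHT')

    result = benchmark_runner.run('step', step_op, iterations=100)
    assert result['mean'] < 0.05  # Under 50 ms per step
```

- Python 3.8+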
- Pokemon Red ROM file (.gb format)
- Git for version control
```bash
# Clone repository
git clone https://github.com/yourusername/pokemon-red-ai.git
cd pokemon-red-ai

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .

# Install development dependencies
pip install pytest pytest-cov black isort mypy

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=pokemon_red_ai --cov-report=html

# Check code style
black pokemon_red_ai/
isort pokemon_red_ai/
mypy pokemon_red_ai/
```

- Create Feature Branch: `git checkout -b feature/my-feature`
- Write Tests: Add tests for new functionality
- Implement Feature: Follow existing patterns and conventions
- Run Tests: Ensure all tests pass
- Check Style: Run linting and formatting tools
- Create PR: Submit for review
- Python Style: Follow PEP 8 with Black formatting
- Import Order: Use isort for consistent import organization
- Type Hints: Add type hints for all public functions
- Docstrings: Use Google-style docstrings
```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


class ExampleClass:
    """
    Example class following project conventions.

    Args:
        param1: Description of parameter
        param2: Optional parameter with default
    """

    def __init__(self, param1: str, param2: Optional[int] = None):
        self.param1 = param1
        # Compare with None so an explicit 0 is not replaced by the default
        self.param2 = param2 if param2 is not None else 42

    def process_data(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Process input data and return results.

        Args:
            data: Input data dictionary

        Returns:
            Processed data dictionary

        Raises:
            ValueError: If data is invalid
        """
        if not data:
            raise ValueError("Data cannot be empty")

        try:
            result = self._internal_processing(data)
            logger.info(f"Processed {len(data)} items")
            return result
        except Exception as e:
            logger.error(f"Processing failed: {e}")
            raise

    def _internal_processing(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Internal processing method (private)."""
        return {k: v for k, v in data.items() if v is not None}
```

```python
# 1. Create new reward calculator in pokemon_red_ai/environment/rewards.py
from typing import Any, Dict

class CustomRewardCalculator(BaseRewardCalculator):
    def calculate_reward(self, current_state: Dict[str, Any]) -> float:
        reward = 0.0
        # Implement custom logic here, accumulating into `reward`
        return reward

# 2. Register in factory function
def create_reward_calculator(strategy: str, config=None):
    calculators = {
        'standard': StandardRewardCalculator,
        'custom': CustomRewardCalculator,  # Add here
    }
    # ...

# 3. Add tests in tests/unit/environment/test_rewards.py
class TestCustomRewardCalculator:
    def test_custom_reward_calculation(self):
        # Test implementation
        pass
```

```python
# 1. Add processing function in pokemon_red_ai/environment/observations.py
import numpy as np

def process_custom_observation(agent, episode_steps, max_steps) -> np.ndarray:
    # Custom observation logic; replace the placeholder with real features
    observation = np.zeros(1, dtype=np.float32)
    return observation

# 2. Update environment in gym_env.py
def _get_observation(self):
    if self.observation_type == "custom":
        return process_custom_observation(...)
    # ...

# 3. Add tests for new observation type
```

```python
# 1. Add model creation function in pokemon_red_ai/training/models.py
def create_custom_model(env, **kwargs):
    # Model creation logic
    return model

# 2. Register in factory
def create_model(algorithm: str, env, **kwargs):
    creators = {
        'PPO': create_ppo_model,
        'CUSTOM': create_custom_model,  # Add here
    }
    # ...
```

- Use Appropriate Exceptions: Choose specific exception types
- Log Errors: Always log errors with context
- Graceful Degradation: Provide fallbacks when possible
- User-Friendly Messages: Clear error messages for CLI users
```python
def example_function(param: str) -> str:
    # risky_operation, SpecificException and fallback_operation are placeholders
    try:
        result = risky_operation(param)
        return result
    except SpecificException as e:
        logger.error(f"Specific error in example_function: {e}")
        # Try fallback
        return fallback_operation(param)
    except Exception as e:
        logger.error(f"Unexpected error in example_function: {e}")
        raise RuntimeError(f"Failed to process {param}: {e}") from e
```

```python
from pokemon_red_ai.game.agent import PokemonRedAgent

agent = PokemonRedAgent("pokemon_red.gb", show_window=False)
agent.run_opening_sequence()
state = agent.step("RIGHT")
agent.reset_game()
```

```python
from pokemon_red_ai.environment.gym_env import PokemonRedGymEnv

env = PokemonRedGymEnv("pokemon_red.gb", reward_strategy="exploration")
obs, info = env.reset()
action = env.action_space.sample()  # random action, for illustration
obs, reward, terminated, truncated, info = env.step(action)
```

```python
from pokemon_red_ai import PokemonTrainer

trainer = PokemonTrainer("pokemon_red.gb", save_dir="./training/")
trainer.train(total_timesteps=100000, show_plots=True)
results = trainer.test("./training/models/best_model.zip")
```

```python
from pokemon_red_ai.utils import load_config, TrainingConfig

# Load from YAML
config = load_config("config.yaml")

# Create programmatically
config = TrainingConfig(
    total_timesteps=100000,
    algorithm="PPO",
    reward_strategy="exploration"
)
```

```python
# The CLI commands use the same underlying components
# You can replicate CLI functionality programmatically:
from pokemon_red_ai import PokemonTrainer
from pokemon_red_ai.utils import create_directories, cleanup_rom_save_files

# Equivalent to: pokemon-ai train --rom game.gb --timesteps 50000
cleanup_rom_save_files("game.gb")
create_directories("./training/")
trainer = PokemonTrainer("game.gb", "./training/")
trainer.train(total_timesteps=50000)
```

- PyBoy Compatibility: Check PyBoy version and use compatibility layers
- Memory Reading Errors: Always handle memory access failures gracefully
- Screen Detection: Use debug logging to understand screen state transitions
- Training Instability: Check reward signal and observation preprocessing

```python
import logging

from pokemon_red_ai import PokemonTrainer
from pokemon_red_ai.environment.gym_env import PokemonRedGymEnv

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('pokemon_red_ai')
logger.setLevel(logging.DEBUG)

# Use with debug info
env = PokemonRedGymEnv("game.gb", headless=False)  # Show game window
trainer = PokemonTrainer("game.gb")
trainer.train(total_timesteps=1000, show_plots=True)  # Short test run
```

```python
# Use enhanced callbacks for detailed monitoring
from pokemon_red_ai.training import EnhancedTrainingCallback

callback = EnhancedTrainingCallback(
    show_plots=True,
    save_freq=1000,
    verbose=2  # Maximum verbosity
)

trainer.train(
    total_timesteps=50000,
    callback=callback
)
```

This developer guide should give you a solid foundation for contributing to Pokemon Red AI. The modular architecture makes it easy to add new features while maintaining backward compatibility. Remember to:
- Follow the established patterns and conventions
- Write comprehensive tests for new functionality
- Document your code thoroughly
- Consider backward compatibility when making changes
- Use the existing error handling and logging patterns
For questions or discussions about the architecture, open an issue or start a discussion on GitHub.