Pokemon Red AI - Developer Guide

Welcome to the Pokemon Red AI developer guide! This document provides comprehensive information about the codebase architecture, design patterns, and development practices to help you contribute effectively to the project.

Project Overview
Architecture Overview
Core Components
Design Patterns
Data Flow
Testing Strategy
Development Setup
Contributing Guidelines
API Reference

Project Overview

Pokemon Red AI is the research codebase backing a 3-paper cascade (EWRL 2026 → NeurIPS 2026 workshop → TMLR) on observation representations in long-horizon, sparse-reward reinforcement learning. It pairs a Gymnasium-compatible environment over PyBoy with a training pipeline (RecurrentPPO + W&B + alerts), a statistical analysis layer (rliable bootstrap CIs), and live monitoring dashboards (Streamlit).

The codebase is also fully usable as a generic Pokémon Red RL toolkit; most of the research apparatus is opt-in.

Key Design Principles

Modularity: Each component has a single responsibility and well-defined interfaces
Flexibility: Multiple reward strategies, observation types, and training algorithms
Robustness: Comprehensive error handling and fallback mechanisms
Testability: Extensive unit and integration tests (833 passing)
Reproducibility: Pre-registered analysis plan, locked event-flag set, deterministic eval
Extensibility: Easy to add new features without breaking existing functionality

Architecture Overview

The project follows a layered architecture pattern:

graph TB
    subgraph "User Interface Layer"
        CLI[CLI Commands]
        API[Python API]
    end
    
    subgraph "Training Layer"
        Trainer[PokemonTrainer]
        Callbacks[Training Callbacks]
        Models[Model Creation]
    end
    
    subgraph "Environment Layer"
        GymEnv[PokemonRedGymEnv]
        Rewards[Reward Calculators]
        Observations[Observation Processing]
    end
    
    subgraph "Game Interface Layer"
        Agent[PokemonRedAgent]
        Controls[Input Controls]
        Memory[Memory Reading]
    end
    
    subgraph "External Dependencies"
        PyBoy[PyBoy Emulator]
        SB3[Stable-Baselines3]
        ROM[Pokemon Red ROM]
    end
    
    CLI --> Trainer
    API --> Trainer
    API --> GymEnv
    
    Trainer --> GymEnv
    Trainer --> Callbacks
    Trainer --> Models
    
    GymEnv --> Agent
    GymEnv --> Rewards
    GymEnv --> Observations
    
    Agent --> Controls
    Agent --> Memory
    
    Agent --> PyBoy
    PyBoy --> ROM
    Trainer --> SB3

Directory Structure

pokemon_red_ai/                     # Python package: library + CLI
├── analysis/                       # Treatment comparison + plotting
│   ├── comparison.py               # rliable IQM, learning curves, plots
│   └── __init__.py
├── cli/                            # Click-based CLI (`pokemon-ai`)
│   ├── commands.py
│   └── __init__.py
├── environment/                    # RL environment components
│   ├── gym_env.py                  # Gymnasium env wrapper
│   ├── rewards.py                  # Reward calculation strategies
│   ├── observations.py             # 6 observation treatments
│   └── __init__.py
├── game/                           # PyBoy interface layer
│   ├── agent.py                    # PokemonRedAgent (emulator wrapper)
│   ├── controls.py                 # Input + screen-state detection
│   ├── memory.py                   # RAM address constants + readers
│   ├── event_flags.py              # 18 pre-registered event flags
│   └── __init__.py
├── training/                       # Training infrastructure
│   ├── trainer.py                  # PokemonTrainer orchestration
│   ├── callbacks.py                # SB3 callbacks (W&B, monitoring)
│   ├── alerts.py                   # Desktop / Slack / email alerts
│   ├── models.py                   # Model + feature extractor factories
│   └── __init__.py
├── utils/                          # Configuration + file helpers
└── __init__.py

scripts/                            # Top-level entry points (not in package)
├── train.py                        # Primary training entry (used by run_pilots.sh)
├── eval.py                         # Deterministic evaluation harness
├── analyze.py                      # rliable analysis → publication figures
├── compare.py                      # Streamlit treatment comparison
├── monitor.py                      # Streamlit single-run dashboard
├── run_pilots.sh                   # Launch the canonical 9-pilot grid
├── mirror_paper_to_overleaf.sh     # Mirror paper/ → Overleaf
└── create_save_states.py           # One-time save state generation

paper/                              # Research artifacts
├── analysis_plan.md                # Pre-registered hypotheses + protocol
├── compute_ledger.md               # Per-run compute log
├── main.tex + sections/            # EWRL paper LaTeX source
├── references.bib
├── Makefile
└── figures/                        # Output of scripts/analyze.py

bin/setup-overleaf-project.sh       # One-shot Overleaf provisioning
configs/                            # Sample alert + report configs
docs/research_playbook.md           # Operational runbook
tests/                              # 833 unit + integration tests

Core Components

1. Game Interface Layer (`pokemon_red_ai.game`)

The lowest layer that directly interfaces with the PyBoy emulator.

PokemonRedAgent (`agent.py`)

The main game interface class that manages PyBoy emulation and game automation.

Key Responsibilities:

PyBoy initialization and lifecycle management
Game automation (intro sequence, opening setup)
Button input handling with compatibility layers
Screen and game state reading
Episode management for RL training

Key Methods:

from typing import Any, Dict

class PokemonRedAgent:
    # Interface summary only; method bodies are elided
    def __init__(self, rom_path: str, show_window: bool = True): ...
    def press_button(self, button: str, hold_frames: int = 10): ...
    def get_comprehensive_state(self) -> Dict[str, Any]: ...
    def run_opening_sequence(self) -> bool: ...
    def reset_game(self) -> bool: ...
    def step(self, action: str) -> Dict[str, Any]: ...

Controls (`controls.py`)

Handles Game Boy input controls and screen type detection.

Key Features:

Multi-version PyBoy compatibility (1.x and 2.x)
Screen type detection using memory analysis and tilemap inspection
Smart button pressing with fallback mechanisms
Automated navigation helpers

Screen Detection Logic:

flowchart TD
    A[Check Memory State] --> B{map_id != 0?}
    B -->|Yes| C[IN_GAME]
    B -->|No| D{menu_state in 1,2,3?}
    D -->|Yes| E[MAIN_MENU]
    D -->|No| F[Analyze Tilemap]
    F --> G{Sprite Density}
    G -->|>100| H[INTRO_ANIMATION]
    G -->|20-100| I{Top Area Content?}
    I -->|Yes| J[TITLE_SCREEN]
    I -->|No| K[INTRO_ANIMATION]
    G -->|<20| L[UNKNOWN]
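The decision tree above can be written as a small pure function, which makes the thresholds easy to unit-test. This is a sketch under assumptions: the `ScreenType` enum, the function name, and the idea that sprite density and the "top area has content" check are computed beforehand are all hypothetical. The real logic in controls.py may differ, for example in how it treats the 20 and 100 boundaries.

```python
from enum import Enum


class ScreenType(Enum):
    # Hypothetical enum mirroring the flowchart's leaf nodes
    IN_GAME = "in_game"
    MAIN_MENU = "main_menu"
    TITLE_SCREEN = "title_screen"
    INTRO_ANIMATION = "intro_animation"
    UNKNOWN = "unknown"


def detect_screen_type(map_id: int, menu_state: int,
                       sprite_density: int, top_area_has_content: bool) -> ScreenType:
    """Classify the current screen following the flowchart above."""
    # 1. Memory checks are cheap, so they run first
    if map_id != 0:
        return ScreenType.IN_GAME
    if menu_state in (1, 2, 3):
        return ScreenType.MAIN_MENU
    # 2. Otherwise fall back to tilemap heuristics
    if sprite_density > 100:
        return ScreenType.INTRO_ANIMATION
    if sprite_density >= 20:
        # 20..100 inclusive: title screen only if the top area is populated
        return ScreenType.TITLE_SCREEN if top_area_has_content else ScreenType.INTRO_ANIMATION
    return ScreenType.UNKNOWN
```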

Memory (`memory.py`)

Defines Pokemon Red memory addresses and provides safe memory reading utilities.

Key Features:

Well-documented memory address constants
Safe memory reading with error handling
8-bit and 16-bit value reading
Game state calculation utilities
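A minimal sketch of safe 8-bit and 16-bit reads follows. It assumes only that the memory object supports integer indexing (PyBoy 2.x exposes `pyboy.memory[addr]`, and a `bytearray` works in tests). The helper names and the `big_endian` default are illustrative assumptions: byte order depends on the field, so check each multi-byte value against the constants in memory.py. The three addresses are the commonly cited pokered locations for the current map and the player's coordinates.

```python
from typing import Any

# Commonly cited pokered WRAM addresses (verify against memory.py)
MAP_ID_ADDR = 0xD35E
PLAYER_Y_ADDR = 0xD361
PLAYER_X_ADDR = 0xD362


def read_u8(memory: Any, addr: int, default: int = 0) -> int:
    """Read one byte, returning `default` instead of raising on failure."""
    try:
        return int(memory[addr]) & 0xFF
    except (IndexError, KeyError, TypeError, ValueError):
        return default


def read_u16(memory: Any, addr: int, big_endian: bool = True, default: int = 0) -> int:
    """Read two consecutive bytes as an unsigned 16-bit value."""
    try:
        first = int(memory[addr]) & 0xFF
        second = int(memory[addr + 1]) & 0xFF
    except (IndexError, KeyError, TypeError, ValueError):
        return default
    return (first << 8) | second if big_endian else (second << 8) | first
```

With a live emulator this would be called as `read_u8(pyboy.memory, PLAYER_X_ADDR)`; returning a default keeps a single bad read from crashing a long training run, at the cost of hiding it, so pair it with debug logging.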

2. Environment Layer (`pokemon_red_ai.environment`)

Provides the RL environment interface following the Gymnasium API (the Farama Foundation's maintained successor to OpenAI Gym).

PokemonRedGymEnv (`gym_env.py`)

The main RL environment that wraps the game agent.

Key Features:

Standard Gymnasium interface (step, reset, render)
Multi-modal observations (screen, position, stats, exploration)
Configurable reward strategies
Episode management and termination logic

Environment Lifecycle:

sequenceDiagram
    participant RL as RL Algorithm
    participant Env as PokemonRedGymEnv
    participant Agent as PokemonRedAgent
    participant PyBoy as PyBoy Emulator

    RL->>Env: reset()
    Env->>Agent: reset_game()
    Agent->>PyBoy: Create new instance
    Agent->>Agent: run_opening_sequence()
    Agent-->>Env: Game ready
    Env->>Env: _get_observation()
    Env-->>RL: initial_observation, info

    loop Training Loop
        RL->>Env: step(action)
        Env->>Agent: step(action)
        Agent->>PyBoy: press_button(action)
        Agent->>Agent: update tracking
        Agent-->>Env: game_state
        Env->>Env: _calculate_reward()
        Env->>Env: _check_done()
        Env-->>RL: obs, reward, terminated, truncated, info
    end
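The same lifecycle in code: a dependency-free sketch of the `reset`/`step` contract, with a stub standing in for PokemonRedAgent so it runs without a ROM. The stub, the reward rule, and the `max_steps` truncation are assumptions for illustration; a real Gymnasium environment also subclasses `gymnasium.Env` and declares its observation and action spaces.

```python
from typing import Any, Dict, Tuple


class StubAgent:
    """Hypothetical stand-in for PokemonRedAgent; no emulator required."""

    def __init__(self) -> None:
        self.x = 0

    def reset_game(self) -> bool:
        # The real agent recreates PyBoy here; the stub just resets position
        self.x = 0
        return self.run_opening_sequence()

    def run_opening_sequence(self) -> bool:
        return True

    def step(self, action: str) -> Dict[str, Any]:
        if action == "RIGHT":
            self.x += 1
        return {"player_x": self.x}


class LifecycleEnv:
    """Follows the reset/step sequence from the diagram above."""

    def __init__(self, agent: StubAgent, max_steps: int = 100) -> None:
        self.agent = agent
        self.max_steps = max_steps
        self.steps = 0
        self.best_x = 0

    def reset(self) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        self.agent.reset_game()
        self.steps, self.best_x = 0, 0
        return self._get_observation({"player_x": self.agent.x}), {}

    def step(self, action: str):
        state = self.agent.step(action)
        self.steps += 1
        reward = self._calculate_reward(state)
        terminated = False  # a real env would end on e.g. a goal state
        truncated = self.steps >= self.max_steps  # time limit, not a terminal state
        return self._get_observation(state), reward, terminated, truncated, {}

    def _get_observation(self, state: Dict[str, Any]) -> Dict[str, Any]:
        return {"player_x": state["player_x"], "step": self.steps}

    def _calculate_reward(self, state: Dict[str, Any]) -> float:
        # Illustrative reward: +1 for reaching a new rightmost column
        if state["player_x"] > self.best_x:
            self.best_x = state["player_x"]
            return 1.0
        return 0.0
```

Keeping `terminated` and `truncated` separate matters for learning: a truncated episode did not really end, so the algorithm should still bootstrap value estimates from the final state.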

Reward System (`rewards.py`)

Flexible reward calculation system with multiple strategies.

Reward Strategies:

Standard: Balanced approach for general gameplay
Exploration: Heavy emphasis on map discovery
Progress: Focused on story advancement and badges
Sparse: Rewards only major achievements; suited to more advanced RL algorithms

Reward Calculator Pattern:

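from abc import ABC, abstractmethod
from typing import Any, Dict

class BaseRewardCalculator(ABC):
    @abstractmethod
    def calculate_reward(self, current_state: Dict[str, Any]) -> float:
        """Return the scalar reward for the current game state."""

    def reset(self) -> None:
        """Clear per-episode tracking; a no-op by default."""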
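As an example of the pattern, here is a sketch of an exploration-style calculator that pays a fixed bonus the first time the agent stands on each (map, x, y) tile. The state keys match the `sample_memory_state` test fixture (`map_id`, `player_x`, `player_y`); the class name, the bonus value, and the choice to forget visited tiles on `reset` are hypothetical, not the project's actual exploration strategy.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Set, Tuple


class BaseRewardCalculator(ABC):
    @abstractmethod
    def calculate_reward(self, current_state: Dict[str, Any]) -> float:
        """Return the scalar reward for the current game state."""

    def reset(self) -> None:
        """Clear per-episode tracking; a no-op by default."""


class TileExplorationReward(BaseRewardCalculator):
    """Hypothetical: +bonus for every tile visited for the first time."""

    def __init__(self, bonus: float = 1.0) -> None:
        self.bonus = bonus
        self.visited: Set[Tuple[int, int, int]] = set()

    def calculate_reward(self, current_state: Dict[str, Any]) -> float:
        tile = (current_state["map_id"], current_state["player_x"], current_state["player_y"])
        if tile in self.visited:
            return 0.0
        self.visited.add(tile)
        return self.bonus

    def reset(self) -> None:
        # Forget visited tiles at the start of each episode
        self.visited.clear()
```

Whether visited tiles persist across episodes is a design choice: clearing them rewards re-exploring every episode, while keeping them turns the bonus into a one-off novelty signal.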

Observation Processing (`observations.py`)

Handles game state conversion to RL-compatible observations.

Observation Types:

Multi-modal: Screen + position + stats + exploration data
Screen-only: Only visual information
Minimal: Compact feature vector for fast training
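To illustrate what a compact feature vector can look like, the sketch below scales a handful of state fields into [0, 1]. The chosen fields and scaling constants are assumptions for illustration (coordinates and map IDs are divided by 255 because they fit in a single byte, levels by the level cap of 100, badges by the eight gym badges); they are not necessarily what the Minimal treatment uses.

```python
from typing import Any, Dict, List


def minimal_observation(state: Dict[str, Any], episode_steps: int, max_steps: int) -> List[float]:
    """Hypothetical compact observation: a few fields scaled to [0, 1]."""
    def clip01(value: float) -> float:
        return max(0.0, min(1.0, value))

    return [
        clip01(state.get("player_x", 0) / 255),      # single-byte coordinate
        clip01(state.get("player_y", 0) / 255),
        clip01(state.get("map_id", 0) / 255),
        clip01(state.get("player_level", 0) / 100),  # level cap is 100
        clip01(state.get("badges", 0) / 8),          # eight gym badges
        clip01(episode_steps / max_steps) if max_steps > 0 else 0.0,  # episode progress
    ]
```

Keeping every feature in the same range means no single input dominates early gradient updates, which is part of why a small vector can train quickly.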

3. Training Layer (`pokemon_red_ai.training`)

High-level training orchestration and monitoring.

PokemonTrainer (`trainer.py`)

Main training orchestration class that coordinates all components.

Training Flow:

flowchart TD
    A[Initialize Trainer] --> B[Create Environment]
    B --> C[Create RL Model]
    C --> D[Setup Callbacks]
    D --> E[Start Training Loop]
    E --> F{Training Complete?}
    F -->|No| G[Model.learn]
    G --> H[Save Checkpoints]
    H --> E
    F -->|Yes| I[Save Final Model]
    I --> J[Cleanup Resources]
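One way to realise the learn → checkpoint loop is to call Stable-Baselines3's `model.learn` in fixed-size chunks with `reset_num_timesteps=False`, so the timestep counter and logs continue across chunks, saving after each one. This is a sketch under assumptions: the function name, chunk size, and file naming are illustrative, and PokemonTrainer may instead rely on a checkpoint callback inside a single `learn` call.

```python
import os
from typing import Any, List


def train_with_checkpoints(model: Any, total_timesteps: int, chunk_size: int,
                           save_dir: str) -> List[str]:
    """Train in chunks, saving a checkpoint after each; returns the saved paths."""
    os.makedirs(save_dir, exist_ok=True)
    saved: List[str] = []
    done = 0
    while done < total_timesteps:                       # "Training Complete?"
        steps = min(chunk_size, total_timesteps - done)
        # reset_num_timesteps=False keeps SB3's timestep counter running
        model.learn(total_timesteps=steps, reset_num_timesteps=False)
        done += steps
        path = os.path.join(save_dir, f"checkpoint_{done}_steps")
        model.save(path)                                # "Save Checkpoints"
        saved.append(path)
    final_path = os.path.join(save_dir, "final_model")
    model.save(final_path)                              # "Save Final Model"
    saved.append(final_path)
    return saved
```

SB3's `save` appends `.zip` to these paths. Chunking trades a little overhead per `learn` call for checkpoints at predictable step counts.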

Callbacks (`callbacks.py`)

Training monitoring and control callbacks.

Callback Types:

TrainingCallback: Basic progress tracking and model saving
EnhancedTrainingCallback: Live plotting and detailed metrics
EarlyStopping: Automatic training termination
PerformanceMonitor: System resource monitoring

Model Creation (`models.py`)

RL model creation and configuration utilities.

Features:

Algorithm-specific hyperparameter defaults
Custom feature extractors for multi-modal observations
PyTorch neural network architectures
Hyperparameter optimization support

4. User Interface Layer

CLI (`pokemon_red_ai.cli`)

Command-line interface built with Click, using Rich for formatted terminal output.

Commands:

train: Train a new model
test: Evaluate a trained model
info: Project information and analysis
config: Configuration management
init: Initialize new projects
doctor: System health checks

Research Apparatus

The components below are layered on top of the core RL toolkit and power the paper-grade workflow. They're opt-in; none are required to train an agent.

Analysis layer (`pokemon_red_ai.analysis`)

analysis/comparison.py is the shared backend for treatment-level statistical comparisons. Pure functions, no UI dependency:

from pokemon_red_ai.analysis.comparison import (
    detect_treatment, group_runs_by_treatment,
    learning_curves_with_bands, treatment_summary_table,
    milestone_first_episode, final_performance,
    plot_learning_curves, export_figure,
)

treatment_summary_table(runs, n_boot=2000): IQM with 95% percentile-bootstrap CI per treatment. Pure-numpy implementation, mathematically equivalent to rliable's stratified bootstrap but without that runtime dependency.
learning_curves_with_bands(runs): per-treatment mean ± std curves, aligned to the shortest seed, smoothable.
milestone_first_episode(runs): for each pre-registered event flag, the median (or min / mean) episode at which each treatment first triggered it.

scripts/analyze.py produces final paper figures using the same data flow. See paper/analysis_plan.md §6 for the locked statistical methodology.

Live monitoring callbacks (`pokemon_red_ai.training.callbacks`)

| Callback | Role |
|---|---|
| `TrainingCallback` | Baseline checkpoint + log writer |
| `WandbCallback` | Streams scalars and tables to Weights & Biases |
| `MonitoringCallback` | Extends `WandbCallback` with map heatmaps, event-flag tracking, screen captures, and a `dashboard_state.json` snapshot for the local Streamlit dashboard |

The MONITORED_INFO_KEYS tuple is the contract between the env's info dict and the SB3 Monitor wrapper's info_keywords argument. Adding a metric requires updating that tuple in lockstep with the env.
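Because SB3's `Monitor` copies each `info_keywords` entry out of the final step's info dict when an episode ends, a key the env stops emitting surfaces as a `KeyError` deep inside training. A cheap guard is a unit test that steps the env once and checks the contract. The tuple contents and the helper below are hypothetical.

```python
from typing import Any, Iterable, List, Mapping

# Hypothetical contents; the real tuple lives in pokemon_red_ai.training.callbacks
MONITORED_INFO_KEYS = ("badges", "maps_visited", "event_flags_set")


def missing_info_keys(info: Mapping[str, Any],
                      required: Iterable[str] = MONITORED_INFO_KEYS) -> List[str]:
    """Return the monitored keys that are absent from an env's info dict."""
    return [key for key in required if key not in info]

# In a test, step the real env once and assert the contract holds:
#     _, _, _, _, info = env.step(env.action_space.sample())
#     assert not missing_info_keys(info)
# and wrap the env with the same tuple SB3 reads at episode end:
#     from stable_baselines3.common.monitor import Monitor
#     Monitor(env, info_keywords=MONITORED_INFO_KEYS)
```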

Alerting (`pokemon_red_ai.training.alerts`)

Provides TrainingAlertCallback plus four pluggable channels (LogChannel, DesktopChannel, SlackChannel, EmailChannel). Triggers cover first badge, new max badge, new map, new event flag, reward plateau, checkpoint, and crash. A per-key cooldown prevents alert spam. load_alert_config reads a YAML file; configs/alerts.example.yaml is a sample to start from.

from pokemon_red_ai.training.alerts import (
    TrainingAlertCallback, DesktopChannel, SlackChannel,
    load_alert_config, channels_from_config,
)
cfg = load_alert_config("configs/alerts.yaml")
callback = TrainingAlertCallback(channels=channels_from_config(cfg))
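The per-key cooldown can be as simple as remembering when each alert key last fired. The sketch below is hypothetical (the class name, the 300-second default, and the injectable clock, added so it can be tested deterministically, are all assumptions) and is not the TrainingAlertCallback implementation.

```python
import time
from typing import Callable, Dict, Optional


class AlertCooldown:
    """Suppress repeats of the same alert key within `cooldown_s` seconds."""

    def __init__(self, cooldown_s: float = 300.0,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock or time.monotonic  # monotonic: immune to wall-clock jumps
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str) -> bool:
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # still cooling down: drop this alert
        self._last_sent[key] = now
        return True
```

Keying on the trigger type (for example `"reward_plateau"`) throttles a noisy trigger without muting unrelated ones such as `"crash"`.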

Streamlit dashboards

scripts/monitor.py: single-run dashboard. Reads dashboard_state.json and the SB3 monitor.csv, and auto-refreshes.
scripts/compare.py: multi-run treatment comparison. Automatically groups runs by treatment based on their run names and renders learning curves, IQM tables, and a milestone race, with PDF/SVG/PNG export buttons for figures.

Both are pure-Streamlit; the data helpers underneath are unit-tested in tests/unit/test_monitor_script.py and tests/unit/analysis/test_comparison.py.
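Stable-Baselines3's `Monitor` writes monitor.csv as one `#`-prefixed JSON metadata line followed by a CSV header of `r` (episode reward), `l` (episode length), and `t` (seconds since `t_start`), plus any `info_keywords` columns. A stdlib-only reader of that format might look like this; the function name is hypothetical, and it returns plain dicts rather than a DataFrame.

```python
import csv
import io
import json
from typing import Any, Dict, List, Tuple


def _to_number(value: Any) -> Any:
    try:
        return float(value)
    except (TypeError, ValueError):
        return value  # leave non-numeric info_keywords columns as strings


def read_monitor_csv(text: str) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Parse SB3 monitor.csv content into (metadata, episode rows)."""
    lines = text.splitlines()
    metadata: Dict[str, Any] = {}
    if lines and lines[0].startswith("#"):
        metadata = json.loads(lines[0][1:])  # e.g. {"t_start": ..., "env_id": ...}
        lines = lines[1:]
    rows = [{key: _to_number(value) for key, value in row.items()}
            for row in csv.DictReader(io.StringIO("\n".join(lines)))]
    return metadata, rows
```

Usage: `meta, episodes = read_monitor_csv(open(path).read())`, then plot `episode["r"]` against the cumulative sum of `episode["l"]`.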

Pilot grid launcher (`scripts/run_pilots.sh`)

Wraps scripts/train.py for the canonical 9-pilot grid (3 treatments × 3 seeds × 10M steps). Handles consistent save-dir / W&B-run-name paths, caffeinate on macOS, batched parallelism (--parallel N), skip-completed detection, and a final summary. See --help for all options.
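The grid itself is a Cartesian product of treatments and seeds. A Python sketch of the enumeration and skip-completed logic follows. The treatment names, the run-name format, and the `final_model.zip` completion marker are all hypothetical; the script's actual conventions are documented in its --help output.

```python
import itertools
import os
from typing import Iterable, List, Tuple


def plan_pilot_runs(treatments: Iterable[str], seeds: Iterable[int],
                    save_root: str) -> List[Tuple[str, int, str]]:
    """Return (treatment, seed, save_dir) for each run that still needs to execute."""
    pending = []
    for treatment, seed in itertools.product(treatments, seeds):
        run_name = f"{treatment}_seed{seed}"  # hypothetical naming scheme
        save_dir = os.path.join(save_root, run_name)
        # Skip-completed: treat an existing final model as "already done"
        if os.path.exists(os.path.join(save_dir, "final_model.zip")):
            continue
        pending.append((treatment, seed, save_dir))
    return pending
```

Deriving the save directory and run name from the same string is what keeps local paths and W&B run names consistent across relaunches.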

Overleaf integration

The paper LaTeX in paper/ is canonical. bin/setup-overleaf-project.sh provisions the Overleaf side from a project ID + token in one command; scripts/mirror_paper_to_overleaf.sh keeps git → Overleaf in sync. Both depend on the overleaf-mcp-server.

Design Patterns

1. Strategy Pattern

Used extensively for reward calculation and observation processing:

# Reward strategies
reward_calculator = create_reward_calculator("exploration")

# Observation strategies  
env = PokemonRedGymEnv(observation_type="multi_modal")

2. Factory Pattern

For creating models and components:

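import gymnasium as gym

def create_model(algorithm: str, env: gym.Env, **kwargs):
    creators = {
        'PPO': create_ppo_model,
        'A2C': create_a2c_model,
        'DQN': create_dqn_model,
    }
    if algorithm not in creators:
        raise ValueError(f"Unknown algorithm {algorithm!r}; expected one of {sorted(creators)}")
    return creators[algorithm](env, **kwargs)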

3. Observer Pattern

Training callbacks observe training progress:

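from stable_baselines3.common.callbacks import BaseCallback

class TrainingCallback(BaseCallback):
    def _on_step(self) -> bool:
        return True  # Required by BaseCallback; returning False stops training

    def _on_rollout_end(self) -> None:
        # React to training events once per rollout
        if self.model.ep_info_buffer:
            self.update_statistics()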

4. Template Method Pattern

Base classes define workflow, subclasses implement specifics:

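from abc import ABC, abstractmethod

class BaseRewardCalculator(ABC):
    def calculate_reward(self, state):
        # Template method: fixed skeleton, subclasses supply the steps
        self.reward_components = {}  # reset the per-step breakdown
        reward = self._calculate_base_reward(state)
        reward += self._calculate_bonus_rewards(state)
        return reward

    @abstractmethod
    def _calculate_base_reward(self, state): ...
    @abstractmethod
    def _calculate_bonus_rewards(self, state): ...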

5. Adapter Pattern

PyBoy compatibility layer adapts different API versions:

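def press_button_basic(pyboy, button: str) -> None:
    try:
        # PyBoy 2.x API: accepts names such as "a", "start", "up"
        pyboy.button_press(button.lower())
    except AttributeError:
        # PyBoy 1.x API fallback: map the name to a WindowEvent constant
        from pyboy.utils import WindowEvent
        kind = "ARROW" if button.upper() in ("UP", "DOWN", "LEFT", "RIGHT") else "BUTTON"
        pyboy.send_input(getattr(WindowEvent, f"PRESS_{kind}_{button.upper()}"))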

Data Flow

Training Data Flow

flowchart LR
    subgraph "Game World"
        ROM[Pokemon Red ROM]
        PyBoy[PyBoy Emulator]
    end
    
    subgraph "Game Interface"
        Agent[PokemonRedAgent]
        Memory[Memory Reader]
        Controls[Input Controls]
    end
    
    subgraph "RL Environment"
        Env[PokemonRedGymEnv]
        Rewards[Reward Calculator]
        Obs[Observation Processor]
    end
    
    subgraph "Training"
        Model[RL Model]
        Trainer[PokemonTrainer]
        Callbacks[Callbacks]
    end
    
    ROM --> PyBoy
    PyBoy --> Agent
    Agent --> Memory
    Agent --> Controls
    
    Agent --> Env
    Env --> Rewards
    Env --> Obs
    
    Env --> Model
    Model --> Trainer
    Trainer --> Callbacks
    
    Model --> |Action| Env
    Env --> |State| Model
    Rewards --> |Reward| Model
    Obs --> |Observation| Model

State Information Flow

Raw Game State: PyBoy provides screen pixels and memory access
Parsed State: Agent extracts position, stats, game flags
RL Observation: Environment converts to ML-friendly format
Reward Signal: Reward calculator provides learning signal
Action: RL model outputs button press
Game Response: PyBoy executes action and updates state

Testing Strategy

Test Organization

tests/
├── conftest.py              # Global fixtures and configuration
├── unit/                    # Unit tests for individual components
│   ├── agent/              # Game interface tests
│   │   ├── conftest.py     # Agent-specific fixtures
│   │   ├── test_agent.py   # PokemonRedAgent tests
│   │   ├── test_controls.py # Controls and input tests
│   │   └── test_memory.py  # Memory reading tests
│   ├── environment/        # Environment layer tests
│   ├── training/           # Training components tests
│   └── utils/              # Utility function tests
└── integration/            # Integration and end-to-end tests

Testing Patterns

1. Mock-Based Testing

Heavy use of mocks for external dependencies:

from unittest.mock import Mock

import pytest


@pytest.fixture
def mock_pyboy():
    mock = Mock()
    mock.memory = Mock()
    mock.screen = Mock()
    mock.button_press = Mock()
    return mock

2. Fixture-Based State

Reusable game state fixtures:

@pytest.fixture
def sample_memory_state():
    return {
        'player_x': 5,
        'player_y': 7,
        'map_id': 1,
        'player_level': 15,
        # ... more state
    }

3. Parameterized Testing

Testing multiple scenarios efficiently:

@pytest.mark.parametrize("action", ['A', 'B', 'UP', 'DOWN'])
def test_step_with_all_actions(mock_agent, action):
    state = mock_agent.step(action)
    assert isinstance(state, dict)

4. Performance Testing

Benchmark critical paths:

def test_agent_step_performance(benchmark_runner, mock_agent):
    def step_op():
        mock_agent.step('RIGHT')

    result = benchmark_runner.run('step', step_op, iterations=100)
    assert result['mean'] < 0.05  # Mean step time under 50 ms
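The `benchmark_runner` fixture is project-specific. If you need a similar helper elsewhere, a minimal version only has to time repeated calls with `time.perf_counter` and summarise them. The class below is a hypothetical stand-in with the same `run(name, fn, iterations)` shape, reporting seconds.

```python
import statistics
import time
from typing import Callable, Dict


class SimpleBenchmarkRunner:
    """Hypothetical stand-in for the `benchmark_runner` fixture."""

    def __init__(self, warmup: int = 3) -> None:
        self.warmup = warmup
        self.results: Dict[str, Dict[str, float]] = {}

    def run(self, name: str, fn: Callable[[], object], iterations: int = 100) -> Dict[str, float]:
        for _ in range(self.warmup):  # let caches and lazy imports settle first
            fn()
        timings = []
        for _ in range(iterations):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
        result = {
            "mean": statistics.mean(timings),
            "median": statistics.median(timings),
            "max": max(timings),
        }
        self.results[name] = result
        return result
```

Exposed through a `@pytest.fixture` that returns `SimpleBenchmarkRunner()`, it would satisfy the test above. Asserting on the median instead of the mean makes such tests less sensitive to a single slow iteration on a busy CI machine.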

Development Setup

Prerequisites

Python 3.8+
Pokemon Red ROM file (.gb format)
Git for version control

Local Development

# Clone repository
git clone https://github.com/yourusername/pokemon-red-ai.git
cd pokemon-red-ai

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .

# Install development dependencies
pip install pytest pytest-cov black isort mypy

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=pokemon_red_ai --cov-report=html

# Check code style
black pokemon_red_ai/
isort pokemon_red_ai/
mypy pokemon_red_ai/

Development Workflow

Create Feature Branch: git checkout -b feature/my-feature
Write Tests: Add tests for new functionality
Implement Feature: Follow existing patterns and conventions
Run Tests: Ensure all tests pass
Check Style: Run linting and formatting tools
Create PR: Submit for review

Contributing Guidelines

Code Style

Python Style: Follow PEP 8 with Black formatting
Import Order: Use isort for consistent import organization
Type Hints: Add type hints for all public functions
Docstrings: Use Google-style docstrings

Example Code Style

import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


class ExampleClass:
    """
    Example class following project conventions.
    
    Args:
        param1: Description of parameter
        param2: Optional parameter with default
    """
    
    def __init__(self, param1: str, param2: Optional[int] = None):
        self.param1 = param1
        self.param2 = param2 if param2 is not None else 42  # keep an explicit 0
        
    def process_data(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Process input data and return results.
        
        Args:
            data: Input data dictionary
            
        Returns:
            Processed data dictionary
            
        Raises:
            ValueError: If data is invalid
        """
        if not data:
            raise ValueError("Data cannot be empty")
            
        try:
            result = self._internal_processing(data)
            logger.info(f"Processed {len(data)} items")
            return result
        except Exception as e:
            logger.error(f"Processing failed: {e}")
            raise
            
    def _internal_processing(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Internal processing method (private)."""
        return {k: v for k, v in data.items() if v is not None}

Adding New Features

1. New Reward Strategy

# 1. Create new reward calculator in pokemon_red_ai/environment/rewards.py
class CustomRewardCalculator(BaseRewardCalculator):
    def calculate_reward(self, current_state: Dict[str, Any]) -> float:
        # Implement custom logic
        return reward

# 2. Register in factory function
def create_reward_calculator(strategy: str, config=None):
    calculators = {
        'standard': StandardRewardCalculator,
        'custom': CustomRewardCalculator,  # Add here
    }
    # ...

# 3. Add tests in tests/unit/environment/test_rewards.py
class TestCustomRewardCalculator:
    def test_custom_reward_calculation(self):
        # Test implementation
        pass

2. New Observation Type

# 1. Add processing function in pokemon_red_ai/environment/observations.py
def process_custom_observation(agent, episode_steps, max_steps) -> np.ndarray:
    # Custom observation logic
    return observation

# 2. Update environment in gym_env.py
def _get_observation(self):
    if self.observation_type == "custom":
        return process_custom_observation(...)
    # ...

# 3. Add tests for new observation type

3. New Training Algorithm

# 1. Add model creation function in pokemon_red_ai/training/models.py
def create_custom_model(env, **kwargs):
    # Model creation logic
    return model

# 2. Register in factory
def create_model(algorithm: str, env, **kwargs):
    creators = {
        'PPO': create_ppo_model,
        'CUSTOM': create_custom_model,  # Add here
    }
    # ...

Error Handling Guidelines

Use Appropriate Exceptions: Choose specific exception types
Log Errors: Always log errors with context
Graceful Degradation: Provide fallbacks when possible
User-Friendly Messages: Clear error messages for CLI users

def example_function(param: str) -> str:
    try:
        result = risky_operation(param)
        return result
    except SpecificException as e:
        logger.error(f"Specific error in example_function: {e}")
        # Try fallback
        return fallback_operation(param)
    except Exception as e:
        logger.error(f"Unexpected error in example_function: {e}")
        raise RuntimeError(f"Failed to process {param}: {e}") from e

API Reference

Core Classes Quick Reference

PokemonRedAgent

agent = PokemonRedAgent("pokemon_red.gb", show_window=False)
agent.run_opening_sequence()
state = agent.step("RIGHT")
agent.reset_game()

PokemonRedGymEnv

env = PokemonRedGymEnv("pokemon_red.gb", reward_strategy="exploration")
obs, info = env.reset()
action = env.action_space.sample()  # random action, for illustration
obs, reward, terminated, truncated, info = env.step(action)

PokemonTrainer

trainer = PokemonTrainer("pokemon_red.gb", save_dir="./training/")
trainer.train(total_timesteps=100000, show_plots=True)
results = trainer.test("./training/models/best_model.zip")

Configuration System

from pokemon_red_ai.utils import load_config, TrainingConfig

# Load from YAML
config = load_config("config.yaml")

# Create programmatically
config = TrainingConfig(
    total_timesteps=100000,
    algorithm="PPO",
    reward_strategy="exploration"
)

CLI Integration

# The CLI commands use the same underlying components
# You can replicate CLI functionality programmatically:

from pokemon_red_ai import PokemonTrainer
from pokemon_red_ai.utils import create_directories, cleanup_rom_save_files

# Equivalent to: pokemon-ai train --rom game.gb --timesteps 50000
cleanup_rom_save_files("game.gb")
create_directories("./training/")
trainer = PokemonTrainer("game.gb", "./training/")
trainer.train(total_timesteps=50000)

Debugging Tips

Common Issues

PyBoy Compatibility: Check PyBoy version and use compatibility layers
Memory Reading Errors: Always handle memory access failures gracefully
Screen Detection: Use debug logging to understand screen state transitions
Training Instability: Check reward signal and observation preprocessing

Debug Configuration

import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('pokemon_red_ai')
logger.setLevel(logging.DEBUG)

# Use with debug info
env = PokemonRedGymEnv("game.gb", headless=False)  # Show game window
trainer = PokemonTrainer("game.gb")
trainer.train(total_timesteps=1000, show_plots=True)  # Short test run

Monitoring Training

# Use enhanced callbacks for detailed monitoring
from pokemon_red_ai.training import EnhancedTrainingCallback

callback = EnhancedTrainingCallback(
    show_plots=True,
    save_freq=1000,
    verbose=2  # Maximum verbosity
)

trainer.train(
    total_timesteps=50000,
    callback=callback
)

Conclusion

This developer guide should give you a solid foundation for contributing to Pokemon Red AI. The modular architecture makes it easy to add new features while maintaining backward compatibility. Remember to:

Follow the established patterns and conventions
Write comprehensive tests for new functionality
Document your code thoroughly
Consider backward compatibility when making changes
Use the existing error handling and logging patterns

For questions or discussions about the architecture, open an issue or start a discussion on GitHub.
