AGENTS.md

Instructions for AI agents contributing to this codebase.


Project overview

llmfit is a Rust CLI/TUI tool that matches LLMs against local system hardware (RAM, CPU, GPU). It detects system specs, loads a model database from embedded JSON, scores how well each model fits, and presents the results in an interactive terminal UI or as classic table output.

Language and toolchain

  • Rust, edition 2024.
  • Build with cargo build. Run with cargo run.
  • No nightly features required. Stable toolchain only.
  • Minimum supported Rust version: 1.85, the first stable release that supports edition 2024.

Architecture

main.rs          Entrypoint. Parses CLI args via clap. Launches the TUI by default;
                 runs a CLI subcommand (system, list, fit, search, info) when one
                 is given, or prints classic table output with the --cli flag.

hardware.rs      SystemSpecs::detect() reads RAM/CPU via sysinfo crate.
                 detect_gpu() shells out to nvidia-smi / rocm-smi, and
                 detects Apple Silicon via system_profiler.
                 On unified memory (Apple Silicon), VRAM = system RAM.
                 No async. No unsafe.

models.rs        LlmModel struct. ModelDatabase loads from data/hf_models.json
                 embedded via include_str!() at compile time. No runtime file I/O.

fit.rs           FitLevel enum (Perfect, Good, Marginal, TooTight).
                 RunMode enum (Gpu, CpuOffload, CpuOnly).
                 ModelFit::analyze() compares a model against SystemSpecs,
                 selecting the best available execution path (GPU > CPU offload > CPU).
                 rank_models_by_fit() sorts by fit level, then run mode, then utilization.

display.rs       CLI-mode table rendering using the tabled crate.
                 Only used when --cli flag or subcommands are invoked.

tui_app.rs       TUI application state. Holds all models, filters (search text,
                 provider toggles, fit filter), selection index.
                 All filtering logic is here -- apply_filters() recomputes
                 filtered_fits indices whenever inputs change.

tui_ui.rs        Rendering with ratatui. Four layout regions: system bar,
                 search/filter bar, model table (or detail pane), status bar.
                 Stateless rendering -- reads from App, writes to Frame.

tui_events.rs    Keyboard event handling with crossterm. Two modes: Normal
                 (navigation, filter toggling, quit) and Search (text input).

Data flow

  1. App::new() calls SystemSpecs::detect() and ModelDatabase::new().
  2. Every model is analyzed into a ModelFit via ModelFit::analyze().
  3. Results are sorted by rank_models_by_fit().
  4. apply_filters() produces filtered_fits: Vec<usize> (indices into all_fits).
  5. The TUI render loop reads App state and draws via tui_ui::draw().
  6. tui_events::handle_events() mutates App state, triggering re-render.
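
The ranking in step 3 can be sketched as a three-key sort. This is a minimal illustration, not the actual fit.rs code: the ModelFit field names (fit_level, run_mode, utilization_pct) are assumptions, and so is the choice to sort utilization in ascending order, since the text above only says that utilization is the third key.

```rust
// Sketch only: field names and the utilization sort direction are assumptions.

// Variants are declared best-first, so the derived Ord puts Perfect before TooTight
// in an ascending sort.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FitLevel {
    Perfect,
    Good,
    Marginal,
    TooTight,
}

// Declared in preference order: GPU, then CPU offload, then CPU only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum RunMode {
    Gpu,
    CpuOffload,
    CpuOnly,
}

#[derive(Debug, Clone)]
struct ModelFit {
    name: String,
    fit_level: FitLevel,
    run_mode: RunMode,
    utilization_pct: f64, // hypothetical field
}

// Sort by fit level, then run mode, then utilization (ascending, by assumption).
fn rank_models_by_fit(mut fits: Vec<ModelFit>) -> Vec<ModelFit> {
    fits.sort_by(|a, b| {
        a.fit_level
            .cmp(&b.fit_level)
            .then(a.run_mode.cmp(&b.run_mode))
            .then(a.utilization_pct.total_cmp(&b.utilization_pct))
    });
    fits
}

fn main() {
    let ranked = rank_models_by_fit(vec![
        ModelFit { name: "big".to_string(), fit_level: FitLevel::TooTight, run_mode: RunMode::Gpu, utilization_pct: 120.0 },
        ModelFit { name: "small".to_string(), fit_level: FitLevel::Perfect, run_mode: RunMode::Gpu, utilization_pct: 30.0 },
    ]);
    for fit in &ranked {
        println!("{:?}", fit);
    }
}
```

Deriving Ord from declaration order keeps the sort logic in one place, which is why adding a variant in the wrong position would silently change the ranking.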

Model database

  • Source: data/hf_models.json (33 models).
  • Generated by scripts/scrape_hf_models.py (Python, stdlib only, no pip deps).
  • Embedded at compile time via include_str!("../data/hf_models.json").
  • Schema per entry: name, provider, parameter_count, min_ram_gb, recommended_ram_gb, min_vram_gb, quantization, context_length, use_case.
  • min_vram_gb is VRAM needed for GPU inference. min_ram_gb is system RAM needed for CPU inference. Both are derived from the same parameter count.
  • RAM formula: min_ram_gb = params * 0.5 bytes per parameter (Q4_K_M) / 1024^3 * 1.2 (overhead factor).
  • VRAM formula: min_vram_gb = params * 0.5 bytes per parameter (Q4_K_M) / 1024^3 * 1.1 (activation overhead factor).
  • Recommended RAM: model_size * 2.0.
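
The formulas above are applied by the Python scraper. The Rust sketch below only restates the arithmetic so the numbers are easy to check. The function names are hypothetical, and it assumes that model_size means the quantized weight size (params * 0.5 / 1024^3) before any overhead factor is applied.

```rust
// Sketch only: function names are hypothetical, and model_size is assumed to be
// the quantized weight size without overhead.

const BYTES_PER_PARAM_Q4_K_M: f64 = 0.5;
const BYTES_PER_GB: f64 = 1024.0 * 1024.0 * 1024.0; // 1024^3

// Quantized weight size in GB for a given parameter count.
fn model_size_gb(params: f64) -> f64 {
    params * BYTES_PER_PARAM_Q4_K_M / BYTES_PER_GB
}

// System RAM needed for CPU inference: weights plus 20% overhead.
fn min_ram_gb(params: f64) -> f64 {
    model_size_gb(params) * 1.2
}

// VRAM needed for GPU inference: weights plus 10% activation overhead.
fn min_vram_gb(params: f64) -> f64 {
    model_size_gb(params) * 1.1
}

// Recommended RAM: twice the model size.
fn recommended_ram_gb(params: f64) -> f64 {
    model_size_gb(params) * 2.0
}

fn main() {
    let params = 7.0e9; // a 7B-parameter model
    println!("model size:      {:.2} GB", model_size_gb(params));
    println!("min RAM:         {:.2} GB", min_ram_gb(params));
    println!("min VRAM:        {:.2} GB", min_vram_gb(params));
    println!("recommended RAM: {:.2} GB", recommended_ram_gb(params));
}
```

For a 7B-parameter model this gives roughly 3.26 GB of weights, so about 3.91 GB minimum RAM and 3.59 GB minimum VRAM under these assumptions.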

Do not manually edit hf_models.json. Regenerate it by running the scraper:

python3 scripts/scrape_hf_models.py

The scraper has hardcoded fallback entries for gated models that require authentication.

Conventions

  • No unsafe code.
  • No .unwrap() on user-facing paths. Use proper error handling there. Reserve expect() with a descriptive message for internal invariants.
  • Fit levels are ordered: Perfect > Good > Marginal > TooTight. Do not add levels without updating rank_models_by_fit() sort logic.
  • Fit is VRAM-first. GPU inference with sufficient VRAM is the ideal path. CPU inference via system RAM is a fallback. The RunMode enum tracks which memory pool is being used (Gpu, CpuOffload, CpuOnly).
  • min_vram_gb is the VRAM needed to load model weights on GPU. min_ram_gb is the system RAM needed for CPU-only inference (same weights, loaded into RAM instead). They represent the same workload on different hardware paths.
  • On Apple Silicon (unified memory), VRAM = system RAM. The CpuOffload path is skipped because there is no separate RAM pool to spill to. SystemSpecs::unified_memory tracks this.
  • TUI rendering is stateless. tui_ui::draw() must not change application state. It takes &mut App only because the TableState widget requires mutable access; do not use that access to change anything else.
  • Event handling in tui_events.rs is the sole place that mutates App in the TUI loop.
  • Keep display.rs and tui_*.rs independent. The CLI path must work without initializing any TUI state.
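
The VRAM-first rule and the unified-memory exception can be illustrated with a small selection function. This is a sketch under stated assumptions, not the real ModelFit::analyze(): the gpu_vram_gb field and the select_run_mode function are hypothetical, and the sketch only picks the memory pool. Whether the model actually fits in that pool is the job of FitLevel.

```rust
// Sketch only: gpu_vram_gb and select_run_mode are hypothetical names.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunMode {
    Gpu,
    CpuOffload,
    CpuOnly,
}

struct SystemSpecs {
    gpu_vram_gb: Option<f64>, // None when no GPU was detected (assumed representation)
    unified_memory: bool,     // true on Apple Silicon
}

struct LlmModel {
    min_vram_gb: f64,
}

// Prefer the GPU when VRAM is sufficient. Otherwise spill to system RAM,
// unless memory is unified, in which case there is no separate pool to spill to.
fn select_run_mode(model: &LlmModel, specs: &SystemSpecs) -> RunMode {
    match specs.gpu_vram_gb {
        Some(vram) if vram >= model.min_vram_gb => RunMode::Gpu,
        Some(_) if !specs.unified_memory => RunMode::CpuOffload,
        _ => RunMode::CpuOnly,
    }
}

fn main() {
    let model = LlmModel { min_vram_gb: 12.0 };
    let discrete = SystemSpecs { gpu_vram_gb: Some(8.0), unified_memory: false };
    let unified = SystemSpecs { gpu_vram_gb: Some(8.0), unified_memory: true };
    println!("discrete 8 GB GPU: {:?}", select_run_mode(&model, &discrete));
    println!("unified 8 GB:      {:?}", select_run_mode(&model, &unified));
}
```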

Adding a new model to the database

  1. Add the model's HuggingFace repo ID to TARGET_MODELS in scripts/scrape_hf_models.py.
  2. If the model is gated (requires HF auth), add a fallback entry to the FALLBACK dict in the same script.
  3. Run python3 scripts/scrape_hf_models.py.
  4. Verify the output in data/hf_models.json.
  5. Run cargo build to verify compilation.

Adding a new filter

  1. Add the filter state to App in tui_app.rs.
  2. Add filtering logic inside apply_filters().
  3. Add the keybinding in tui_events.rs (Normal mode handler).
  4. Add the UI widget in tui_ui.rs (draw_search_and_filters() function).
  5. Update the status bar help text in draw_status_bar().
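
Steps 1 and 2 might look roughly like the sketch below. It is a simplified stand-in for tui_app.rs, not the real code: the long_context_only filter, the 32,768-token threshold, the search_query field name, and the toggle_long_context() method are all hypothetical, and the ratatui/crossterm wiring from steps 3 to 5 is left out.

```rust
// Sketch only: the filter, its threshold, and the field/method names are hypothetical.

struct ModelFit {
    name: String,
    context_length: u32,
}

struct App {
    all_fits: Vec<ModelFit>,
    filtered_fits: Vec<usize>, // indices into all_fits
    search_query: String,
    long_context_only: bool, // step 1: new filter state
}

impl App {
    // Recomputes filtered_fits from scratch whenever any input changes.
    fn apply_filters(&mut self) {
        let query = self.search_query.to_lowercase();
        let long_only = self.long_context_only;
        self.filtered_fits = self
            .all_fits
            .iter()
            .enumerate()
            .filter(|(_, fit)| query.is_empty() || fit.name.to_lowercase().contains(&query))
            // Step 2: the new filter is one more predicate in the chain.
            .filter(|(_, fit)| !long_only || fit.context_length >= 32_768)
            .map(|(index, _)| index)
            .collect();
    }

    // Called from the Normal-mode key handler (step 3) in this sketch.
    fn toggle_long_context(&mut self) {
        self.long_context_only = !self.long_context_only;
        self.apply_filters();
    }
}

fn main() {
    let mut app = App {
        all_fits: vec![
            ModelFit { name: "llama-small".to_string(), context_length: 8_192 },
            ModelFit { name: "llama-long".to_string(), context_length: 131_072 },
        ],
        filtered_fits: Vec::new(),
        search_query: "llama".to_string(),
        long_context_only: false,
    };
    app.apply_filters();
    println!("before toggle: {:?}", app.filtered_fits);
    app.toggle_long_context();
    println!("after toggle:  {:?}", app.filtered_fits);
}
```

Storing indices rather than cloned entries keeps all_fits as the single source of truth, so a filter change never has to copy model data.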

Adding a new CLI subcommand

  1. Add a variant to the Commands enum in main.rs.
  2. Add the match arm in the main() function's command dispatch.
  3. Use display.rs functions for output, or add new ones as needed.

Testing

There are no tests yet. When adding tests:

  • Unit tests for fit.rs logic (given known SystemSpecs and LlmModel values, assert correct FitLevel).
  • Unit tests for models.rs (verify JSON parsing, search matching).
  • Integration tests for CLI subcommands via assert_cmd crate.
  • TUI is difficult to unit test. Keep rendering stateless and test the state mutations in tui_app.rs directly.
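
A unit test for fit.rs could take the following shape. Everything about the analyze() function here is a placeholder: the struct fields and the thresholds (below minimum is TooTight, at or above recommended is Perfect, at least 1.2x minimum is Good) are invented for illustration and are not the project's actual rules. Only the test layout is the point.

```rust
// Sketch only: the thresholds and field names below are invented for illustration.

#[derive(Debug, PartialEq)]
enum FitLevel {
    Perfect,
    Good,
    Marginal,
    TooTight,
}

struct SystemSpecs {
    available_ram_gb: f64,
}

struct LlmModel {
    min_ram_gb: f64,
    recommended_ram_gb: f64,
}

// Hypothetical CPU-only rule standing in for ModelFit::analyze().
fn analyze(model: &LlmModel, specs: &SystemSpecs) -> FitLevel {
    let ram = specs.available_ram_gb;
    if ram < model.min_ram_gb {
        FitLevel::TooTight
    } else if ram >= model.recommended_ram_gb {
        FitLevel::Perfect
    } else if ram >= model.min_ram_gb * 1.2 {
        FitLevel::Good
    } else {
        FitLevel::Marginal
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn fit_level_follows_available_ram() {
        let model = LlmModel { min_ram_gb: 4.0, recommended_ram_gb: 8.0 };
        // (available RAM in GB, expected level)
        let cases = [
            (2.0, FitLevel::TooTight),
            (4.5, FitLevel::Marginal),
            (6.0, FitLevel::Good),
            (16.0, FitLevel::Perfect),
        ];
        for (ram, expected) in cases {
            let specs = SystemSpecs { available_ram_gb: ram };
            assert_eq!(analyze(&model, &specs), expected, "ram = {} GB", ram);
        }
    }
}

fn main() {
    let model = LlmModel { min_ram_gb: 4.0, recommended_ram_gb: 8.0 };
    let specs = SystemSpecs { available_ram_gb: 6.0 };
    println!("{:?}", analyze(&model, &specs));
}
```

Constructing SystemSpecs by hand instead of calling detect() keeps the test independent of the machine it runs on.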

Dependencies policy

  • Prefer crates that are well-maintained and have minimal transitive dependencies.
  • sysinfo is the system detection crate. Do not replace it with raw platform calls.
  • ratatui + crossterm is the TUI stack. Do not mix in termion or ncurses.
  • clap with derive feature for CLI parsing. Do not use manual arg parsing.
  • The Python scraper uses only stdlib (urllib, json). Do not add pip dependencies.

Common tasks

# Build
cargo build

# Run TUI
cargo run

# Run CLI mode
cargo run -- --cli

# Run specific subcommand
cargo run -- system
cargo run -- fit --perfect -n 5
cargo run -- search "llama"

# Refresh model database
python3 scripts/scrape_hf_models.py && cargo build

# Check for compilation issues
cargo check

# Format code
cargo fmt

# Lint
cargo clippy

Platform notes

  • GPU detection shells out to nvidia-smi (NVIDIA) and rocm-smi (AMD). Both probes are best-effort and fail silently if the tool is unavailable.
  • Apple Silicon detection uses system_profiler SPDisplaysDataType. On unified memory Macs, VRAM is reported as available system RAM (same pool).
  • sysinfo handles cross-platform RAM/CPU. No conditional compilation needed.
  • The TUI uses crossterm which works on Linux, macOS, and Windows terminals.
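
The "best-effort, fail silently" behavior for NVIDIA detection can be sketched with std::process::Command. This is an illustration rather than the code in hardware.rs: the function names are hypothetical, the query flags shown are one way to ask nvidia-smi for total memory in MiB and may differ from what the project passes, and picking the largest GPU on multi-GPU systems is an assumption.

```rust
use std::process::Command;

// Sketch only: function names and the "largest GPU wins" rule are assumptions.

// Parses one memory value in MiB per line and returns the largest, converted to GB.
// Lines that are not numbers are skipped; no parsable line yields None.
fn parse_nvidia_smi_vram_gb(stdout: &str) -> Option<f64> {
    stdout
        .lines()
        .filter_map(|line| line.trim().parse::<f64>().ok())
        .reduce(f64::max)
        .map(|mib| mib / 1024.0)
}

// Returns None if nvidia-smi is missing, exits with an error, or prints nothing usable.
// No error is surfaced to the caller, matching the fail-silently convention.
fn detect_nvidia_vram_gb() -> Option<f64> {
    let output = Command::new("nvidia-smi")
        .args(&["--query-gpu=memory.total", "--format=csv,noheader,nounits"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    parse_nvidia_smi_vram_gb(&String::from_utf8_lossy(&output.stdout))
}

fn main() {
    match detect_nvidia_vram_gb() {
        Some(gb) => println!("NVIDIA VRAM: {:.1} GB", gb),
        None => println!("No NVIDIA GPU detected"),
    }
}
```

Keeping the parsing in its own function means it can be unit tested with canned output on machines that have no GPU.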