
flox/triton-monitoring-runtime

triton-monitoring

Prometheus + Grafana observability stack for a remote NVIDIA Triton Inference Server. Workshop attendees pull it with flox pull and run it on their laptops (macOS or x86_64 Linux) to monitor a shared Triton instance over Tailscale.

Usage

# Pull from FloxHub and activate with services
flox pull barstoolbluz/triton-monitoring
cd triton-monitoring
flox activate -s

# Monitor a different Triton instance
TRITON_HOST=my-server.ts.net flox activate -s

# Reset all cached configs and data (clean slate)
RESET=1 flox activate -s

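Then open Grafana at http://localhost:3000 for the dashboard and Prometheus at http://localhost:9090 for the query UI (default ports).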

Platform support

The flox/triton-metrics-stack package supports:

  • x86_64-linux
  • x86_64-darwin (Intel Mac)
  • aarch64-darwin (Apple Silicon)

ARM Linux (aarch64-linux) is not supported.
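To check a laptop against this list before pulling, the output of uname can be mapped to a Nix-style system string. This is a minimal sketch; flox_system and is_supported_system are hypothetical helper names, not part of the package, and only the three systems listed above are treated as supported.

```shell
#!/bin/sh
# Hypothetical helper: map `uname -s` / `uname -m` output to a Nix-style
# system string such as x86_64-linux or aarch64-darwin.
# Unrecognized OS or arch values are passed through unchanged.
flox_system() {
  local os="$1" arch="$2"
  case "$arch" in
    x86_64|amd64) arch="x86_64" ;;
    arm64|aarch64) arch="aarch64" ;;
  esac
  case "$os" in
    Linux) os="linux" ;;
    Darwin) os="darwin" ;;
  esac
  printf '%s-%s\n' "$arch" "$os"
}

# Hypothetical helper: succeed only for the systems that
# flox/triton-metrics-stack supports according to this README.
is_supported_system() {
  case "$1" in
    x86_64-linux|x86_64-darwin|aarch64-darwin) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: sys=$(flox_system "$(uname -s)" "$(uname -m)") then is_supported_system "$sys" && echo "$sys is supported". On an ARM Linux box this reports aarch64-linux, which fails the check.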

Environment variables

Override any variable at activation time: VAR=value flox activate -s

| Variable | Default | Description |
|---|---|---|
| TRITON_HOST | steve-ethos.penguin-logarithm.ts.net | Remote Triton server hostname (Tailscale) |
| TRITON_METRICS_PORT | 8002 | Triton Prometheus metrics port |
| TRITON_HTTP_PORT | 8000 | Triton HTTP port (used for health checks) |
| RESET | 0 | Set to 1 to wipe cached configs/data and regenerate from scratch |
| PROMETHEUS_HOST | 0.0.0.0 | Prometheus listen address |
| PROMETHEUS_PORT | 9090 | Prometheus listen port |
| GF_SERVER_HTTP_ADDR | 0.0.0.0 | Grafana listen address |
| GF_SERVER_HTTP_PORT | 3000 | Grafana listen port |
| GF_SECURITY_ADMIN_PASSWORD | admin | Grafana admin password |
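The hook resolves TRITON_HOST and TRITON_METRICS_PORT with the shell's ${VAR:-default} fallback, so a value set on the command line wins over the built-in default. The sketch below applies the same pattern to a few of the variables in the table to print the settings an activation would see; show_config is a hypothetical helper, and the defaults are copied from the table rather than read from the package.

```shell
#!/bin/sh
# Hypothetical helper: print effective settings using the same
# ${VAR:-default} fallback the on-activate hook uses. Defaults are
# copied from the README's table, not read from the package.
show_config() {
  printf 'TRITON_HOST=%s\n' "${TRITON_HOST:-steve-ethos.penguin-logarithm.ts.net}"
  printf 'TRITON_METRICS_PORT=%s\n' "${TRITON_METRICS_PORT:-8002}"
  printf 'TRITON_HTTP_PORT=%s\n' "${TRITON_HTTP_PORT:-8000}"
  printf 'PROMETHEUS_PORT=%s\n' "${PROMETHEUS_PORT:-9090}"
  printf 'GF_SERVER_HTTP_PORT=%s\n' "${GF_SERVER_HTTP_PORT:-3000}"
}
```

For example, TRITON_HOST=my-server.ts.net show_config prints my-server.ts.net on the first line and the defaults for everything else.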

Dashboard

A 10-panel dashboard with a model template variable for filtering panels by model.

| Panel | Type | Metric(s) |
|---|---|---|
| Inference Throughput | timeseries | rate(nv_inference_count) |
| Request Success / Failure Rate | timeseries | nv_inference_request_success, nv_inference_request_failure |
| Average Request Latency | timeseries | nv_inference_request_duration_us / request count |
| Average Queue Time | timeseries | nv_inference_queue_duration_us / request count |
| Compute Time Breakdown | timeseries | nv_inference_compute_{input,infer,output}_duration_us (stacked) |
| Pending Requests | timeseries | nv_inference_pending_request_count |
| GPU Utilization | timeseries | nv_gpu_utilization (per GPU) |
| GPU Memory Usage | timeseries | nv_gpu_memory_used_bytes / nv_gpu_memory_total_bytes |
| GPU Power | timeseries | nv_gpu_power_usage vs nv_gpu_power_limit |
| Cumulative Inference Executions | timeseries | nv_inference_exec_count |
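Latency metrics such as nv_inference_request_duration_us are cumulative counters, so an average per request comes from dividing the rate of the duration counter by the rate of a request counter over the same window. The sketch below builds that PromQL for one model. It assumes the divisor is nv_inference_request_success (the table only says "request count") and a 1m default window; avg_latency_query is a hypothetical helper, not how the bundled dashboard is necessarily written.

```shell
#!/bin/sh
# Hypothetical helper: build PromQL for average request latency (in
# microseconds) for one model. Assumptions: the divisor is
# nv_inference_request_success and the default rate window is 1m.
avg_latency_query() {
  local model="$1" window="${2:-1m}" sel
  sel="{model=\"$model\"}"
  printf 'rate(nv_inference_request_duration_us%s[%s]) / rate(nv_inference_request_success%s[%s])\n' \
    "$sel" "$window" "$sel" "$window"
}
```

To try it against the local Prometheus (resnet50 is a placeholder model name): curl -s http://localhost:9090/api/v1/query --data-urlencode "query=$(avg_latency_query resnet50)". Divide the result by 1000 to read it in milliseconds.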

Note: The three GPU panels (Utilization, Memory, Power) require NVIDIA DCGM on the Triton host. DCGM is not available on consumer GPUs (GeForce series), so on those hosts these panels show "No data."
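When the GPU panels are empty, it helps to confirm whether Triton exports any nv_gpu_* samples at all. The sketch below reads Prometheus text-format metrics on stdin; has_gpu_metrics and gpu_metric_names are hypothetical helpers. Comment lines (# HELP, # TYPE) start with "#" and are therefore ignored.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin contains at least one
# nv_gpu_* sample line (comment lines start with "#" and never match).
has_gpu_metrics() {
  grep -q '^nv_gpu_'
}

# Hypothetical helper: list the distinct nv_gpu_* metric names on stdin,
# dropping label sets ({...}) and sample values.
gpu_metric_names() {
  grep '^nv_gpu_' | sed 's/[{ ].*//' | sort -u
}
```

Usage: curl -s http://$TRITON_HOST:$TRITON_METRICS_PORT/metrics | gpu_metric_names. Empty output means the host is not exporting GPU metrics, which matches the "No data" case described in the note.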

How it works

This environment installs prometheus, grafana, curl, jq, and the flox/triton-metrics-stack package. On activation:

  1. The hook sets deployment-specific defaults (TRITON_HOST, TRITON_METRICS_PORT) and, when RESET=1, wipes the cached configs and data
  2. triton-metrics-init is sourced -- sets remaining defaults, creates mutable dirs in $FLOX_ENV_CACHE, exports Grafana paths, generates Prometheus and Grafana provisioning configs, and checks Triton reachability
  3. triton-metrics-prometheus starts Prometheus with the generated config
  4. triton-metrics-grafana starts Grafana with the pre-built dashboard
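Steps 1 and 2 can be sketched as a single function that optionally wipes the cache and then recreates the mutable directories. This is a hypothetical illustration: the real logic lives in the hook and in triton-metrics-init, which are not shown in full here. The directory names follow the file layout in this README, and prepare_cache is an invented name.

```shell
#!/bin/sh
# Hypothetical sketch of activation steps 1-2 (not the package's code):
# wipe cached state when RESET=1, then (re)create the mutable directories
# under $FLOX_ENV_CACHE using the layout documented in this README.
prepare_cache() {
  local cache="${FLOX_ENV_CACHE:?FLOX_ENV_CACHE must be set}"
  if [ "${RESET:-0}" = "1" ]; then
    # Remove generated configs and runtime data for a clean slate.
    rm -rf "$cache/config" "$cache/grafana" "$cache/prometheus"
  fi
  # mkdir -p is idempotent, so repeated activations are safe.
  mkdir -p \
    "$cache/config" \
    "$cache/grafana/provisioning/datasources" \
    "$cache/grafana/provisioning/dashboards" \
    "$cache/grafana/data/log" \
    "$cache/grafana/plugins" \
    "$cache/prometheus/data"
}
```

Keeping all of this under $FLOX_ENV_CACHE is what lets RESET remove state without touching the project directory or the read-only Nix store.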

Static assets (dashboard JSON, grafana.ini) live in the immutable Nix store via the package. Mutable state (Prometheus TSDB, Grafana data, generated configs) lives in $FLOX_ENV_CACHE -- the project directory stays clean.

Architecture

Attendee Laptop (macOS/Linux)              Remote Triton Host (Tailscale)
┌──────────────────────┐                  ┌────────────────────────┐
│                      │                  │  Triton Inference Svr  │
│  Prometheus          │   Tailscale net  │  :$TRITON_METRICS_PORT │
│  (scrapes metrics)   │ ──────────────→  │  (/metrics)            │
│  localhost:9090      │                  └────────────────────────┘
│                      │
│  Grafana             │
│  (10-panel dashboard)│
│  localhost:3000      │
└──────────────────────┘

Service endpoints

| Service | URL | Description |
|---|---|---|
| Prometheus | http://localhost:9090 | Metrics query UI |
| Grafana | http://localhost:3000 | Dashboard UI (anonymous access, no login) |
| Triton Metrics | http://<TRITON_HOST>:<TRITON_METRICS_PORT>/metrics | Remote Prometheus metrics endpoint |

Authentication

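  • Grafana: Anonymous access is enabled with the Viewer role, so no login is required. The admin login is admin / admin; set a different password with GF_SECURITY_ADMIN_PASSWORD. Grafana applies this password only when it first creates its database, so combine it with RESET=1 to change it after the first run.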

Manifest

version = 1

[install]
prometheus.pkg-path = "prometheus"
grafana.pkg-path = "grafana"
curl.pkg-path = "curl"
jq.pkg-path = "jq"
triton-monitoring.pkg-path = "flox/triton-metrics-stack"
triton-monitoring.systems = ["x86_64-linux", "aarch64-darwin", "x86_64-darwin"]

[hook]
on-activate = '''
  export TRITON_HOST="${TRITON_HOST:-steve-ethos.penguin-logarithm.ts.net}"
  export TRITON_METRICS_PORT="${TRITON_METRICS_PORT:-8002}"
  # ... (RESET support, sources triton-metrics-init, prints banner)
  . triton-metrics-init
'''

[services]
prometheus.command = "triton-metrics-prometheus"
grafana.command = "triton-metrics-grafana"

See .flox/env/manifest.toml for the full hook.

File layout

triton-monitoring/
  .flox/env/manifest.toml      # Flox manifest (packages, hook, services)
  README.md                     # This file
  FLOX.md                       # Flox environment creation guide
  .gitignore

  # Static assets in Nix store (via flox/triton-metrics-stack package):
  $FLOX_ENV/share/triton-metrics-stack/
    grafana/
      grafana.ini               # Grafana server configuration
      dashboards/
        triton.json             # 10-panel Triton dashboard
      provisioning/             # Provisioning templates
    prometheus.yml.template     # Prometheus config template

  # Generated at activation in $FLOX_ENV_CACHE:
  $FLOX_ENV_CACHE/
    config/
      prometheus.yml            # Prometheus scrape config (expanded)
    grafana/
      provisioning/
        datasources/
          prometheus.yaml       # Prometheus datasource (expanded)
        dashboards/
          dashboard.yaml        # File-based dashboard provider (expanded)
      data/                     # Grafana runtime data
        log/                    # Grafana logs
      plugins/                  # Grafana plugins
    prometheus/data/            # Prometheus TSDB data
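The generated prometheus.yml is an expanded copy of prometheus.yml.template with the Triton host and port filled in. The sketch below writes a minimal scrape config of that kind. It is an assumption-laden illustration: the real template's contents (job name, scrape interval, any extra settings) are not shown in this README, and write_prometheus_config is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch: write a minimal Prometheus scrape config pointing at
# the remote Triton metrics endpoint. The job name and 15s interval are
# assumptions; the packaged template may differ.
write_prometheus_config() {
  local out="$1"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: triton
    metrics_path: /metrics
    static_configs:
      - targets: ["${TRITON_HOST:-steve-ethos.penguin-logarithm.ts.net}:${TRITON_METRICS_PORT:-8002}"]
EOF
}
```

Usage: write_prometheus_config "$FLOX_ENV_CACHE/config/prometheus.yml", then promtool check config on the same path to validate the result before restarting Prometheus.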

Troubleshooting

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {scrapeUrl, health}'

# Check Triton metrics directly
curl -s http://$TRITON_HOST:$TRITON_METRICS_PORT/metrics | head -20

# Check Triton health (HTTP API)
curl -s http://$TRITON_HOST:$TRITON_HTTP_PORT/v2/health/ready

# View service logs
flox services logs prometheus
flox services logs grafana

# Restart services after config change
flox services restart

Note: The troubleshooting commands above use curl and jq, both included in this environment. If you don't need them, comment out their lines in manifest.toml.
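When the dashboard looks empty for one model, it can help to check whether Triton is counting inferences for that model at all. This sketch sums nv_inference_count across versions for each model from the raw metrics text on stdin, using only awk and sort. inference_counts is a hypothetical helper, and it assumes sample lines of the form nv_inference_count{model="m",version="1"} 42 with no trailing timestamp.

```shell
#!/bin/sh
# Hypothetical helper: sum nv_inference_count per model (across versions)
# from Prometheus text-format metrics on stdin. Assumes sample lines like
#   nv_inference_count{model="m",version="1"} 42
inference_counts() {
  awk '
    /^nv_inference_count[{]/ {
      if (match($0, /model="[^"]*"/)) {
        # Skip the 7-char prefix model=" and drop the closing quote.
        name = substr($0, RSTART + 7, RLENGTH - 8)
        counts[name] += $NF
      }
    }
    END { for (m in counts) printf "%s %.0f\n", m, counts[m] }
  ' | sort
}
```

Usage: curl -s http://$TRITON_HOST:$TRITON_METRICS_PORT/metrics | inference_counts. A model missing from the output has not served any inferences since Triton started.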
