Prometheus + Grafana observability stack for a remote NVIDIA Triton Inference Server. Designed for workshop attendees to flox pull and run on their laptops (macOS or Linux), monitoring a shared Triton instance over Tailscale.
```shell
# Pull from FloxHub and activate with services
flox pull barstoolbluz/triton-monitoring
cd triton-monitoring
flox activate -s

# Monitor a different Triton instance
TRITON_HOST=my-server.ts.net flox activate -s

# Reset all cached configs and data (clean slate)
RESET=1 flox activate -s
```

Then open:
- Grafana: http://localhost:3000 (anonymous access enabled)
- Prometheus: http://localhost:9090
The flox/triton-metrics-stack package supports:
- `x86_64-linux`
- `x86_64-darwin` (Intel Mac)
- `aarch64-darwin` (Apple Silicon)

ARM Linux (aarch64-linux) is not supported.
Override any variable at activation time: `VAR=value flox activate -s`

| Variable | Default | Description |
|---|---|---|
| `TRITON_HOST` | `steve-ethos.penguin-logarithm.ts.net` | Remote Triton server hostname (Tailscale) |
| `TRITON_METRICS_PORT` | `8002` | Triton Prometheus metrics port |
| `TRITON_HTTP_PORT` | `8000` | Triton HTTP port (used for health checks) |
| `RESET` | `0` | Set to `1` to wipe cached configs/data and regenerate from scratch |
| `PROMETHEUS_HOST` | `0.0.0.0` | Prometheus listen address |
| `PROMETHEUS_PORT` | `9090` | Prometheus listen port |
| `GF_SERVER_HTTP_ADDR` | `0.0.0.0` | Grafana listen address |
| `GF_SERVER_HTTP_PORT` | `3000` | Grafana listen port |
| `GF_SECURITY_ADMIN_PASSWORD` | `admin` | Grafana admin password |
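
To see how overrides combine with the defaults in the table, the sketch below prints the endpoints a given set of variables would resolve to. `triton_endpoints` is a hypothetical helper written for illustration, not something the environment ships; it only mirrors the documented defaults using the shell's `${VAR:-default}` expansion.

```shell
# Hypothetical helper (not part of the environment): prints the endpoints
# implied by the current variables, falling back to the documented defaults.
# The body runs in a subshell "( ... )" so its variables do not leak out.
triton_endpoints() (
    host="${TRITON_HOST:-steve-ethos.penguin-logarithm.ts.net}"
    metrics_port="${TRITON_METRICS_PORT:-8002}"
    http_port="${TRITON_HTTP_PORT:-8000}"
    prom_port="${PROMETHEUS_PORT:-9090}"
    graf_port="${GF_SERVER_HTTP_PORT:-3000}"

    echo "triton_metrics=http://${host}:${metrics_port}/metrics"
    echo "triton_health=http://${host}:${http_port}/v2/health/ready"
    echo "prometheus=http://localhost:${prom_port}"
    echo "grafana=http://localhost:${graf_port}"
)
```

For example, `TRITON_HOST=my-server.ts.net GF_SERVER_HTTP_PORT=3001 triton_endpoints` shows the Grafana URL moving to port 3001 while every other value keeps its default.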
A 10-panel dashboard with a `model` template variable for filtering by model.
| Panel | Type | Metric(s) |
|---|---|---|
| Inference Throughput | timeseries | rate(nv_inference_count) |
| Request Success / Failure Rate | timeseries | nv_inference_request_success, nv_inference_request_failure |
| Average Request Latency | timeseries | nv_inference_request_duration_us / request count |
| Average Queue Time | timeseries | nv_inference_queue_duration_us / request count |
| Compute Time Breakdown | timeseries | nv_inference_compute_{input,infer,output}_duration_us (stacked) |
| Pending Requests | timeseries | nv_inference_pending_request_count |
| GPU Utilization | timeseries | nv_gpu_utilization (per GPU) |
| GPU Memory Usage | timeseries | nv_gpu_memory_used_bytes / nv_gpu_memory_total_bytes |
| GPU Power | timeseries | nv_gpu_power_usage vs nv_gpu_power_limit |
| Cumulative Inference Executions | timeseries | nv_inference_exec_count |
Note: The 3 GPU panels (Utilization, Memory, Power) require NVIDIA DCGM running on the Triton host. DCGM is not available on consumer GPUs (GeForce series) -- those panels will show "No data."
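
The "duration / request count" panels divide one cumulative counter by another, so the same numbers can be reproduced from the command line. The sketch below builds such an expression and sends it to the local Prometheus HTTP API. The function names, the 1-minute default window, and the choice of `nv_inference_request_success` as the request count are assumptions for illustration; the shipped dashboard may use different expressions.

```shell
# Build a PromQL expression for "cumulative microseconds per request",
# averaged over a rate window and grouped by model.
# Assumptions: 1m default window, nv_inference_request_success as the
# request count. The shipped dashboard JSON may differ.
avg_per_request_query() {
    # $1 = duration metric, e.g. nv_inference_request_duration_us
    # $2 = rate window (optional, defaults to 1m)
    printf 'sum by (model) (rate(%s[%s])) / sum by (model) (rate(nv_inference_request_success[%s]))\n' \
        "$1" "${2:-1m}" "${2:-1m}"
}

# Run an instant query against the local Prometheus HTTP API.
# -G sends the URL-encoded query as a GET parameter.
prom_query() {
    curl -s -G "http://localhost:${PROMETHEUS_PORT:-9090}/api/v1/query" \
        --data-urlencode "query=$1"
}
```

With the services running, `prom_query "$(avg_per_request_query nv_inference_queue_duration_us 5m)" | jq '.data.result'` would return the average queue time per request for each model, in microseconds.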
This environment installs prometheus, grafana, curl, jq, and the flox/triton-metrics-stack package. On activation:
- The hook sets deployment-specific defaults (`TRITON_HOST`, `TRITON_METRICS_PORT`) and handles the `RESET` cache wipe
- `triton-metrics-init` is sourced -- sets remaining defaults, creates mutable dirs in `$FLOX_ENV_CACHE`, exports Grafana paths, generates Prometheus and Grafana provisioning configs, and checks Triton reachability
- `triton-metrics-prometheus` starts Prometheus with the generated config
- `triton-metrics-grafana` starts Grafana with the pre-built dashboard

Static assets (dashboard JSON, grafana.ini) live in the immutable Nix store via the package. Mutable state (Prometheus TSDB, Grafana data, generated configs) lives in $FLOX_ENV_CACHE -- the project directory stays clean.
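
As a rough picture of the config-generation step, the sketch below writes a minimal Prometheus scrape config from the same environment variables. It is not the logic in `triton-metrics-init`: the function name, the `triton` job name, and the 15-second scrape interval are assumptions, and the packaged template may contain more.

```shell
# Sketch only: the real template and generation logic live in the
# flox/triton-metrics-stack package and may differ.
# Writes <dir>/prometheus.yml with a single static scrape target.
render_prometheus_config() {
    mkdir -p "$1"
    cat > "$1/prometheus.yml" <<EOF
global:
  scrape_interval: 15s   # assumed value

scrape_configs:
  - job_name: triton     # assumed job name
    metrics_path: /metrics
    static_configs:
      - targets:
          - "${TRITON_HOST:-steve-ethos.penguin-logarithm.ts.net}:${TRITON_METRICS_PORT:-8002}"
EOF
}
```

Point it at a scratch directory, for example `render_prometheus_config /tmp/triton-demo`, rather than `$FLOX_ENV_CACHE/config`, so the file generated at activation is not overwritten.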
```
Attendee Laptop (macOS/Linux)              Remote Triton Host (Tailscale)
┌──────────────────────┐                   ┌────────────────────────┐
│                      │                   │  Triton Inference Svr  │
│  Prometheus          │   Tailscale net   │  :$TRITON_METRICS_PORT │
│  (scrapes metrics)   │  ──────────────→  │  (/metrics)            │
│  localhost:9090      │                   └────────────────────────┘
│                      │
│  Grafana             │
│  (10-panel dashboard)│
│  localhost:3000      │
└──────────────────────┘
```

| Service | URL | Description |
|---|---|---|
| Prometheus | http://localhost:9090 | Metrics query UI |
| Grafana | http://localhost:3000 | Dashboard UI (anonymous access, no login) |
| Triton Metrics | http://<TRITON_HOST>:<TRITON_METRICS_PORT> | Remote Prometheus metrics endpoint |
- Grafana: Anonymous access is enabled with the `Viewer` role -- no login required. Admin login is `admin`/`admin` (set via `GF_SECURITY_ADMIN_PASSWORD`).

```toml
version = 1

[install]
prometheus.pkg-path = "prometheus"
grafana.pkg-path = "grafana"
curl.pkg-path = "curl"
jq.pkg-path = "jq"
triton-monitoring.pkg-path = "flox/triton-metrics-stack"
triton-monitoring.systems = ["x86_64-linux", "aarch64-darwin", "x86_64-darwin"]

[hook]
on-activate = '''
  export TRITON_HOST="${TRITON_HOST:-steve-ethos.penguin-logarithm.ts.net}"
  export TRITON_METRICS_PORT="${TRITON_METRICS_PORT:-8002}"
  # ... (RESET support, sources triton-metrics-init, prints banner)
  . triton-metrics-init
'''

[services]
prometheus.command = "triton-metrics-prometheus"
grafana.command = "triton-metrics-grafana"
```

See `.flox/env/manifest.toml` for the full hook.

```
triton-monitoring/
  .flox/env/manifest.toml        # Flox manifest (packages, hook, services)
  README.md                      # This file
  FLOX.md                        # Flox environment creation guide
  .gitignore

# Static assets in Nix store (via flox/triton-metrics-stack package):
$FLOX_ENV/share/triton-metrics-stack/
  grafana/
    grafana.ini                  # Grafana server configuration
    dashboards/
      triton.json                # 10-panel Triton dashboard
    provisioning/                # Provisioning templates
  prometheus.yml.template        # Prometheus config template

# Generated at activation in $FLOX_ENV_CACHE:
$FLOX_ENV_CACHE/
  config/
    prometheus.yml               # Prometheus scrape config (expanded)
  grafana/
    provisioning/
      datasources/
        prometheus.yaml          # Prometheus datasource (expanded)
      dashboards/
        dashboard.yaml           # File-based dashboard provider (expanded)
    data/                        # Grafana runtime data
    log/                         # Grafana logs
    plugins/                     # Grafana plugins
  prometheus/data/               # Prometheus TSDB data
```

```shell
# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {scrapeUrl, health}'

# Check Triton metrics directly
curl -s http://$TRITON_HOST:$TRITON_METRICS_PORT/metrics | head -20

# Check Triton health (HTTP API)
curl -s http://$TRITON_HOST:$TRITON_HTTP_PORT/v2/health/ready

# View service logs
flox services logs prometheus
flox services logs grafana

# Restart services after config change
flox services restart
```

Note: The troubleshooting commands above use `curl` and `jq`, both included in this environment. If you don't need them, comment out their lines in `manifest.toml`.