
Conversation


@matt-dz matt-dz commented Jan 12, 2026

What does this PR do?

Motivation

Describe how you validated your changes

Additional Notes

scottopell and others added 30 commits January 12, 2026 12:43
Document how to configure kubernetes-mcp-server for Claude Code
with a dedicated kubeconfig isolated to the gadget-dev cluster.

Includes setup, token renewal, and cleanup instructions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Clear guidance for AI agents on:
- Preferring kubernetes-mcp-server tools over kubectl
- When to use kubectl (apply -f, complex selectors)
- VM operations via limactl shell
- Common workflow examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
… collection

Complete implementation of core monitoring functionality:

- REQ-FM-001: Cgroup v2 filesystem scanning with multi-CRI pattern matching
  (containerd, CRI-O, Docker), pod UID extraction, QoS class detection
- REQ-FM-002: Memory metrics via lading's smaps_rollup (PSS) and cgroup v2 polling
- REQ-FM-003: CPU metrics via lading's cgroup_v2::cpu::Sampler with delta tracking

Also adds deployment infrastructure:
- Multi-stage Dockerfile with cargo-chef for fast rebuilds
- Kubernetes DaemonSet manifest
- inspect_metrics.py utility script

Progress: 4/5 requirements complete (REQ-FM-005 planned for later phase)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Local development runs on Apple Silicon - don't specify linux/amd64
in docker builds during testing loops.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…re API

Implement time-based file rotation for Parquet output files to ensure
each file has a valid footer and is immediately readable. Uses the new
rotation API from lading_capture.

Changes:
- Use channel-based rotation via RotationSender from CaptureManager
- Rotate files every 90 seconds (exceeds 60s accumulator window)
- Hive-style partitioning: dt=YYYY-MM-DD/identifier=<pod>/metrics-*.parquet
- Add global labels (node_name, cluster_name) to all metrics
- Write session manifest (session.json) on startup
- Track total bytes and enforce 1GiB limit with graceful shutdown
- Update specs (requirements, design, executive) for rotation feature
- Update daemonset.yaml to use :rotation image tag

The rotation ensures files are always readable without requiring graceful
shutdown, solving the Parquet footer problem for long-running captures.
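The Hive-style layout above can be sketched as a small path builder (helper name and timestamp handling are assumptions, not the collector's actual code):

```python
from datetime import datetime, timezone

def partition_path(base: str, pod: str, ts: float) -> str:
    """Build a Hive-style partition path: dt=YYYY-MM-DD/identifier=<pod>/...

    Illustrative only; the real collector's file naming may differ.
    """
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{base}/dt={day}/identifier={pod}/metrics-{int(ts)}.parquet"
```

Partitioning by day and pod lets downstream tools (DuckDB, Spark, PyArrow datasets) prune whole directories before reading any row data.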

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update specs to reflect the completed REQ-FM-004 implementation:

- Mark REQ-FM-004 as complete (was "In Progress")
- Update progress to 4 of 5 complete
- Document channel-based rotation API (start_with_rotation, RotationSender)
- Add verified metrics: 38,496 rows, 204 unique metrics, valid footers
- Update technical summary with new rotation flow
- Replace old CaptureManager recreation approach with channel-based design

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ation

Add utility scripts for ad-hoc analysis of collected Parquet metrics:

- merge_parquet.py: Combine multiple parquet files into one
- memory_timeline.py: Plot memory usage over time with Plotly
- cpu_analysis.py: Analyze CPU usage and throttling with rate deltas
- container_summary.py: Generate per-container statistics table
- pressure_analysis.py: Visualize PSI (Pressure Stall Information) metrics
- inspect_metrics.py: Fix crash on labels column with unhashable types

All scripts use uv inline dependencies for easy standalone execution.
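"uv inline dependencies" refers to PEP 723 inline script metadata; a minimal sketch of what such a header looks like (the dependency list here is empty for illustration, the real scripts would list their plotting/parquet packages):

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
# Running `uv run <script>.py` makes uv read the header above, provision an
# isolated environment with the listed dependencies, and execute the file,
# so no manual virtualenv or pip install step is needed.

GREETING = "standalone script"
print(GREETING)
```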

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
REQ-FM-005 was too future-facing to be useful in current planning.
The feature depended on Agent output interception which is not in scope.

Removed references from:
- requirements.md: Deleted requirement definition
- design.md: Removed implementation section and dependency reference
- executive.md: Updated status table (4/4 complete)
- CONTINUATION.md: Updated context

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Port the autocorrelation-based CPU oscillation detection algorithm from
the Go implementation (sopell/q-branch-cpu-oscillation-per-container)
to a standalone Python script that works on parquet data files.

Key features:
- Autocorrelation analysis to detect periodic CPU patterns
- Configurable thresholds (periodicity score, amplitude, period range)
- Per-container analysis with QoS class display
- Optional Plotly visualization with autocorrelation plots
- CLI interface matching existing script patterns

Algorithm matches detector.go:
- 60-sample analysis window (1Hz sampling = 60 seconds)
- Normalized autocorrelation for lags in [min_period, max_period]
- Detection when periodicity_score >= threshold AND amplitude >= min

Usage:
    uv run scripts/oscillation_detector.py metrics.parquet
    uv run scripts/oscillation_detector.py metrics.parquet --threshold 0.4 --plot

Reference: pkg/collector/corechecks/containers/cpu_oscillation/detector.go
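The detection rule above (normalized autocorrelation over a lag range, flag when both the periodicity score and the amplitude clear their thresholds) can be sketched as follows; function name and default thresholds are illustrative, not copied from the script:

```python
import numpy as np

def detect_oscillation(cpu, min_period=5, max_period=30,
                       score_threshold=0.6, min_amplitude=10.0):
    """Flag periodic CPU patterns via normalized autocorrelation.

    `cpu` is a 1 Hz series (e.g. a 60-sample window). Returns a tuple
    (detected, best_score, best_lag).
    """
    x = np.asarray(cpu, dtype=float)
    x = x - x.mean()                      # remove the DC component
    denom = np.dot(x, x)
    if denom == 0.0:
        return False, 0.0, 0              # flat series: nothing periodic
    best_score, best_lag = 0.0, 0
    for lag in range(min_period, min(max_period, len(x) - 1) + 1):
        # Normalized autocorrelation at this lag, in [-1, 1]
        score = np.dot(x[:-lag], x[lag:]) / denom
        if score > best_score:
            best_score, best_lag = score, lag
    amplitude = x.max() - x.min()         # peak-to-trough swing
    detected = best_score >= score_threshold and amplitude >= min_amplitude
    return detected, best_score, best_lag
```

A container oscillating with a 10-second period in a 60-sample window would score highly at lag 10 and be flagged; a flat or aperiodic series would not.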

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add column projection to only load needed columns (5 of ~15)
- Enable opt-level=3 in dev profile for full optimizations
- Fix borrow checker issue with projection mask construction

Processing 35M rows now takes ~11s vs ~100s unoptimized.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…Q-FM-005)

Add REQ-FM-005 to specs and implement metrics_viewer.py:

- Web-based interactive timeseries viewer using Dash/Plotly
- Container filtering via multi-select dropdown with quick-select buttons
- WebGL rendering (scattergl) for smooth performance with large datasets
- Pan/zoom interactions and range slider for time navigation
- Pre-computed CPU deltas at startup for responsive filtering
- Automatic browser launch when server starts

Implements: REQ-FM-005

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Pan/zoom into a region, then click "Rescale Y-Axis" to fit the
y-axis to the visible data range. Useful when initial data spikes
throw off the scale.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Rust version of metrics viewer with axum backend:
- Fast parquet loading (~10s for 35M rows vs Python's slower startup)
- axum HTTP server serving embedded Plotly.js frontend
- REST API: /api/containers and /api/timeseries endpoints
- Same UI features: container dropdown, rescale Y, reset zoom
- Single cargo-script file, opens browser automatically

Usage: ./scripts/metrics_viewer.rs metrics.parquet

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove buggy range slider (conflicts with scattergl)
- Use Plotly.react instead of newPlot for updates
- Use raw timestamps instead of Date objects
- Use Plotly.relayout for rescale/reset (faster)
- Read x-range from _fullLayout instead of manual tracking
- Add console.log for debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Changed xaxis type from 'linear' to 'date' for human-readable timestamps
- Restored range slider with 5% thickness for time navigation
- Updated rescaleY to convert date strings back to milliseconds for comparison
- Increased chart height to 70vh to accommodate range slider

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Consolidated title, container selector, and buttons into single header row
- Moved legend to top-right (above chart) to avoid range slider overlap
- Increased range slider thickness slightly for better usability
- Changed hover mode to 'x unified' for easier comparison
- Reduced overall padding for more chart real estate
- Chart now uses calc(100vh - 90px) for maximum vertical space

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Added package.edition = "2024" to fix cargo warning
- Added timing for each loading step:
  - Read time (parsing parquet rows)
  - Process time (building container data structures)
  - Total load time

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Now shows three distinct phases:
1. Open file + read parquet metadata (disk I/O for footer, schema parsing)
2. ZSTD decompress + Arrow decode + CPU deltas (bulk data processing)
3. Build serving structs + sort by avg CPU (in-memory transformations)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Deleted scripts/metrics_viewer.py (replaced by Rust version)
- Updated executive.md with Rust/axum implementation details
- Updated design.md with new architecture:
  - Rust backend with axum HTTP server
  - Embedded Plotly.js frontend
  - REST API documentation
  - Performance characteristics (~10s for 35M rows)
- Updated file structure to reflect current scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Deleted:
- cpu_analysis.py (superseded by metrics_viewer.rs)
- pressure_analysis.py (unused)
- oscillation_detector.py (Rust version is equivalent)
- __pycache__/

Remaining scripts:
- metrics_viewer.rs - Interactive visualization
- oscillation_detector.rs - Pattern detection
- container_summary.py - Container overview
- inspect_metrics.py - Quick parquet inspection
- memory_timeline.py - Memory analysis
- merge_parquet.py - File merging utility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
New Rust scripts:
- inspect_metrics.rs: Shows schema, row counts, unique metrics, time range, sample data
- container_summary.rs: Per-container stats (memory, CPU, throttling, PSI)
  - Supports --format table/csv/json
  - Supports --sort-by and --top N

Deleted Python versions:
- inspect_metrics.py
- container_summary.py

Remaining scripts:
- metrics_viewer.rs - Interactive visualization
- oscillation_detector.rs - Pattern detection
- container_summary.rs - Container statistics
- inspect_metrics.rs - Parquet inspection
- merge_parquet.py - File merging (still Python)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Rust cargo-script implementation of parquet file merging:
- Recursive directory scanning with walkdir
- Configurable compression (zstd, snappy, gzip, none)
- Skips empty files gracefully
- Progress output during read/write phases

Deletes the Python version.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- dev.py: Single-file PEP 723 script for server lifecycle management
  - Commands: status, start, stop, restart
  - Always rebuilds binary before start/restart
  - Health check via /api/health endpoint
  - Port calculated from checkout path for worktree isolation
  - Stale PID detection and cleanup
  - State stored in .dev/ directory

- server.rs: Add GET /api/health returning {"status": "ok"}

- index.html: Minor styling fixes for chart layout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Increase Y-axis size from 60 to 80px for 7-digit label visibility
- Adjust main chart height to ensure range chart visible without scroll
- Disable uPlot's built-in legend (we use custom legend at top)
- Default to total_cpu_usage_millicores metric instead of first in list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add left sidebar (300px) with metrics list, filters, containers
- Implement Fuse.js fuzzy search for metric names
- Replace multi-select dropdown with checkbox list for containers
- Add color-coded QoS badges (Guaranteed/Burstable/BestEffort)
- Move Rescale Y and Reset buttons to main header
- Improve container selection with visible checkboxes and avg values
- Reorganize layout using CSS Grid for sidebar + main content

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
scottopell and others added 26 commits January 12, 2026 12:43
Fixes confusion where LLM agents couldn't distinguish between a local
viewer and a cluster port-forward when both used the same port range.

Port allocation (per-worktree):
- Local viewer:   8050-8549 (./dev.py local viewer)
- Cluster viewer: 8550-9049 (./dev.py cluster viewer)
- MCP server:     9050-9549 (./dev.py cluster mcp)

Also adds `./dev.py status` top-level command that shows all viewers
at once - this is now the recommended way to check what's running.
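A per-worktree allocation like this could be derived by hashing the checkout path into each range; the hashing scheme below is an assumption for illustration, not the actual dev.py code:

```python
import hashlib
from pathlib import Path

# Port ranges from the allocation table above.
RANGES = {
    "local_viewer":   (8050, 8549),
    "cluster_viewer": (8550, 9049),
    "mcp_server":     (9050, 9549),
}

def port_for(service: str, checkout: str) -> int:
    """Derive a stable per-worktree port (hypothetical scheme).

    Hashing the resolved checkout path gives each worktree its own
    deterministic offset within the service's range, so two worktrees
    rarely collide and reruns always pick the same port.
    """
    lo, hi = RANGES[service]
    digest = hashlib.sha256(str(Path(checkout).resolve()).encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % (hi - lo + 1)
    return lo + offset
```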

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The astronomy-shop AIOpsLab scenario deploys 30+ containers which
caused OOM with the previous 256Mi/512Mi limits. Increase to 1GB
to support monitoring complex microservice deployments.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add validation that crashes with a descriptive error message if interval_ms
is set to anything other than 1000ms. This prevents misconfiguration, since
lading-capture has a hardcoded 1-second tick duration (TICK_DURATION_MS).

Sub-second sampling would collect data but timestamps would be bucketed to
1-second resolution, losing the intended granularity. For gauges (most
cgroup metrics), only the last sample per second would be preserved.

The error message explains:
- The exact source locations in lading-capture
- Technical details of the bucketing behavior
- That this is NOT insurmountable - just needs implementation

Also updates the --interval-ms help text to document this limitation.

See: DataDog/lading#1662 (comment)
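The bucketing behavior described above can be illustrated with a toy model (this is a sketch of the effect, not lading-capture's actual code): timestamps are floored to the second, and for gauges the last sample in each bucket overwrites the earlier ones.

```python
def bucket_gauges(samples):
    """samples: (timestamp_ms, value) pairs. Returns {second: last_value}.

    Mimics a 1-second tick: sub-second samples collapse into one bucket,
    and for gauge-style metrics only the final sample per second survives,
    which is why sub-second collection would silently lose granularity.
    """
    buckets = {}
    for ts_ms, value in samples:
        buckets[ts_ms // 1000] = value  # later samples overwrite earlier ones
    return buckets
```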

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add three test scenarios for fine-grained-monitor validation:
- crash-loop: Container that exits and restarts repeatedly
- memory-leak: Container with gradual memory growth
- oom-kill: Container that gets OOM-killed quickly

Also fix scenario.py cleanup to delete configmaps (kubectl 'all' doesn't
include them, causing orphaned configmaps on stop).

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update service name from 'mcp-metrics-viewer' to 'fgm-mcp-server' to match
the actual deployed service name.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add range=all parameter to /api/containers query. The viewer defaults to
filtering containers by age (1h), which caused list_containers to return
0 results when querying for containers in new scenario runs.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace index.json persistence with lightweight sidecar files for ~100x
faster viewer startup (~0.2s vs ~13s for large datasets).

Collector changes:
- Write .containers sidecar file at each parquet rotation
- Use bincode serialization for fast reads (~10-100x faster than JSON)
- Remove index.json persistence - container metadata is now in-memory only

Viewer changes:
- New from_directory() loads containers from sidecars or falls back to
  parquet scan for older data without sidecars
- Compute first_seen_ms/last_seen_ms from parquet file timestamps
- Remove wait_for_index(), wait for parquet files directly instead

The sidecar approach is more robust than index.json:
- Each parquet file has its own sidecar (no single point of failure)
- Viewer can start with partial data while collector is still running
- No need to coordinate index updates between collector and viewer

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Remove dead code that was made obsolete by the sidecar-based startup:

- Remove ContainerStats struct (unused after viewer simplification)
- Remove get_container_stats benchmarks (cold/warm/load_all_metrics)
- Update benchmark docs in README.md, CLAUDE.md, and dev.py help

Remaining benchmarks:
- scan_metadata: Startup path performance
- get_timeseries_single_container: Single container query
- get_timeseries_all_containers: All containers query

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add comprehensive unit tests for sidecar module:
- test_roundtrip: Basic serialization/deserialization
- test_roundtrip_empty: Empty container list
- test_unsupported_version: Future version rejection
- test_corrupt_file: Garbage data handling
- test_missing_file: Non-existent file handling
- test_atomic_write: Verify temp file cleanup

Remove old_file_bytes from rotation log message. This field showed the
file size at rotation trigger time, before the write was complete,
consistently displaying 0 and misleading investigators.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add background task that refreshes container metadata from sidecar files
every 30 seconds. This fixes the issue where the viewer would cache an
empty container list if started before data was available.

Changes:
- Wrap MetadataIndex in RwLock for safe concurrent updates
- Add refresh_containers_from_sidecars() method to LazyDataStore
- Add accessor methods (get_metrics, get_qos_classes, get_namespaces)
- Spawn background refresh task in server with configurable interval
- Default refresh interval: 30 seconds

The refresh is cheap (~10ms) since it only reads small sidecar files,
not the full parquet data. Logs when container count changes.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Sidecar files are immutable once written, so there's no need to re-read
them on each refresh cycle. The refresh task now tracks which parquet files
have been processed via the file_containers keys and only reads sidecars
for new files.

Before: ~49ms to re-read all sidecars every 30s
After: ~2ms to check and skip, only reads new files on rotation
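The skip-already-processed loop amounts to something like this sketch (names illustrative):

```python
def refresh(index, sidecar_paths, read_sidecar):
    """Read only sidecars for files not seen before; return how many.

    `index` maps sidecar path -> container list. Sidecar files are
    immutable once written, so anything already in the index can be
    skipped outright on each refresh cycle.
    """
    new_files = 0
    for path in sidecar_paths:
        if path in index:
            continue  # immutable: nothing new to learn from this file
        index[path] = read_sidecar(path)
        new_files += 1
    return new_files
```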

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Enables label-based filtering in the viewer without requiring full
parquet scans. Labels are now:
- Stored in ContainerEntry in the index
- Written to sidecar files (v2 format)
- Read from sidecars and returned via the containers API

Backwards compatible: v1 sidecars without labels are still readable
via #[serde(default)] on the new labels field.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add namespace-per-run isolation for gensim-generated scenarios
- Each scenario run gets its own namespace (fgm-run-{run_id})
- Detect gensim scenarios by "Generated by k8s-adapter" comment
- Create namespace on run, delete entire namespace on stop
- Store namespace and is_gensim flags in run metadata
- Add todo-app scenario generated from gensim vibecoder
- Add todo-app dashboard for the multi-tier web application

The namespace-per-run approach provides complete isolation between
concurrent scenario runs and simplifies cleanup (deleting the namespace
cascades to all resources).

Note: The todo-app scenario contains generated code from gensim/vibecoder
which does not follow agent repository conventions (copyright headers,
go module registration, etc.), since the scenarios are intended to be
standalone test workloads.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
API refactor makes the fast sidecar path the default:
- LazyDataStore::new(dir) is now the primary constructor
- LazyDataStore::from_files(paths) also tries sidecars first
- Both methods fall back to parquet scanning if sidecars missing

Benchmark generator rewrite:
- Two scenarios: realistic (~20 containers) and stress (~50 containers)
- Single --duration argument (parses 1h, 6h, 24h, 7d, etc.)
- Generates sidecar files alongside parquet files
- Removed scenarios: multipod, container-churn (simplified)

Benchmark changes:
- Renamed scan_metadata_parquet -> scan_metadata_from_files
- Renamed scan_metadata_sidecar -> scan_metadata_directory
- Updated all benchmarks to use new API

Clippy fixes:
- &PathBuf -> &Path in function signatures
- (400..500).contains(&status) instead of manual range check
- #[allow(dead_code)] for viewer-only sidecar functions

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add metric name as title above each chart panel for clarity
- Add SI unit formatting for Y-axis values:
  - Bytes metrics (memory, io): K, M, G suffixes
  - Microsecond metrics: us, ms, s suffixes
  - Numbers: K, M suffixes for large values
- Add CSS styling for panel titles

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Introduces a Data Provider Interface pattern to cleanly abstract data
loading from the source (API vs parquet-wasm). This enables future
self-contained HTML snapshots without divergent code paths.

New files:
- data-provider.js: Interface definition and provider management
- api-provider.js: HTTP fetch implementation (current behavior)
- parquet-provider.js: Skeleton for parquet-wasm (to be implemented)

The effects.js module now imports Api from data-provider.js and
initializes the default ApiProvider. Effect handlers are unchanged.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Use hash-based ID generation to ensure each container has a unique
12-character prefix (short_id). The previous approach produced IDs
with many leading zeros, causing all containers to share the same
short_id and breaking multi-container selection in the viewer.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…CON-007)

Problem:
Consolidated parquet files caused errors in DuckDB and PyArrow:
- DuckDB: "don't know what type:"
- PyArrow: "Column cannot have more than one dictionary"

Root Cause:
Arrow's Parquet writer creates a new dictionary for each row group. When
consolidating multiple input files:
1. Each input file has its own row group(s)
2. Each row group gets its own dictionary for dictionary-encoded columns
3. The resulting file has multiple dictionaries per column
4. DuckDB/PyArrow expect at most ONE dictionary per column

Solution:
Set max_row_group_size to usize::MAX in WriterProperties. This forces all
batches from all input files into a SINGLE row group, resulting in ONE
dictionary per column.

Memory remains bounded because ArrowWriter encodes data incrementally -
it doesn't buffer raw Arrow arrays, just the encoded Parquet data.

Trade-off:
We lose row-group-level predicate pushdown (skipping entire row groups
based on min/max statistics). For time-series analysis of specific
containers, this is acceptable since queries typically target narrow
time ranges within files anyway.

Tested:
- Consolidated 10 files (~5M rows) successfully
- DuckDB queries work without errors
- Verified single row group in parquet metadata

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…lity

- Reduce leak rate from 1MB to 512KB per 2 seconds
- Increase memory limit from 64Mi to 256Mi
- OOM now occurs after ~8-10 minutes instead of ~2 minutes
- Fix MB calculation in log output
- Makes changepoint detection easier to observe

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Checkpoint commit - work in progress.

Adds benchmarks for:
- Index queries (list_metrics, list_containers)
- Study algorithms (periodicity, changepoint)
- MCP patterns (analyze_container, summarize_container)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Implement PELT (Pruned Exact Linear Time) algorithm for changepoint
detection, replacing the augurs/BOCPD dependency which had O(n²)
complexity and would timeout on large datasets.

PELT provides:
- O(n) expected time complexity via dynamic programming with pruning
- O(1) segment cost calculation using cumulative sums
- BIC-like penalty for automatic changepoint count selection
- No external dependencies (pure Rust implementation)

Performance improvement on 28,800 point dataset:
- BOCPD: >180s timeout (never completed)
- PELT: 72-127ms

Based on: Killick, R., Fearnhead, P., & Eckley, I. A. (2012).
"Optimal detection of changepoints with a linear computational cost."

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Three key optimizations to reduce memory allocation churn:

1. HashMap/HashSet pre-sizing: Add with_capacity() calls throughout
   - load_from_sidecars, scan_metadata, refresh_containers_from_sidecars
   - load_metric_data, load_row_groups
   - Eliminates ~111 rehash operations per query

2. Vector pre-sizing: Add Vec::with_capacity() for:
   - dates_to_scan in discover_files_by_time_range
   - result/missing vectors in get_timeseries
   - containers vector in get_containers_by_recency
   - Eliminates raw_vec::finish_grow reallocs

3. Arc<str> string interning for container IDs:
   - Create interner at query start with all requested container IDs
   - Pass through load_metric_from_file -> load_row_groups
   - Use Arc::clone() (pointer increment) instead of String::clone()
   - Eliminates ~3,447 String allocations per query
   - Convert back to String only at API boundary

Memory profiling showed:
- String::clone: 102KB across 3,447 calls (FIXED)
- raw_vec::finish_grow: 1.4MB reallocs (FIXED)
- RawTable::reserve_rehash: 90KB across 111 calls (FIXED)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add alloc-profile binary that uses a counting allocator to measure
allocation behavior during get_timeseries hot path. This tool helps
quantify the impact of memory optimizations.

Usage:
  cargo run --release --bin alloc-profile

The tool:
- Exercises the get_timeseries hot path with 10 containers
- Measures allocation count and bytes per iteration
- Outputs machine-readable summary (ALLOC_COUNT, ALLOC_BYTES, etc.)

Results show memory optimizations reduced allocation count by 72%:
- Before: 400,597 allocations per iteration
- After:  112,482 allocations per iteration

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add deterministic selection for metrics and containers in benchmarks:
- Sort parquet files by path
- Sort metrics alphabetically before selecting
- Sort containers by ID instead of recency

This eliminates variability from HashMap iteration order and
recency-based sorting that caused wild benchmark swings
(e.g., study_changepoint ranged from 14ms to 2.3s between runs).

Results are now reproducible within ~1% variance.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@github-actions github-actions bot added the "long review" (PR is complex, plan time to review it) and "team/agent-devx" labels Jan 12, 2026
@agent-platform-auto-pr

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 59e667e
📊 Static Quality Gates Dashboard

Successful checks

Info

Quality gate Change Size in MiB (prev → curr → max)
agent_deb_amd64 +71.02 KiB (0.01% increase) 705.250 → 705.319 → 708.410
agent_deb_amd64_fips +63.02 KiB (0.01% increase) 700.539 → 700.601 → 704.000
agent_heroku_amd64 +15.02 KiB (0.00% increase) 326.904 → 326.919 → 329.530
agent_msi +23.8 KiB (0.00% increase) 571.431 → 571.454 → 982.080
agent_rpm_amd64 +71.02 KiB (0.01% increase) 705.236 → 705.306 → 708.380
agent_rpm_amd64_fips +63.02 KiB (0.01% increase) 700.525 → 700.587 → 703.990
agent_rpm_arm64 +115.02 KiB (0.02% increase) 686.777 → 686.890 → 693.520
agent_rpm_arm64_fips +55.02 KiB (0.01% increase) 682.918 → 682.972 → 688.480
agent_suse_amd64 +71.02 KiB (0.01% increase) 705.236 → 705.306 → 708.380
agent_suse_amd64_fips +63.02 KiB (0.01% increase) 700.525 → 700.587 → 703.990
agent_suse_arm64 +115.02 KiB (0.02% increase) 686.777 → 686.890 → 693.520
agent_suse_arm64_fips +55.02 KiB (0.01% increase) 682.918 → 682.972 → 688.480
docker_agent_amd64 +59.02 KiB (0.01% increase) 767.475 → 767.532 → 770.720
docker_agent_arm64 +51.02 KiB (0.01% increase) 773.626 → 773.675 → 780.200
docker_agent_jmx_amd64 +59.02 KiB (0.01% increase) 958.353 → 958.411 → 961.600
docker_agent_jmx_arm64 +51.01 KiB (0.01% increase) 953.223 → 953.273 → 959.800
docker_cluster_agent_amd64 +15.97 KiB (0.01% increase) 180.773 → 180.788 → 181.080
docker_dogstatsd_amd64 +8.0 KiB (0.02% increase) 38.800 → 38.808 → 39.380
docker_dogstatsd_arm64 +64.0 KiB (0.17% increase) 37.065 → 37.128 → 37.940
dogstatsd_deb_amd64 +12.0 KiB (0.04% increase) 30.019 → 30.031 → 30.610
dogstatsd_deb_arm64 +8.0 KiB (0.03% increase) 28.168 → 28.176 → 29.110
dogstatsd_rpm_amd64 +12.0 KiB (0.04% increase) 30.019 → 30.031 → 30.610
dogstatsd_suse_amd64 +12.0 KiB (0.04% increase) 30.019 → 30.031 → 30.610
iot_agent_deb_amd64 +24.0 KiB (0.05% increase) 43.014 → 43.037 → 43.290
iot_agent_deb_arm64 +20.0 KiB (0.05% increase) 40.135 → 40.155 → 40.920
iot_agent_deb_armhf +20.0 KiB (0.05% increase) 40.716 → 40.736 → 41.030
iot_agent_rpm_amd64 +24.0 KiB (0.05% increase) 43.015 → 43.038 → 43.290
iot_agent_suse_amd64 +24.0 KiB (0.05% increase) 43.015 → 43.038 → 43.290
3 successful checks with minimal change (< 2 KiB)
Quality gate Current Size
docker_cluster_agent_arm64 196.618 MiB
docker_cws_instrumentation_amd64 7.135 MiB
docker_cws_instrumentation_arm64 6.689 MiB
On-wire sizes (compressed)
Quality gate Change Size in MiB (prev → curr → max)
agent_deb_amd64 -8.31 KiB (0.00% reduction) 173.361 → 173.353 → 174.490
agent_deb_amd64_fips +53.66 KiB (0.03% increase) 172.264 → 172.316 → 173.750
agent_heroku_amd64 +4.25 KiB (0.00% increase) 87.112 → 87.116 → 88.450
agent_msi -4.0 KiB (0.00% reduction) 142.902 → 142.898 → 143.020
agent_rpm_amd64 -2.32 KiB (0.00% reduction) 176.141 → 176.139 → 177.660
agent_rpm_amd64_fips +51.6 KiB (0.03% increase) 174.975 → 175.026 → 176.600
agent_rpm_arm64 +47.73 KiB (0.03% increase) 159.382 → 159.429 → 161.260
agent_rpm_arm64_fips -43.26 KiB (0.03% reduction) 158.768 → 158.725 → 160.550
agent_suse_amd64 -2.32 KiB (0.00% reduction) 176.141 → 176.139 → 177.660
agent_suse_amd64_fips +51.6 KiB (0.03% increase) 174.975 → 175.026 → 176.600
agent_suse_arm64 +47.73 KiB (0.03% increase) 159.382 → 159.429 → 161.260
agent_suse_arm64_fips -43.26 KiB (0.03% reduction) 158.768 → 158.725 → 160.550
docker_agent_amd64 +10.95 KiB (0.00% increase) 261.065 → 261.076 → 262.450
docker_agent_arm64 +34.36 KiB (0.01% increase) 250.072 → 250.106 → 252.630
docker_agent_jmx_amd64 +14.56 KiB (0.00% increase) 329.701 → 329.715 → 331.080
docker_agent_jmx_arm64 +42.08 KiB (0.01% increase) 314.682 → 314.723 → 317.270
docker_cluster_agent_amd64 +16.13 KiB (0.02% increase) 63.861 → 63.877 → 64.490
docker_cluster_agent_arm64 +12.9 KiB (0.02% increase) 60.148 → 60.160 → 61.170
docker_cws_instrumentation_amd64 neutral 2.994 MiB
docker_cws_instrumentation_arm64 neutral 2.726 MiB
docker_dogstatsd_amd64 +2.81 KiB (0.02% increase) 15.023 → 15.026 → 15.820
docker_dogstatsd_arm64 +8.27 KiB (0.06% increase) 14.343 → 14.351 → 14.830
dogstatsd_deb_amd64 +4.55 KiB (0.06% increase) 7.941 → 7.945 → 8.790
dogstatsd_deb_arm64 +3.62 KiB (0.05% increase) 6.820 → 6.824 → 7.710
dogstatsd_rpm_amd64 +2.82 KiB (0.03% increase) 7.952 → 7.955 → 8.800
dogstatsd_suse_amd64 +2.82 KiB (0.03% increase) 7.952 → 7.955 → 8.800
iot_agent_deb_amd64 +11.82 KiB (0.10% increase) 11.267 → 11.278 → 12.040
iot_agent_deb_arm64 +3.8 KiB (0.04% increase) 9.635 → 9.639 → 10.450
iot_agent_deb_armhf +7.64 KiB (0.08% increase) 9.828 → 9.836 → 10.620
iot_agent_rpm_amd64 +7.47 KiB (0.06% increase) 11.285 → 11.292 → 12.060
iot_agent_suse_amd64 +7.47 KiB (0.06% increase) 11.285 → 11.292 → 12.060

@cit-pr-commenter

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 9d4f6480-eee8-4682-b508-484caea0695b

Baseline: 49aeaf1
Comparison: 6992d55
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
| --- | --- | --- | --- | --- | --- | --- |
| ➖ | docker_containers_cpu | % cpu utilization | -0.37 | [-3.34, +2.61] | 1 | Logs |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
| --- | --- | --- | --- | --- | --- | --- |
| ➖ | quality_gate_metrics_logs | memory utilization | +0.54 | [+0.33, +0.74] | 1 | Logs, bounds checks dashboard |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.32 | [+0.29, +0.36] | 1 | Logs, bounds checks dashboard |
| ➖ | quality_gate_idle | memory utilization | +0.21 | [+0.16, +0.25] | 1 | Logs, bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.15 | [-0.01, +0.32] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.10 | [-0.28, +0.48] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.08 | [+0.02, +0.14] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.04 | [-0.00, +0.09] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.03 | [-0.39, +0.45] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.02 | [-0.04, +0.07] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.01 | [-0.07, +0.09] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.14, +0.14] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.00 | [-0.12, +0.12] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | -0.02 | [-0.43, +0.39] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.08 | [-0.23, +0.08] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.11 | [-0.18, -0.04] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.21 | [-0.44, +0.02] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -0.32 | [-1.80, +1.16] | 1 | Logs, bounds checks dashboard |
| ➖ | otlp_ingest_logs | memory utilization | -0.33 | [-0.43, -0.24] | 1 | Logs |
| ➖ | docker_containers_cpu | % cpu utilization | -0.37 | [-3.34, +2.61] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.62 | [-0.82, -0.43] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | -0.65 | [-0.85, -0.45] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -1.25 | [-1.32, -1.18] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -3.39 | [-3.47, -3.31] | 1 | Logs |

Bounds Checks: ✅ Passed

| perf | experiment | bounds_check_name | replicates_passed | links |
| --- | --- | --- | --- | --- |
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we flag a change in performance as a "regression" (a change worth investigating further) only if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
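The three criteria above can be sketched as a small decision function. This is an illustrative sketch, not the detector's actual implementation; the function and parameter names (`is_regression`, `effect_size_tolerance`) are hypothetical, while the 5.00% tolerance and 90% CI match the values reported above.

```python
def is_regression(delta_mean_pct: float,
                  ci_low: float,
                  ci_high: float,
                  erratic: bool = False,
                  effect_size_tolerance: float = 5.0) -> bool:
    """Return True only if all three regression criteria hold."""
    # Criterion 1: the estimated effect is big enough to merit a look.
    big_enough = abs(delta_mean_pct) >= effect_size_tolerance
    # Criterion 2: the 90% confidence interval does not contain zero.
    ci_excludes_zero = ci_low > 0.0 or ci_high < 0.0
    # Criterion 3: the experiment is not marked erratic.
    return big_enough and ci_excludes_zero and not erratic

# tcp_syslog_to_blackhole above: CI excludes zero, but |Δ| < 5%,
# so it is not flagged as a regression.
print(is_regression(-3.39, -3.47, -3.31))  # False
```

Note how both significance tests must fail together: a wide CI that straddles zero (as in docker_containers_cpu's [-3.34, +2.61]) rules out a regression even before the effect-size check.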

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
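The pass/fail composition above is simple conjunction: a gate passes only when every replicate passes, and the CI run passes only when every gate passes. A minimal sketch, assuming the "passed/total" strings shown in the report (names like `ci_passes` are illustrative, not the actual CI code):

```python
def parse_replicates(s: str) -> tuple[int, int]:
    """Parse a 'passed/total' string such as '10/10'."""
    passed, total = s.split("/")
    return int(passed), int(total)


def ci_passes(gate_results: dict[str, str]) -> bool:
    """Map of '<experiment>/<bounds check>' -> 'passed/total'.

    The run passes only if every gate's replicates all passed.
    """
    return all(passed == total
               for passed, total in (parse_replicates(v)
                                     for v in gate_results.values()))


print(ci_passes({
    "quality_gate_idle/memory_usage": "10/10",
    "quality_gate_logs/lost_bytes": "10/10",
}))  # True
```

A single 9/10 result anywhere would fail the whole run, which is why each bullet above reports the full replicate count.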


Labels

long review (PR is complex, plan time to review it), team/agent-devx

3 participants