mcp experiments #44989
Conversation
Document how to configure kubernetes-mcp-server for Claude Code with a dedicated kubeconfig isolated to the gadget-dev cluster. Includes setup, token renewal, and cleanup instructions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Clear guidance for AI agents on:
- Preferring kubernetes-mcp-server tools over kubectl
- When to use kubectl (apply -f, complex selectors)
- VM operations via limactl shell
- Common workflow examples
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
… collection
Complete implementation of core monitoring functionality:
- REQ-FM-001: Cgroup v2 filesystem scanning with multi-CRI pattern matching (containerd, CRI-O, Docker), pod UID extraction, QoS class detection
- REQ-FM-002: Memory metrics via lading's smaps_rollup (PSS) and cgroup v2 polling
- REQ-FM-003: CPU metrics via lading's cgroup_v2::cpu::Sampler with delta tracking
Also adds deployment infrastructure:
- Multi-stage Dockerfile with cargo-chef for fast rebuilds
- Kubernetes DaemonSet manifest
- inspect_metrics.py utility script
Progress: 4/5 requirements complete (REQ-FM-005 planned for a later phase)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Local development runs on Apple Silicon - don't specify linux/amd64 in docker builds during testing loops.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
…re API
Implement time-based file rotation for Parquet output files to ensure each file has a valid footer and is immediately readable. Uses the new rotation API from lading_capture.
Changes:
- Use channel-based rotation via RotationSender from CaptureManager
- Rotate files every 90 seconds (exceeds the 60s accumulator window)
- Hive-style partitioning: dt=YYYY-MM-DD/identifier=<pod>/metrics-*.parquet
- Add global labels (node_name, cluster_name) to all metrics
- Write session manifest (session.json) on startup
- Track total bytes and enforce a 1GiB limit with graceful shutdown
- Update specs (requirements, design, executive) for the rotation feature
- Update daemonset.yaml to use the :rotation image tag
The rotation ensures files are always readable without requiring graceful shutdown, solving the Parquet footer problem for long-running captures.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update specs to reflect the completed REQ-FM-004 implementation:
- Mark REQ-FM-004 as complete (was "In Progress")
- Update progress to 4 of 5 complete
- Document channel-based rotation API (start_with_rotation, RotationSender)
- Add verified metrics: 38,496 rows, 204 unique metrics, valid footers
- Update technical summary with the new rotation flow
- Replace the old CaptureManager recreation approach with the channel-based design
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ation
Add utility scripts for ad-hoc analysis of collected Parquet metrics:
- merge_parquet.py: Combine multiple parquet files into one
- memory_timeline.py: Plot memory usage over time with Plotly
- cpu_analysis.py: Analyze CPU usage and throttling with rate deltas
- container_summary.py: Generate a per-container statistics table
- pressure_analysis.py: Visualize PSI (Pressure Stall Information) metrics
- inspect_metrics.py: Fix crash on labels column with unhashable types
All scripts use uv inline dependencies for easy standalone execution.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
REQ-FM-005 was too future-facing to be useful in current planning. The feature depended on Agent output interception, which is not in scope.
Removed references from:
- requirements.md: Deleted the requirement definition
- design.md: Removed the implementation section and dependency reference
- executive.md: Updated the status table (4/4 complete)
- CONTINUATION.md: Updated context
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Port the autocorrelation-based CPU oscillation detection algorithm from
the Go implementation (sopell/q-branch-cpu-oscillation-per-container)
to a standalone Python script that works on parquet data files.
Key features:
- Autocorrelation analysis to detect periodic CPU patterns
- Configurable thresholds (periodicity score, amplitude, period range)
- Per-container analysis with QoS class display
- Optional Plotly visualization with autocorrelation plots
- CLI interface matching existing script patterns
Algorithm matches detector.go:
- 60-sample analysis window (1Hz sampling = 60 seconds)
- Normalized autocorrelation for lags in [min_period, max_period]
- Detection when periodicity_score >= threshold AND amplitude >= min
Usage:
uv run scripts/oscillation_detector.py metrics.parquet
uv run scripts/oscillation_detector.py metrics.parquet --threshold 0.4 --plot
Reference: pkg/collector/corechecks/containers/cpu_oscillation/detector.go
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
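The detection logic described above - normalized autocorrelation over a 60-sample window, with thresholds on periodicity score and amplitude - can be sketched in a few lines of Python. This is an illustration of the technique, not the shipped detector.go or oscillation_detector.py code; the parameter defaults are assumptions.

```python
def detect_oscillation(samples, min_period=2, max_period=30,
                       score_threshold=0.5, min_amplitude=1.0):
    """Return (detected, best_period, score) for a 1Hz sample window."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    variance = sum(c * c for c in centered)
    amplitude = max(samples) - min(samples)
    # Flat or low-amplitude signals can't oscillate meaningfully.
    if variance == 0 or amplitude < min_amplitude:
        return False, 0, 0.0
    best_lag, best_score = 0, 0.0
    for lag in range(min_period, min(max_period, n - 1) + 1):
        # Normalized autocorrelation at this lag (1.0 = perfect self-similarity).
        acf = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        if acf > best_score:
            best_lag, best_score = lag, acf
    return best_score >= score_threshold, best_lag, best_score
```

A period-2 square wave over 60 samples scores close to 1.0 at lag 2, while noise-free flat signals are rejected outright by the amplitude gate.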
- Add column projection to only load needed columns (5 of ~15)
- Enable opt-level=3 in the dev profile for full optimizations
- Fix borrow checker issue with projection mask construction
Processing 35M rows now takes ~11s vs ~100s unoptimized.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
…Q-FM-005)
Add REQ-FM-005 to specs and implement metrics_viewer.py:
- Web-based interactive timeseries viewer using Dash/Plotly
- Container filtering via multi-select dropdown with quick-select buttons
- WebGL rendering (scattergl) for smooth performance with large datasets
- Pan/zoom interactions and range slider for time navigation
- Pre-computed CPU deltas at startup for responsive filtering
- Automatic browser launch when the server starts
Implements: REQ-FM-005
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Pan/zoom into a region, then click "Rescale Y-Axis" to fit the y-axis to the visible data range. Useful when initial data spikes throw off the scale.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Rust version of the metrics viewer with an axum backend:
- Fast parquet loading (~10s for 35M rows vs Python's slower startup)
- axum HTTP server serving an embedded Plotly.js frontend
- REST API: /api/containers and /api/timeseries endpoints
- Same UI features: container dropdown, rescale Y, reset zoom
- Single cargo-script file, opens the browser automatically
Usage: ./scripts/metrics_viewer.rs metrics.parquet
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove buggy range slider (conflicts with scattergl)
- Use Plotly.react instead of newPlot for updates
- Use raw timestamps instead of Date objects
- Use Plotly.relayout for rescale/reset (faster)
- Read x-range from _fullLayout instead of manual tracking
- Add console.log for debugging
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Changed xaxis type from 'linear' to 'date' for human-readable timestamps
- Restored the range slider with 5% thickness for time navigation
- Updated rescaleY to convert date strings back to milliseconds for comparison
- Increased chart height to 70vh to accommodate the range slider
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Consolidated title, container selector, and buttons into a single header row
- Moved the legend to top-right (above the chart) to avoid range slider overlap
- Increased range slider thickness slightly for better usability
- Changed hover mode to 'x unified' for easier comparison
- Reduced overall padding for more chart real estate
- Chart now uses calc(100vh - 90px) for maximum vertical space
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Added package.edition = "2024" to fix a cargo warning
- Added timing for each loading step:
  - Read time (parsing parquet rows)
  - Process time (building container data structures)
  - Total load time
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Now shows three distinct phases:
1. Open file + read parquet metadata (disk I/O for footer, schema parsing)
2. ZSTD decompress + Arrow decode + CPU deltas (bulk data processing)
3. Build serving structs + sort by avg CPU (in-memory transformations)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Deleted scripts/metrics_viewer.py (replaced by the Rust version)
- Updated executive.md with Rust/axum implementation details
- Updated design.md with the new architecture:
  - Rust backend with axum HTTP server
  - Embedded Plotly.js frontend
  - REST API documentation
  - Performance characteristics (~10s for 35M rows)
- Updated the file structure to reflect current scripts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Deleted:
- cpu_analysis.py (superseded by metrics_viewer.rs)
- pressure_analysis.py (unused)
- oscillation_detector.py (Rust version is equivalent)
- __pycache__/
Remaining scripts:
- metrics_viewer.rs - Interactive visualization
- oscillation_detector.rs - Pattern detection
- container_summary.py - Container overview
- inspect_metrics.py - Quick parquet inspection
- memory_timeline.py - Memory analysis
- merge_parquet.py - File merging utility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
New Rust scripts:
- inspect_metrics.rs: Shows schema, row counts, unique metrics, time range, sample data
- container_summary.rs: Per-container stats (memory, CPU, throttling, PSI)
  - Supports --format table/csv/json
  - Supports --sort-by and --top N
Deleted Python versions:
- inspect_metrics.py
- container_summary.py
Remaining scripts:
- metrics_viewer.rs - Interactive visualization
- oscillation_detector.rs - Pattern detection
- container_summary.rs - Container statistics
- inspect_metrics.rs - Parquet inspection
- merge_parquet.py - File merging (still Python)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Rust cargo-script implementation of parquet file merging:
- Recursive directory scanning with walkdir
- Configurable compression (zstd, snappy, gzip, none)
- Skips empty files gracefully
- Progress output during read/write phases
Deletes the Python version.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- dev.py: Single-file PEP 723 script for server lifecycle management
- Commands: status, start, stop, restart
- Always rebuilds binary before start/restart
- Health check via /api/health endpoint
- Port calculated from checkout path for worktree isolation
- Stale PID detection and cleanup
- State stored in .dev/ directory
- server.rs: Add GET /api/health returning {"status": "ok"}
- index.html: Minor styling fixes for chart layout
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
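The commit above says dev.py calculates its port from the checkout path for worktree isolation. A hypothetical sketch of one way such a mapping could work - the hash function, base port, and range here are assumptions, not the actual dev.py logic:

```python
import hashlib

def port_for_checkout(path: str, base: int = 8050, span: int = 500) -> int:
    """Map a checkout path deterministically into [base, base + span).

    Hashing the path means every worktree gets a stable, (likely) distinct
    port without any coordination file.
    """
    digest = hashlib.sha256(path.encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % span
    return base + offset
```

The same path always yields the same port, so restarting the server in a worktree reuses its slot.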
- Increase Y-axis size from 60 to 80px for 7-digit label visibility
- Adjust main chart height to ensure the range chart is visible without scrolling
- Disable uPlot's built-in legend (we use a custom legend at top)
- Default to the total_cpu_usage_millicores metric instead of the first in the list
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add left sidebar (300px) with metrics list, filters, containers
- Implement Fuse.js fuzzy search for metric names
- Replace the multi-select dropdown with a checkbox list for containers
- Add color-coded QoS badges (Guaranteed/Burstable/BestEffort)
- Move Rescale Y and Reset buttons to the main header
- Improve container selection with visible checkboxes and avg values
- Reorganize layout using CSS Grid for sidebar + main content
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fixes confusion where LLM agents couldn't distinguish between a local viewer and a cluster port-forward when both used the same port range.
Port allocation (per-worktree):
- Local viewer: 8050-8549 (./dev.py local viewer)
- Cluster viewer: 8550-9049 (./dev.py cluster viewer)
- MCP server: 9050-9549 (./dev.py cluster mcp)
Also adds a `./dev.py status` top-level command that shows all viewers at once - this is now the recommended way to check what's running.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
The astronomy-shop AIOpsLab scenario deploys 30+ containers, which caused OOMs with the previous 256Mi/512Mi limits. Increase to 1GB to support monitoring complex microservice deployments.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add validation that crashes with a descriptive error message if interval_ms is set to anything other than 1000ms. This prevents misconfiguration, since lading-capture has a hardcoded 1-second tick duration (TICK_DURATION_MS). Sub-second sampling would collect data, but timestamps would be bucketed to 1-second resolution, losing the intended granularity. For gauges (most cgroup metrics), only the last sample per second would be preserved.
The error message explains:
- The exact source locations in lading-capture
- Technical details of the bucketing behavior
- That this is NOT insurmountable - it just needs implementation
Also updates the --interval-ms help text to document this limitation.
See: DataDog/lading#1662 (comment)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add three test scenarios for fine-grained-monitor validation:
- crash-loop: Container that exits and restarts repeatedly
- memory-leak: Container with gradual memory growth
- oom-kill: Container that gets OOM-killed quickly
Also fix scenario.py cleanup to delete configmaps (kubectl 'all' doesn't include them, causing orphaned configmaps on stop).
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update the service name from 'mcp-metrics-viewer' to 'fgm-mcp-server' to match the actual deployed service name.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add a range=all parameter to the /api/containers query. The viewer defaults to filtering containers by age (1h), which caused list_containers to return 0 results when querying for containers in new scenario runs.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace index.json persistence with lightweight sidecar files for ~100x faster viewer startup (~0.2s vs ~13s for large datasets).
Collector changes:
- Write a .containers sidecar file at each parquet rotation
- Use bincode serialization for fast reads (~10-100x faster than JSON)
- Remove index.json persistence - container metadata is now in-memory only
Viewer changes:
- New from_directory() loads containers from sidecars, or falls back to a parquet scan for older data without sidecars
- Compute first_seen_ms/last_seen_ms from parquet file timestamps
- Remove wait_for_index(); wait for parquet files directly instead
The sidecar approach is more robust than index.json:
- Each parquet file has its own sidecar (no single point of failure)
- The viewer can start with partial data while the collector is still running
- No need to coordinate index updates between collector and viewer
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Remove dead code made obsolete by the sidecar-based startup:
- Remove ContainerStats struct (unused after viewer simplification)
- Remove get_container_stats benchmarks (cold/warm/load_all_metrics)
- Update benchmark docs in README.md, CLAUDE.md, and dev.py help
Remaining benchmarks:
- scan_metadata: Startup path performance
- get_timeseries_single_container: Single container query
- get_timeseries_all_containers: All containers query
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add comprehensive unit tests for the sidecar module:
- test_roundtrip: Basic serialization/deserialization
- test_roundtrip_empty: Empty container list
- test_unsupported_version: Future version rejection
- test_corrupt_file: Garbage data handling
- test_missing_file: Non-existent file handling
- test_atomic_write: Verify temp file cleanup
Remove old_file_bytes from the rotation log message. This field showed the file size at rotation trigger time, before the write was complete, consistently displaying 0 and misleading investigators.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add a background task that refreshes container metadata from sidecar files every 30 seconds. This fixes the issue where the viewer would cache an empty container list if started before data was available.
Changes:
- Wrap MetadataIndex in RwLock for safe concurrent updates
- Add refresh_containers_from_sidecars() method to LazyDataStore
- Add accessor methods (get_metrics, get_qos_classes, get_namespaces)
- Spawn a background refresh task in the server with a configurable interval
- Default refresh interval: 30 seconds
The refresh is cheap (~10ms) since it only reads small sidecar files, not the full parquet data. Logs when the container count changes.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Sidecar files are immutable once written, so there's no need to re-read them on each refresh cycle. The refresh now tracks which parquet files have been processed via file_containers keys and only reads sidecars for new files.
Before: ~49ms to re-read all sidecars every 30s
After: ~2ms to check and skip; only reads new files on rotation
Co-Authored-By: Claude Opus 4.5 <[email protected]>
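The "sidecars are immutable" optimization amounts to remembering which files have already had their sidecar read and skipping them. A minimal Python illustration of that bookkeeping (the actual implementation is Rust; names here are illustrative):

```python
def refresh(file_containers: dict, parquet_files: list, read_sidecar) -> int:
    """Read sidecars only for files not yet seen in file_containers.

    Returns the number of newly processed files; already-seen files are
    skipped because their sidecar content can never change.
    """
    new_files = [f for f in parquet_files if f not in file_containers]
    for f in new_files:
        file_containers[f] = read_sidecar(f)
    return len(new_files)
```

On a steady-state refresh cycle with no rotation, the function does a dictionary membership check per file and returns immediately.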
Enables label-based filtering in the viewer without requiring full parquet scans. Labels are now:
- Stored in ContainerEntry in the index
- Written to sidecar files (v2 format)
- Read from sidecars and returned via the containers API
Backwards compatible: v1 sidecars without labels are still readable via #[serde(default)] on the new labels field.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add namespace-per-run isolation for gensim-generated scenarios
- Each scenario run gets its own namespace (fgm-run-{run_id})
- Detect gensim scenarios by "Generated by k8s-adapter" comment
- Create namespace on run, delete entire namespace on stop
- Store namespace and is_gensim flags in run metadata
- Add todo-app scenario generated from gensim vibecoder
- Add todo-app dashboard for the multi-tier web application
The namespace-per-run approach provides complete isolation between
concurrent scenario runs and simplifies cleanup (deleting the namespace
cascades to all resources).
Note: The todo-app scenario contains generated code from gensim/vibecoder
which does not follow agent repository conventions (copyright headers,
go module registration, etc.), as it is intended to provide standalone
test workloads.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
API refactor makes the fast sidecar path the default:
- LazyDataStore::new(dir) is now the primary constructor
- LazyDataStore::from_files(paths) also tries sidecars first
- Both methods fall back to parquet scanning if sidecars are missing
Benchmark generator rewrite:
- Two scenarios: realistic (~20 containers) and stress (~50 containers)
- Single --duration argument (parses 1h, 6h, 24h, 7d, etc.)
- Generates sidecar files alongside parquet files
- Removed scenarios: multipod, container-churn (simplified)
Benchmark changes:
- Renamed scan_metadata_parquet -> scan_metadata_from_files
- Renamed scan_metadata_sidecar -> scan_metadata_directory
- Updated all benchmarks to use the new API
Clippy fixes:
- &PathBuf -> &Path in function signatures
- (400..500).contains(&status) instead of a manual range check
- #[allow(dead_code)] for viewer-only sidecar functions
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add metric name as a title above each chart panel for clarity
- Add SI unit formatting for Y-axis values:
  - Bytes metrics (memory, io): K, M, G suffixes
  - Microsecond metrics: us, ms, s suffixes
  - Numbers: K, M suffixes for large values
- Add CSS styling for panel titles
Co-Authored-By: Claude Opus 4.5 <[email protected]>
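SI-suffix formatting like the above typically divides by the largest matching power of 1000. A minimal sketch - the thresholds, precision, and suffix set here are assumptions, not the viewer's exact rules:

```python
def format_si(value: float) -> str:
    """Format a number with K/M/G suffixes for axis labels."""
    for threshold, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            return f"{value / threshold:.1f}{suffix}"
    return f"{value:.0f}"
```

For example, 1,536,000 renders as "1.5M" and 2,048 as "2.0K", keeping 7-digit values short enough for an 80px axis.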
Introduces a Data Provider Interface pattern to cleanly abstract data loading from the source (API vs parquet-wasm). This enables future self-contained HTML snapshots without divergent code paths.
New files:
- data-provider.js: Interface definition and provider management
- api-provider.js: HTTP fetch implementation (current behavior)
- parquet-provider.js: Skeleton for parquet-wasm (to be implemented)
The effects.js module now imports Api from data-provider.js and initializes the default ApiProvider. Effect handlers are unchanged.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Use hash-based ID generation to ensure each container has a unique 12-character prefix (short_id). The previous approach produced IDs with many leading zeros, causing all containers to share the same short_id and breaking multi-container selection in the viewer.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
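Hash-based IDs fix the collision because a cryptographic hash spreads its output uniformly, so the first 12 hex characters are effectively unique per container. A sketch of the idea - the choice of SHA-256 is an assumption, since the commit does not name the hash:

```python
import hashlib

def container_id(name: str) -> str:
    """Derive a full hex ID from a stable container name."""
    return hashlib.sha256(name.encode()).hexdigest()

def short_id(full_id: str) -> str:
    """12-character prefix used for display and selection."""
    return full_id[:12]
```

Unlike zero-padded sequential IDs, the prefixes of hashed IDs differ even for very similar names.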
…CON-007)
Problem: Consolidated parquet files caused errors in DuckDB and PyArrow:
- DuckDB: "don't know what type:"
- PyArrow: "Column cannot have more than one dictionary"
Root cause: Arrow's Parquet writer creates a new dictionary for each row group. When consolidating multiple input files:
1. Each input file has its own row group(s)
2. Each row group gets its own dictionary for dictionary-encoded columns
3. The resulting file has multiple dictionaries per column
4. DuckDB/PyArrow expect at most ONE dictionary per column
Solution: Set max_row_group_size to usize::MAX in WriterProperties. This forces all batches from all input files into a SINGLE row group, resulting in ONE dictionary per column. Memory remains bounded because ArrowWriter encodes data incrementally - it doesn't buffer raw Arrow arrays, just the encoded Parquet data.
Trade-off: We lose row-group-level predicate pushdown (skipping entire row groups based on min/max statistics). For time-series analysis of specific containers, this is acceptable since queries typically target narrow time ranges within files anyway.
Tested:
- Consolidated 10 files (~5M rows) successfully
- DuckDB queries work without errors
- Verified a single row group in the parquet metadata
Co-Authored-By: Claude Opus 4.5 <[email protected]>
…lity
- Reduce leak rate from 1MB to 512KB per 2 seconds
- Increase memory limit from 64Mi to 256Mi
- OOM now occurs after ~8-10 minutes instead of ~2 minutes
- Fix MB calculation in log output
- Makes changepoint detection easier to observe
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Checkpoint commit - work in progress. Adds benchmarks for:
- Index queries (list_metrics, list_containers)
- Study algorithms (periodicity, changepoint)
- MCP patterns (analyze_container, summarize_container)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Implement the PELT (Pruned Exact Linear Time) algorithm for changepoint detection, replacing the augurs/BOCPD dependency, which had O(n²) complexity and would time out on large datasets.
PELT provides:
- O(n) expected time complexity via dynamic programming with pruning
- O(1) segment cost calculation using cumulative sums
- BIC-like penalty for automatic changepoint count selection
- No external dependencies (pure Rust implementation)
Performance improvement on a 28,800-point dataset:
- BOCPD: >180s timeout (never completed)
- PELT: 72-127ms
Based on: Killick, R., Fearnhead, P., & Eckley, I. A. (2012). "Optimal detection of changepoints with a linear computational cost."
Co-Authored-By: Claude Opus 4.5 <[email protected]>
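PELT as described - dynamic programming over candidate changepoints, O(1) segment costs from cumulative sums, pruning of dominated candidates, and a BIC-like penalty - can be sketched in pure Python. This illustrates the algorithm, not the repository's Rust implementation; the Gaussian mean-shift cost and default penalty are assumptions.

```python
import math

def pelt_changepoints(data, penalty=None):
    """Return sorted changepoint indices for a mean-shift model."""
    n = len(data)
    if penalty is None:
        penalty = 2.0 * math.log(n)  # BIC-like default
    # Prefix sums give O(1) segment cost (sum of squared deviations).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x
    def cost(a, b):
        seg = s[b] - s[a]
        return (s2[b] - s2[a]) - seg * seg / (b - a)
    f = [0.0] * (n + 1)   # optimal cost up to index t
    f[0] = -penalty
    prev = [0] * (n + 1)  # last changepoint before t, for backtracking
    candidates = [0]
    for t in range(1, n + 1):
        best, best_tau = math.inf, 0
        for tau in candidates:
            c = f[tau] + cost(tau, t) + penalty
            if c < best:
                best, best_tau = c, tau
        f[t] = best
        prev[t] = best_tau
        # Pruning: drop candidates that can never be optimal again.
        candidates = [tau for tau in candidates
                      if f[tau] + cost(tau, t) <= f[t]] + [t]
    # Backtrack the optimal segmentation.
    cps, t = [], n
    while t > 0:
        tau = prev[t]
        if tau > 0:
            cps.append(tau)
        t = tau
    return sorted(cps)
```

On a step signal (50 zeros then 50 tens) this recovers the single changepoint at index 50, and on a constant signal it returns no changepoints because the penalty outweighs any cost reduction.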
Three key optimizations to reduce memory allocation churn:
1. HashMap/HashSet pre-sizing: Add with_capacity() calls throughout
- load_from_sidecars, scan_metadata, refresh_containers_from_sidecars
- load_metric_data, load_row_groups
- Eliminates ~111 rehash operations per query
2. Vector pre-sizing: Add Vec::with_capacity() for:
- dates_to_scan in discover_files_by_time_range
- result/missing vectors in get_timeseries
- containers vector in get_containers_by_recency
- Eliminates raw_vec::finish_grow reallocs
3. Arc<str> string interning for container IDs:
- Create the interner at query start with all requested container IDs
- Pass it through load_metric_from_file -> load_row_groups
- Use Arc::clone() (pointer increment) instead of String::clone()
- Eliminates ~3,447 String allocations per query
- Convert back to String only at the API boundary
Memory profiling showed:
- String::clone: 102KB across 3,447 calls (FIXED)
- raw_vec::finish_grow: 1.4MB reallocs (FIXED)
- RawTable::reserve_rehash: 90KB across 111 calls (FIXED)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
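The Arc<str> interning in item 3 has a simple analogy: keep one canonical object per container ID and hand out references to it instead of fresh copies. An illustrative Python sketch (Python strings stand in for Arc<str>; this is not the repository's code):

```python
class Interner:
    """Pool of canonical container-ID strings built once per query."""

    def __init__(self, ids):
        # Map each ID to its own canonical object.
        self._pool = {s: s for s in ids}

    def get(self, raw: str) -> str:
        # Return the pooled object when known (no copy), else the input.
        return self._pool.get(raw, raw)
```

Every lookup for a known ID returns the same shared object, which is the Python analogue of incrementing an Arc's refcount instead of cloning the string data.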
Add an alloc-profile binary that uses a counting allocator to measure allocation behavior during the get_timeseries hot path. This tool helps quantify the impact of memory optimizations.
Usage: cargo run --release --bin alloc-profile
The tool:
- Exercises the get_timeseries hot path with 10 containers
- Measures allocation count and bytes per iteration
- Outputs a machine-readable summary (ALLOC_COUNT, ALLOC_BYTES, etc.)
Results show the memory optimizations reduced allocation count by 72%:
- Before: 400,597 allocations per iteration
- After: 112,482 allocations per iteration
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add deterministic selection for metrics and containers in benchmarks:
- Sort parquet files by path
- Sort metrics alphabetically before selecting
- Sort containers by ID instead of recency
This eliminates variability from HashMap iteration order and recency-based sorting that caused wild benchmark swings (e.g., study_changepoint ranged from 14ms to 2.3s between runs). Results are now reproducible within ~1% variance.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Static quality checks ✅
Please find below the results from static quality gates.
Successful checks: 3 successful checks with minimal change (< 2 KiB)
On-wire sizes (compressed)
Regression Detector Results
Metrics dashboard
Baseline: 49aeaf1
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | -0.37 | [-3.34, +2.61] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | quality_gate_metrics_logs | memory utilization | +0.54 | [+0.33, +0.74] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.32 | [+0.29, +0.36] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle | memory utilization | +0.21 | [+0.16, +0.25] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.15 | [-0.01, +0.32] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.10 | [-0.28, +0.48] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.08 | [+0.02, +0.14] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.04 | [-0.00, +0.09] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.03 | [-0.39, +0.45] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.02 | [-0.04, +0.07] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.01 | [-0.07, +0.09] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.14, +0.14] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.00 | [-0.12, +0.12] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | -0.02 | [-0.43, +0.39] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.08 | [-0.23, +0.08] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.11 | [-0.18, -0.04] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.21 | [-0.44, +0.02] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -0.32 | [-1.80, +1.16] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_logs | memory utilization | -0.33 | [-0.43, -0.24] | 1 | Logs |
| ➖ | docker_containers_cpu | % cpu utilization | -0.37 | [-3.34, +2.61] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.62 | [-0.82, -0.43] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | -0.65 | [-0.85, -0.45] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -1.25 | [-1.32, -1.18] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -3.39 | [-3.47, -3.31] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.