| 1 | +--- |
| 2 | +description: Profile memory and CPU usage during e2e tests and analyze results |
| 3 | +--- |
| 4 | + |
| 5 | +# E2E Profiling Plugin |
| 6 | + |
| 7 | +Analyze memory and CPU usage during e2e tests by collecting pprof heap and CPU profiles and generating comprehensive analysis reports. |
| 8 | + |
| 9 | +## Commands |
| 10 | + |
| 11 | +### /e2e-profile start [test-name] |
| 12 | + |
| 13 | +Start profiling in background mode (recommended workflow): |
| 14 | + |
| 15 | +1. Start port-forwards to operator-controller and catalogd |
| 16 | +2. Begin collecting heap and CPU profiles every 10 seconds |
| 17 | +3. Run in background, allowing you to run any test command |
| 18 | +4. Auto-detect cluster teardown and stop gracefully |
| 19 | +5. Use `/e2e-profile stop` to finish and analyze |
| 20 | + |
| 21 | +**Examples:** |
| 22 | +``` |
| 23 | +/e2e-profile start baseline |
| 24 | +# Then run: make test-e2e |
| 25 | +# Then run: /e2e-profile stop |
| 26 | +``` |
| 27 | + |
| 28 | +This workflow: |
| 29 | +- Works with ANY test command (make test-e2e, make test-experimental-e2e, custom commands) |
| 30 | +- Handles cluster teardown gracefully (test-e2e tears down cluster) |
| 31 | +- Auto-stops after 3 consecutive collection failures |
| 32 | +- Lets you run tests your way |
| 33 | + |
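The collection behaviour described above (poll on a fixed interval, stop after 3 consecutive failures) could be sketched roughly as follows. This is an illustrative sketch, not the plugin's actual `collect-profiles.sh`: the function names `collect_loop` and `fetch_heap`, the argument order, and the assumption that a port-forward already listens on `localhost:6060` are all hypothetical.

```bash
# Hypothetical sketch of a collection loop with a consecutive-failure cutoff.
# collect_loop and fetch_heap are illustrative names, not the plugin's own.
collect_loop() {
  local fetch_cmd="$1"          # command that writes one profile to the path it is given
  local out_dir="$2"            # directory that receives heapN.pprof files
  local interval="${3:-10}"     # seconds between attempts
  local max_failures="${4:-3}"  # consecutive failures that end the loop
  local max_attempts="${5:-0}"  # 0 means no attempt limit
  local n=0 failures=0 attempts=0

  mkdir -p "$out_dir"
  while [ "$max_attempts" -eq 0 ] || [ "$attempts" -lt "$max_attempts" ]; do
    attempts=$((attempts + 1))
    if "$fetch_cmd" "$out_dir/heap${n}.pprof"; then
      failures=0
      n=$((n + 1))
    else
      rm -f "$out_dir/heap${n}.pprof"   # drop partial or empty output
      failures=$((failures + 1))
      if [ "$failures" -ge "$max_failures" ]; then
        echo "stopping after ${failures} consecutive failures" >&2
        return 0
      fi
    fi
    sleep "$interval"
  done
}

# Example fetcher, assuming a port-forward already listens on localhost:6060
# and the target serves the standard net/http/pprof heap endpoint.
fetch_heap() {
  curl -sf --max-time 5 -o "$1" "http://localhost:6060/debug/pprof/heap"
}

# Usage (not executed here):
# collect_loop fetch_heap ./e2e-profiles/baseline/operator-controller 10 3
```

Treating a run of failed fetches as the teardown signal means the loop needs no direct knowledge of the cluster lifecycle, which is why it can work with any test command.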
| 34 | +### /e2e-profile stop |
| 35 | + |
| 36 | +Stop background profiling session and generate analysis: |
| 37 | + |
| 38 | +1. Stop profile collection process |
| 39 | +2. Kill port-forward processes (or detect they're already stopped) |
| 40 | +3. Clean up empty profile files |
| 41 | +4. Generate comprehensive analysis report |
| 42 | + |
| 43 | +**Example:** |
| 44 | +``` |
| 45 | +/e2e-profile stop |
| 46 | +``` |
| 47 | + |
| 48 | +### /e2e-profile run [test-name] [test-target] |
| 49 | + |
| 50 | +Run an e2e test with continuous memory and CPU profiling (automated workflow): |
| 51 | + |
| 52 | +1. Start the specified e2e test (defaults to `make test-experimental-e2e`) |
| 53 | +2. Wait for the operator-controller pod to be ready |
| 54 | +3. Collect heap and CPU profiles every 10 seconds to `./e2e-profiles/[test-name]/` |
| 55 | +4. Continue until the test completes or is interrupted |
| 56 | +5. Generate a summary report with memory and CPU analysis |
| 57 | + |
| 58 | +**Test Targets:** |
| 59 | +- `test-e2e` - Standard e2e tests |
| 60 | +- `test-experimental-e2e` - Experimental e2e tests (default) |
| 61 | +- `test-extension-developer-e2e` - Extension developer e2e tests |
| 62 | +- `test-upgrade-e2e` - Upgrade e2e tests |
| 63 | +- `test-upgrade-experimental-e2e` - Upgrade experimental e2e tests |
| 64 | + |
| 65 | +**Examples:** |
| 66 | +``` |
| 67 | +/e2e-profile run baseline |
| 68 | +/e2e-profile run baseline test-e2e |
| 69 | +/e2e-profile run with-caching test-experimental-e2e |
| 70 | +/e2e-profile run upgrade-test test-upgrade-e2e |
| 71 | +``` |
| 72 | + |
| 73 | +### /e2e-profile analyze [test-name] |
| 74 | + |
| 75 | +Analyze collected heap profiles for a specific test run: |
| 76 | + |
| 77 | +1. Load all heap profiles from `./e2e-profiles/[test-name]/` |
| 78 | +2. Analyze memory growth patterns |
| 79 | +3. Identify top allocators |
| 80 | +4. Find OpenAPI, JSON, and other hotspots |
| 81 | +5. Generate detailed markdown report |
| 82 | + |
| 83 | +**Example:** |
| 84 | +``` |
| 85 | +/e2e-profile analyze baseline |
| 86 | +``` |
| 87 | + |
| 88 | +### /e2e-profile compare [test1] [test2] |
| 89 | + |
| 90 | +Compare two test runs to measure the impact of changes: |
| 91 | + |
| 92 | +1. Load profiles from both test runs |
| 93 | +2. Compare peak memory usage |
| 94 | +3. Compare memory growth rates |
| 95 | +4. Identify differences in allocation patterns |
| 96 | +5. Generate side-by-side comparison report with charts |
| 97 | + |
| 98 | +**Example:** |
| 99 | +``` |
| 100 | +/e2e-profile compare baseline with-caching |
| 101 | +``` |
| 102 | + |
| 103 | +### /e2e-profile collect |
| 104 | + |
| 105 | +Manually collect a single heap profile from the running operator-controller pod: |
| 106 | + |
| 107 | +1. Find the operator-controller pod |
| 108 | +2. Set up port forwarding to pprof endpoint |
| 109 | +3. Download heap profile |
| 110 | +4. Save to `./e2e-profiles/manual/heap-[timestamp].pprof` |
| 111 | + |
| 112 | +**Example:** |
| 113 | +``` |
| 114 | +/e2e-profile collect |
| 115 | +``` |
| 116 | + |
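As a rough illustration of the manual collection steps, the sketch below forwards port 6060 and downloads one heap profile. Several details are assumptions rather than facts from this document: the deployment name `operator-controller-controller-manager`, the timestamp format, the fixed two-second wait, and the helper names `manual_profile_path` and `collect_once`.

```bash
# Hypothetical helper: build the output path ./e2e-profiles/manual/heap-[timestamp].pprof.
# The timestamp format is an assumption; the document does not specify one.
manual_profile_path() {
  local base_dir="${1:-./e2e-profiles}"
  local stamp="${2:-$(date +%Y%m%d-%H%M%S)}"
  printf '%s/manual/heap-%s.pprof\n' "$base_dir" "$stamp"
}

# Hypothetical one-shot collection: port-forward, fetch, clean up.
collect_once() {
  local namespace="${E2E_PROFILE_NAMESPACE:-olmv1-system}"
  local deployment="${1:-operator-controller-controller-manager}"  # assumed name
  local out pf_pid rc
  out="$(manual_profile_path)"
  mkdir -p "$(dirname "$out")"

  # Forward local port 6060 to the pprof port of a pod selected by the deployment.
  kubectl -n "$namespace" port-forward "deployment/$deployment" 6060:6060 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # crude wait for the forward to come up; a readiness probe would be more robust

  curl -sf --max-time 10 -o "$out" "http://localhost:6060/debug/pprof/heap"
  rc=$?
  kill "$pf_pid" 2>/dev/null
  [ "$rc" -eq 0 ] && echo "saved $out"
  return "$rc"
}

# Usage (not executed here):
# collect_once
```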
| 117 | +## Task Breakdown |
| 118 | + |
| 119 | +When you invoke this command, I will: |
| 120 | + |
| 121 | +1. **Setup Phase** |
| 122 | + - Create `./e2e-profiles/[test-name]` directory |
| 123 | + - Verify `make test-experimental-e2e` is available |
| 124 | + - Check kubectl access to the cluster |
| 125 | + |
| 126 | +2. **Collection Phase** |
| 127 | + - Start the e2e test in background |
| 128 | + - Monitor for pod readiness |
| 129 | + - Set up port forwarding to pprof endpoint (port 6060) |
| 130 | + - Collect heap and CPU profiles every 10 seconds
| 131 | + - Save profiles with sequential naming (heap0.pprof, heap1.pprof, ...) |
| 132 | + |
| 133 | +3. **Monitoring Phase** |
| 134 | + - Track test progress |
| 135 | + - Monitor profile file sizes for growth patterns |
| 136 | + - Detect if test crashes or completes |
| 137 | + |
| 138 | +4. **Analysis Phase** |
| 139 | + - Use `go tool pprof` to analyze profiles |
| 140 | + - Extract key metrics: |
| 141 | + - Peak memory usage |
| 142 | + - Memory growth over time |
| 143 | + - Top allocators |
| 144 | + - OpenAPI-related allocations |
| 145 | + - JSON deserialization overhead |
| 146 | + - Informer/cache allocations |
| 147 | + |
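One way the analysis phase might pick a snapshot and list its top allocators is sketched below. The helper names are hypothetical, and choosing the highest-numbered heap profile as the "latest" snapshot is an assumption about the sequential naming, not something `analyze-profiles.sh` is confirmed to do.

```bash
# Hypothetical helper: list heapN.pprof files in numeric order (heap2 before heap10).
list_heap_profiles() {
  local dir="$1" f n
  for f in "$dir"/heap*.pprof; do
    [ -e "$f" ] || continue
    n="${f##*/heap}"
    n="${n%.pprof}"
    case "$n" in
      ''|*[!0-9]*) continue ;;   # skip names without a purely numeric suffix
    esac
    echo "$n"
  done | sort -n | while read -r n; do
    printf '%s/heap%s.pprof\n' "$dir" "$n"
  done
}

# Hypothetical helper: the highest-numbered heap profile in a directory.
latest_heap_profile() {
  list_heap_profiles "$1" | tail -n 1
}

# Print the top in-use allocators for the most recent snapshot.
top_allocators() {
  local latest
  latest="$(latest_heap_profile "$1")"
  [ -n "$latest" ] || { echo "no heap profiles in $1" >&2; return 1; }
  go tool pprof -top -sample_index=inuse_space "$latest" 2>/dev/null | head -n 20
}

# Usage (not executed here):
# top_allocators ./e2e-profiles/baseline/operator-controller
```

Sorting on the numeric suffix matters because a plain lexical sort would place `heap10.pprof` before `heap2.pprof` and misorder the memory timeline.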
| 148 | +5. **Reporting Phase** |
| 149 | + - Generate markdown report with: |
| 150 | + - Executive summary |
| 151 | + - Memory timeline chart |
| 152 | + - Top allocators table |
| 153 | + - Allocation breakdown |
| 154 | + - Recommendations for optimization |
| 155 | + |
| 156 | +## Configuration |
| 157 | + |
| 158 | +The plugin uses these defaults (customizable via environment variables): |
| 159 | + |
| 160 | +```bash |
| 161 | +# Namespace where operator-controller runs |
| 162 | +E2E_PROFILE_NAMESPACE=olmv1-system |
| 163 | + |
| 164 | +# Collection interval in seconds |
| 165 | +E2E_PROFILE_INTERVAL=10 |
| 166 | + |
| 167 | +# CPU sampling duration in seconds |
| 168 | +E2E_PROFILE_CPU_DURATION=10 |
| 169 | + |
| 170 | +# Profile collection mode (both, heap, cpu) |
| 171 | +E2E_PROFILE_MODE=both |
| 172 | + |
| 173 | +# Output directory base |
| 174 | +E2E_PROFILE_DIR=./e2e-profiles |
| 175 | + |
| 176 | +# Default test target |
| 177 | +E2E_PROFILE_TEST_TARGET=test-experimental-e2e |
| 178 | +``` |
| 179 | + |
| 180 | +**Profile Modes:** |
| 181 | +- `both` (default): Collect both heap and CPU profiles |
| 182 | +- `heap`: Collect only heap profiles (reduces overhead by ~3%) |
| 183 | +- `cpu`: Collect only CPU profiles |
| 184 | + |
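A script consuming these variables might apply the defaults and validate the mode along the following lines. This is a sketch only: `load_profile_config` and `profile_kinds` are hypothetical names, and the real scripts may handle configuration differently.

```bash
# Hypothetical sketch: apply the documented defaults unless the caller overrides them.
load_profile_config() {
  : "${E2E_PROFILE_NAMESPACE:=olmv1-system}"
  : "${E2E_PROFILE_INTERVAL:=10}"
  : "${E2E_PROFILE_CPU_DURATION:=10}"
  : "${E2E_PROFILE_MODE:=both}"
  : "${E2E_PROFILE_DIR:=./e2e-profiles}"
  : "${E2E_PROFILE_TEST_TARGET:=test-experimental-e2e}"

  # Reject modes other than the three documented ones.
  case "$E2E_PROFILE_MODE" in
    both|heap|cpu) ;;
    *)
      echo "invalid E2E_PROFILE_MODE: $E2E_PROFILE_MODE (expected both, heap, or cpu)" >&2
      return 1
      ;;
  esac
}

# Echo the profile kinds that a given mode would collect.
profile_kinds() {
  case "$1" in
    both) echo "heap cpu" ;;
    heap) echo "heap" ;;
    cpu)  echo "cpu" ;;
  esac
}

# Usage (not executed here):
# E2E_PROFILE_MODE=heap load_profile_config && profile_kinds "$E2E_PROFILE_MODE"
```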
| 185 | +## Output Structure |
| 186 | + |
| 187 | +``` |
| 188 | +e2e-profiles/ |
| 189 | +├── baseline/ |
| 190 | +│ ├── operator-controller/ |
| 191 | +│ │ ├── heap0.pprof |
| 192 | +│ │ ├── heap1.pprof |
| 193 | +│ │ ├── cpu0.pprof |
| 194 | +│ │ ├── cpu1.pprof |
| 195 | +│ │ └── ... |
| 196 | +│ ├── catalogd/ |
| 197 | +│ │ ├── heap0.pprof |
| 198 | +│ │ ├── cpu0.pprof |
| 199 | +│ │ └── ... |
| 200 | +│ ├── test.log |
| 201 | +│ ├── collection.log |
| 202 | +│ └── analysis.md |
| 203 | +├── with-caching/ |
| 204 | +│ └── ... |
| 205 | +└── comparisons/ |
| 206 | + └── baseline-vs-with-caching.md |
| 207 | +``` |
| 208 | + |
| 209 | +## Tool Location |
| 210 | + |
| 211 | +The profiling scripts are located at:
| 212 | +``` |
| 213 | +hack/tools/e2e-profiling/ |
| 214 | +├── e2e-profile.sh # Main entry point |
| 215 | +├── start-profiling.sh # Start background profiling |
| 216 | +├── stop-profiling.sh # Stop profiling and analyze |
| 217 | +├── run-profiled-test.sh # Run test with profiling (automated) |
| 218 | +├── collect-profiles.sh # Profile collection loop |
| 219 | +├── analyze-profiles.sh # Generate analysis reports |
| 220 | +├── compare-profiles.sh # Compare two runs |
| 221 | +├── common.sh # Shared utilities |
| 222 | +└── README.md # Full documentation |
| 223 | +``` |
| 224 | + |
| 225 | +You can run them directly: |
| 226 | +```bash |
| 227 | +# Start/Stop workflow |
| 228 | +make start-profiling # or ./hack/tools/e2e-profiling/start-profiling.sh |
| 229 | +make test-e2e |
| 230 | +make stop-profiling # or ./hack/tools/e2e-profiling/stop-profiling.sh |
| 231 | + |
| 232 | +# Automated workflow |
| 233 | +./hack/tools/e2e-profiling/e2e-profile.sh run baseline |
| 234 | +./hack/tools/e2e-profiling/e2e-profile.sh analyze baseline |
| 235 | +./hack/tools/e2e-profiling/e2e-profile.sh compare baseline optimized |
| 236 | +``` |
| 237 | + |
| 238 | +## Requirements |
| 239 | + |
| 240 | +- kubectl with access to the cluster |
| 241 | +- go tool pprof |
| 242 | +- make (for running tests) |
| 243 | +- curl (for fetching profiles) |
| 244 | +- Port 6060 available for forwarding |
| 245 | + |
| 246 | +## Example Workflows |
| 247 | + |
| 248 | +### Recommended: Start/Stop Workflow |
| 249 | + |
| 250 | +```bash |
| 251 | +# 1. Start profiling in background |
| 252 | +/e2e-profile start baseline |
| 253 | + |
| 254 | +# 2. Run your test (any command!) |
| 255 | +make test-e2e # Works! Handles cluster teardown |
| 256 | +make test-experimental-e2e # Works! |
| 257 | +go test ./test/e2e/... # Works! |
| 258 | + |
| 259 | +# 3. Stop profiling and get analysis |
| 260 | +/e2e-profile stop |
| 261 | + |
| 262 | +# 4. Make code changes and test again |
| 263 | +# ... edit code ... |
| 264 | +/e2e-profile start optimized |
| 265 | +make test-e2e |
| 266 | +/e2e-profile stop |
| 267 | + |
| 268 | +# 5. Compare results |
| 269 | +/e2e-profile compare baseline optimized |
| 270 | +``` |
| 271 | + |
| 272 | +### Alternative: Automated Workflow |
| 273 | + |
| 274 | +```bash |
| 275 | +# 1. Run baseline test with profiling (automated) |
| 276 | +/e2e-profile run baseline |
| 277 | + |
| 278 | +# 2. Make code changes (e.g., add caching) |
| 279 | +# ... edit code ... |
| 280 | + |
| 281 | +# 3. Run new test with profiling |
| 282 | +/e2e-profile run with-caching |
| 283 | + |
| 284 | +# 4. Compare results |
| 285 | +/e2e-profile compare baseline with-caching |
| 286 | + |
| 287 | +# 5. Review the comparison report |
| 288 | +# Opens: e2e-profiles/comparisons/baseline-vs-with-caching.md |
| 289 | +``` |
| 290 | + |
| 291 | +## Notes |
| 292 | + |
| 293 | +**Start/Stop Workflow:** |
| 294 | +- Profiler runs in background, letting you run any test command |
| 295 | +- Auto-detects cluster teardown after 3 consecutive collection failures |
| 296 | +- Port-forwards and collection process stop gracefully |
| 297 | +- Works with test-e2e (which tears down cluster), test-experimental-e2e, and custom commands |
| 298 | + |
| 299 | +**Automated Workflow:** |
| 300 | +- Test will run until completion or manual interruption (Ctrl+C) |
| 301 | +- Automatically handles profiling setup and teardown |
| 302 | + |
| 303 | +**General:** |
| 304 | +- Each heap profile is ~11-150KB depending on memory usage |
| 305 | +- Each CPU profile is ~4-40KB depending on activity |
| 306 | +- Analysis requires all profile files to be present |
| 307 | +- Port forwarding targets deployments rather than pod names, so it can be re-established after a pod restart without looking up the new pod
| 308 | +- Reports are generated in markdown format for easy viewing |
| 309 | +- Empty profile files are automatically cleaned up |