Commit d2ae505

tmshort and claude committed

Add Claude Code integration for e2e profiling

Add `/e2e-profile` slash command to enable interactive profiling workflow through Claude Code interface.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Todd Short <[email protected]>

1 parent 18142b3 commit d2ae505

1 file changed (+309 -0 lines changed)

.claude/commands/e2e-profile.md

@@ -0,0 +1,309 @@
---
description: Profile memory and CPU usage during e2e tests and analyze results
---

# E2E Profiling Plugin

Analyze memory and CPU usage during e2e tests by collecting pprof heap and CPU profiles and generating markdown analysis reports.

## Commands

### /e2e-profile start [test-name]

Start profiling in background mode (recommended workflow):

1. Start port-forwards to operator-controller and catalogd
2. Begin collecting heap and CPU profiles every 10 seconds
3. Keep running in the background while you run any test command
4. Auto-detect cluster teardown and stop gracefully
5. Use `/e2e-profile stop` to finish and analyze

**Examples:**
```
/e2e-profile start baseline
# Then run: make test-e2e
# Then run: /e2e-profile stop
```

This workflow:
- Works with any test command (`make test-e2e`, `make test-experimental-e2e`, custom commands)
- Handles cluster teardown gracefully (`test-e2e` tears down the cluster)
- Auto-stops after 3 consecutive collection failures
- Leaves how you run the tests entirely up to you
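
The auto-stop behavior can be sketched as a loop that resets a failure counter on every successful fetch and exits once the counter reaches the limit. This is a minimal illustration under assumptions, not the contents of `collect-profiles.sh`; `collect_loop` and the fetch callback it takes are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a collection loop with consecutive-failure auto-stop.
# Usage: collect_loop FETCH_FN INTERVAL [MAX_FAILURES]
#   FETCH_FN is called with the next profile index and must return 0 on success.
# Prints the number of successful collections when the loop stops.
collect_loop() {
  local fetch_fn=$1 interval=$2 max_failures=${3:-3}
  local failures=0 index=0
  while (( failures < max_failures )); do
    if "$fetch_fn" "$index"; then
      failures=0               # any success resets the streak
      index=$((index + 1))     # only successful fetches advance the index
    else
      failures=$((failures + 1))
      echo "collection failed (${failures}/${max_failures})" >&2
    fi
    sleep "$interval"
  done
  echo "stopping: ${max_failures} consecutive failures (cluster likely torn down)" >&2
  echo "$index"
}
```

Requiring consecutive failures, rather than stopping on the first one, keeps a single slow or dropped request from ending the session early.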

### /e2e-profile stop

Stop the background profiling session and generate the analysis:

1. Stop the profile collection process
2. Kill the port-forward processes (or detect that they have already stopped)
3. Clean up empty profile files
4. Generate the analysis report

**Example:**
```
/e2e-profile stop
```
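
Steps 1 through 3 amount to signalling recorded processes and deleting zero-byte files. The sketch below assumes the session wrote its PIDs, one per line, to a PID file; that file layout and both function names are hypothetical. `find -empty -delete` is supported by GNU and BSD find but is not POSIX.

```bash
#!/usr/bin/env bash
# stop_session PID_FILE: stop every process recorded in PID_FILE (hypothetical
# layout: one PID per line), tolerating processes that have already exited.
stop_session() {
  local pid_file=$1 pid
  [[ -f $pid_file ]] || return 0
  while read -r pid; do
    [[ -n $pid ]] || continue
    if kill -0 "$pid" 2>/dev/null; then
      kill "$pid" 2>/dev/null || true
    else
      echo "process $pid already stopped" >&2
    fi
  done < "$pid_file"
  rm -f "$pid_file"
}

# cleanup_empty_profiles DIR: delete zero-byte .pprof files (for example, fetches
# that failed after the cluster went away) anywhere under DIR.
cleanup_empty_profiles() {
  find "$1" -type f -name '*.pprof' -empty -delete
}
```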

### /e2e-profile run [test-name] [test-target]

Run an e2e test with continuous memory and CPU profiling (automated workflow):

1. Start the specified e2e test (defaults to `make test-experimental-e2e`)
2. Wait for the operator-controller pod to be ready
3. Collect heap and CPU profiles every 10 seconds to `./e2e-profiles/[test-name]/`
4. Continue until the test completes or is interrupted
5. Generate a summary report with memory and CPU analysis

**Test Targets:**
- `test-e2e` - Standard e2e tests
- `test-experimental-e2e` - Experimental e2e tests (default)
- `test-extension-developer-e2e` - Extension developer e2e tests
- `test-upgrade-e2e` - Upgrade e2e tests
- `test-upgrade-experimental-e2e` - Upgrade experimental e2e tests

**Examples:**
```
/e2e-profile run baseline
/e2e-profile run baseline test-e2e
/e2e-profile run with-caching test-experimental-e2e
/e2e-profile run upgrade-test test-upgrade-e2e
```
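
Steps 1 and 2 can be sketched as starting the make target in the background and polling until the controller pod reports Ready. The label selector and the `run_profiled`/`validate_target` names are hypothetical. The retry loop is there because `kubectl wait` fails immediately when no pod matches the selector yet, which is expected while the test is still creating the cluster.

```bash
#!/usr/bin/env bash
# Test targets listed in this document.
VALID_TARGETS=(test-e2e test-experimental-e2e test-extension-developer-e2e
               test-upgrade-e2e test-upgrade-experimental-e2e)

# validate_target TARGET: succeed only for a known make target.
validate_target() {
  local t
  for t in "${VALID_TARGETS[@]}"; do
    [[ $t == "$1" ]] && return 0
  done
  echo "unknown test target: $1" >&2
  return 1
}

# run_profiled NAME [TARGET]: start TARGET in the background, wait for the
# controller pod, then block until the test exits.
run_profiled() {
  local name=$1 target=${2:-${E2E_PROFILE_TEST_TARGET:-test-experimental-e2e}}
  local ns=${E2E_PROFILE_NAMESPACE:-olmv1-system}
  local selector=app.kubernetes.io/name=operator-controller   # hypothetical label
  local out="${E2E_PROFILE_DIR:-./e2e-profiles}/$name"
  validate_target "$target" || return 1
  mkdir -p "$out"

  make "$target" > "$out/test.log" 2>&1 &
  local test_pid=$!

  # Poll: kubectl wait errors out while no pod matches the selector yet.
  until kubectl wait --for=condition=Ready pod -l "$selector" -n "$ns" \
        --timeout=10s >/dev/null 2>&1; do
    if ! kill -0 "$test_pid" 2>/dev/null; then
      echo "test exited before the controller became ready" >&2
      return 1
    fi
    sleep 5
  done

  # Periodic heap/CPU collection would start here (omitted in this sketch).
  wait "$test_pid"
}
```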

### /e2e-profile analyze [test-name]

Analyze collected heap profiles for a specific test run:

1. Load all heap profiles from `./e2e-profiles/[test-name]/`
2. Analyze memory growth patterns
3. Identify top allocators
4. Find OpenAPI, JSON, and other hotspots
5. Generate a detailed markdown report

**Example:**
```
/e2e-profile analyze baseline
```
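
One detail worth handling when loading profiles in step 1 is ordering: a plain lexical sort puts `heap10.pprof` before `heap2.pprof`, which scrambles any growth timeline. The sketch below orders profiles numerically and adds top-allocator and hotspot queries. The function names are hypothetical; `-top` and `-sample_index=inuse_space` are standard `go tool pprof` options, while matching `openapi|json` against function names is an assumed heuristic, not necessarily what `analyze-profiles.sh` does.

```bash
#!/usr/bin/env bash
# ordered_profiles DIR KIND: print DIR/KIND<N>.pprof paths sorted by numeric N,
# so heap2.pprof comes before heap10.pprof.
ordered_profiles() {
  local dir=$1 kind=$2
  find "$dir" -maxdepth 1 -type f -name "${kind}[0-9]*.pprof" |
    sed -E "s|.*/${kind}([0-9]+)\.pprof$|\1 &|" |
    sort -n -k1,1 |
    cut -d' ' -f2-
}

# top_allocators PROFILE [N]: the N largest in-use allocators in one heap profile.
top_allocators() {
  go tool pprof -top -sample_index=inuse_space "$1" 2>/dev/null | head -n "${2:-20}"
}

# hotspots PROFILE: in-use allocation rows whose function names mention OpenAPI or JSON.
hotspots() {
  go tool pprof -top -sample_index=inuse_space "$1" 2>/dev/null | grep -Ei 'openapi|json'
}
```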

### /e2e-profile compare [test1] [test2]

Compare two test runs to measure the impact of changes:

1. Load profiles from both test runs
2. Compare peak memory usage
3. Compare memory growth rates
4. Identify differences in allocation patterns
5. Generate a side-by-side comparison report with charts

**Example:**
```
/e2e-profile compare baseline with-caching
```
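
For the allocation-pattern comparison in step 4, `go tool pprof` can subtract one profile from another with `-diff_base`; for the headline numbers in steps 2 and 3, a percentage change is enough. Both helpers below are hypothetical sketches, and which profiles to pair (for example, the last heap profile of each run) is an assumption.

```bash
#!/usr/bin/env bash
# pct_change OLD NEW: percentage change from OLD to NEW, one decimal place.
# Fails when OLD is 0, since the change is undefined.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    if (old == 0) exit 1
    printf "%.1f\n", (new - old) * 100 / old
  }'
}

# diff_top BASE NEW: top in-use allocation deltas of NEW relative to BASE
# (negative values mean NEW holds less memory at that call site).
diff_top() {
  go tool pprof -top -sample_index=inuse_space -diff_base="$1" "$2"
}
```

For example, `pct_change 100 80` prints `-20.0`, i.e. a 20% reduction.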

### /e2e-profile collect

Manually collect a single heap profile from the running operator-controller pod:

1. Find the operator-controller pod
2. Set up port forwarding to the pprof endpoint
3. Download a heap profile
4. Save it to `./e2e-profiles/manual/heap-[timestamp].pprof`

**Example:**
```
/e2e-profile collect
```
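
The four steps map onto `kubectl port-forward` and `curl` against the standard `net/http/pprof` heap endpoint. This sketch assumes the pprof server listens on port 6060 inside the pod (the port given under Task Breakdown); the deployment name is a hypothetical placeholder, and the two-second wait for the forward to bind is a simplification.

```bash
#!/usr/bin/env bash
# manual_profile_path BASE [TIMESTAMP]: BASE/manual/heap-TIMESTAMP.pprof
manual_profile_path() {
  local base=${1:-./e2e-profiles} ts=${2:-$(date +%Y%m%d-%H%M%S)}
  printf '%s/manual/heap-%s.pprof\n' "$base" "$ts"
}

# collect_once [DEPLOYMENT]: forward port 6060, fetch one heap profile, stop the forward.
collect_once() {
  local ns=${E2E_PROFILE_NAMESPACE:-olmv1-system}
  local deployment=${1:-operator-controller-controller-manager}   # hypothetical name
  local out
  out=$(manual_profile_path "${E2E_PROFILE_DIR:-./e2e-profiles}")
  mkdir -p "$(dirname "$out")"

  kubectl port-forward -n "$ns" "deploy/$deployment" 6060:6060 >/dev/null 2>&1 &
  local pf_pid=$!
  sleep 2   # crude: give the forward time to bind the local port

  local rc=0
  if curl -fsS -o "$out" http://localhost:6060/debug/pprof/heap; then
    echo "saved $out"
  else
    echo "failed to fetch heap profile" >&2
    rm -f "$out"
    rc=1
  fi
  kill "$pf_pid" 2>/dev/null || true
  return "$rc"
}
```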

## Task Breakdown

When you invoke this command, I will:

1. **Setup Phase**
   - Create the `./e2e-profiles/[test-name]` directory
   - Verify that the test target (default `make test-experimental-e2e`) is available
   - Check kubectl access to the cluster

2. **Collection Phase**
   - Start the e2e test in the background
   - Wait for the operator-controller pod to become ready
   - Set up port forwarding to the pprof endpoint (port 6060)
   - Collect heap and CPU profiles every 10 seconds
   - Save profiles with sequential naming (heap0.pprof, heap1.pprof, ..., cpu0.pprof, ...)

3. **Monitoring Phase**
   - Track test progress
   - Monitor profile file sizes for growth patterns
   - Detect whether the test crashes or completes

4. **Analysis Phase**
   - Use `go tool pprof` to analyze profiles
   - Extract key metrics:
     - Peak memory usage
     - Memory growth over time
     - Top allocators
     - OpenAPI-related allocations
     - JSON deserialization overhead
     - Informer/cache allocations

5. **Reporting Phase**
   - Generate a markdown report with:
     - Executive summary
     - Memory timeline chart
     - Top allocators table
     - Allocation breakdown
     - Recommendations for optimization
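
For the peak-usage and growth metrics in the Analysis Phase, one approach is to read the total from the header line that `go tool pprof -top` prints ("Showing nodes accounting for X, Y% of Z total") and convert it to bytes. This is an assumed approach, not necessarily what the scripts do; the conversion assumes pprof's 1024-based units (`B`, `kB`, `MB`, `GB`), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# to_bytes SIZE: convert a pprof size such as 512kB or 1.5MB to bytes
# (assumes 1024-based units; unknown suffixes are treated as bytes).
to_bytes() {
  awk -v s="$1" 'BEGIN {
    n = s + 0
    u = s; sub(/^[0-9.]+/, "", u)
    m = (u == "kB") ? 1024 : (u == "MB") ? 1024^2 : (u == "GB") ? 1024^3 : 1
    printf "%.0f\n", n * m
  }'
}

# inuse_total PROFILE: total in-use heap from the "Showing nodes ..." header line.
inuse_total() {
  go tool pprof -top -sample_index=inuse_space "$1" 2>/dev/null |
    awk '/^Showing nodes accounting for/ { print $(NF-1); exit }'
}

# peak_inuse PROFILE...: largest in-use total (in bytes) across the given profiles.
peak_inuse() {
  local f bytes peak=0
  for f in "$@"; do
    bytes=$(to_bytes "$(inuse_total "$f")")
    if (( bytes > peak )); then peak=$bytes; fi
  done
  echo "$peak"
}
```

Feeding the profiles to `peak_inuse` in collection order, and printing each per-profile total instead of only the maximum, would give the memory timeline as well.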

## Configuration

The plugin uses these defaults, each of which can be overridden by setting the environment variable of the same name:

```bash
# Namespace where operator-controller runs
E2E_PROFILE_NAMESPACE=olmv1-system

# Collection interval in seconds
E2E_PROFILE_INTERVAL=10

# CPU sampling duration in seconds
E2E_PROFILE_CPU_DURATION=10

# Profile collection mode (both, heap, cpu)
E2E_PROFILE_MODE=both

# Output directory base
E2E_PROFILE_DIR=./e2e-profiles

# Default test target
E2E_PROFILE_TEST_TARGET=test-experimental-e2e
```

**Profile Modes:**
- `both` (default): Collect both heap and CPU profiles
- `heap`: Collect only heap profiles (reduces overhead by ~3%)
- `cpu`: Collect only CPU profiles
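
A script can apply these defaults without clobbering values the caller already set by using the `${VAR:=default}` expansion, and can translate `E2E_PROFILE_MODE` into the endpoints to fetch. The endpoints are the standard `net/http/pprof` paths; the helper names are hypothetical. Because a CPU fetch blocks for `E2E_PROFILE_CPU_DURATION` seconds, a `both`-mode cycle may take longer than `E2E_PROFILE_INTERVAL` alone suggests, depending on how the scripts schedule fetches.

```bash
#!/usr/bin/env bash
# Apply the documented defaults; values already set in the environment win.
# (Add `export` if child processes need to see them.)
: "${E2E_PROFILE_NAMESPACE:=olmv1-system}"
: "${E2E_PROFILE_INTERVAL:=10}"
: "${E2E_PROFILE_CPU_DURATION:=10}"
: "${E2E_PROFILE_MODE:=both}"
: "${E2E_PROFILE_DIR:=./e2e-profiles}"
: "${E2E_PROFILE_TEST_TARGET:=test-experimental-e2e}"

# profile_kinds [MODE]: which profile types a mode collects.
profile_kinds() {
  local mode=${1:-$E2E_PROFILE_MODE}
  case $mode in
    both) echo "heap cpu" ;;
    heap) echo "heap" ;;
    cpu)  echo "cpu" ;;
    *)    echo "invalid E2E_PROFILE_MODE: $mode" >&2; return 1 ;;
  esac
}

# profile_url KIND [PORT]: pprof endpoint on a forwarded local port.
profile_url() {
  local kind=$1 port=${2:-6060}
  case $kind in
    heap) echo "http://localhost:${port}/debug/pprof/heap" ;;
    cpu)  echo "http://localhost:${port}/debug/pprof/profile?seconds=${E2E_PROFILE_CPU_DURATION}" ;;
    *)    return 1 ;;
  esac
}
```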

## Output Structure

```
e2e-profiles/
├── baseline/
│   ├── operator-controller/
│   │   ├── heap0.pprof
│   │   ├── heap1.pprof
│   │   ├── cpu0.pprof
│   │   ├── cpu1.pprof
│   │   └── ...
│   ├── catalogd/
│   │   ├── heap0.pprof
│   │   ├── cpu0.pprof
│   │   └── ...
│   ├── test.log
│   ├── collection.log
│   └── analysis.md
├── with-caching/
│   └── ...
└── comparisons/
    └── baseline-vs-with-caching.md
```
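
The per-run layout above, and the sequential `heapN`/`cpuN` numbering, can be produced with a couple of small helpers. This is a hypothetical sketch (`prepare_layout` and `next_index` are illustrative names). Choosing the next index by scanning existing files lets a restarted collector continue the sequence instead of overwriting `heap0.pprof`.

```bash
#!/usr/bin/env bash
# Components profiled in each run (subdirectories in the tree above).
COMPONENTS=(operator-controller catalogd)

# prepare_layout NAME: create BASE/NAME/<component>/ and BASE/comparisons/; print BASE/NAME.
prepare_layout() {
  local base=${E2E_PROFILE_DIR:-./e2e-profiles} c
  for c in "${COMPONENTS[@]}"; do
    mkdir -p "$base/$1/$c"
  done
  mkdir -p "$base/comparisons"
  echo "$base/$1"
}

# next_index DIR KIND: one past the highest existing KIND<N>.pprof index in DIR (0 if none).
next_index() {
  local dir=$1 kind=$2 f n max=-1
  for f in "$dir/$kind"[0-9]*.pprof; do
    [[ -e $f ]] || continue          # the glob matched nothing
    n=${f##*/"$kind"}
    n=${n%.pprof}
    [[ $n =~ ^[0-9]+$ ]] || continue
    if (( 10#$n > max )); then max=$((10#$n)); fi
  done
  echo $((max + 1))
}
```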

## Tool Location

The profiling scripts are located at:
```
hack/tools/e2e-profiling/
├── e2e-profile.sh        # Main entry point
├── start-profiling.sh    # Start background profiling
├── stop-profiling.sh     # Stop profiling and analyze
├── run-profiled-test.sh  # Run test with profiling (automated)
├── collect-profiles.sh   # Profile collection loop
├── analyze-profiles.sh   # Generate analysis reports
├── compare-profiles.sh   # Compare two runs
├── common.sh             # Shared utilities
└── README.md             # Full documentation
```

You can run them directly:
```bash
# Start/Stop workflow
make start-profiling   # or ./hack/tools/e2e-profiling/start-profiling.sh
make test-e2e
make stop-profiling    # or ./hack/tools/e2e-profiling/stop-profiling.sh

# Automated workflow
./hack/tools/e2e-profiling/e2e-profile.sh run baseline
./hack/tools/e2e-profiling/e2e-profile.sh analyze baseline
./hack/tools/e2e-profiling/e2e-profile.sh compare baseline optimized
```

## Requirements

- kubectl with access to the cluster
- Go toolchain (for `go tool pprof`)
- make (for running tests)
- curl (for fetching profiles)
- Local port 6060 available for port forwarding
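
A preflight check for these requirements might look like the following sketch. The function names are hypothetical. `go tool -n pprof` prints the pprof command path without running it, and the port probe relies on bash's `/dev/tcp` redirection, treating a successful connection as "port already in use".

```bash
#!/usr/bin/env bash
# check_tools TOOL...: report every tool missing from PATH; fail if any are missing.
check_tools() {
  local missing=0 t
  for t in "$@"; do
    if ! command -v "$t" >/dev/null 2>&1; then
      echo "missing required tool: $t" >&2
      missing=1
    fi
  done
  return "$missing"
}

# port_free PORT: succeed if nothing accepts connections on localhost:PORT.
port_free() {
  ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# preflight: the checks from the Requirements list, stopping at the first failure.
preflight() {
  check_tools kubectl go make curl || return 1
  go tool -n pprof >/dev/null 2>&1 || { echo "go tool pprof unavailable" >&2; return 1; }
  kubectl cluster-info >/dev/null 2>&1 || { echo "cannot reach the cluster" >&2; return 1; }
  port_free 6060 || { echo "local port 6060 is already in use" >&2; return 1; }
}
```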

## Example Workflows

### Recommended: Start/Stop Workflow

```bash
# 1. Start profiling in background
/e2e-profile start baseline

# 2. Run your test (any command!)
make test-e2e                 # Works! Handles cluster teardown
make test-experimental-e2e    # Works!
go test ./test/e2e/...        # Works!

# 3. Stop profiling and get analysis
/e2e-profile stop

# 4. Make code changes and test again
# ... edit code ...
/e2e-profile start optimized
make test-e2e
/e2e-profile stop

# 5. Compare results
/e2e-profile compare baseline optimized
```

### Alternative: Automated Workflow

```bash
# 1. Run baseline test with profiling (automated)
/e2e-profile run baseline

# 2. Make code changes (e.g., add caching)
# ... edit code ...

# 3. Run new test with profiling
/e2e-profile run with-caching

# 4. Compare results
/e2e-profile compare baseline with-caching

# 5. Review the comparison report
# Opens: e2e-profiles/comparisons/baseline-vs-with-caching.md
```

## Notes

**Start/Stop Workflow:**
- The profiler runs in the background, so you can run any test command
- Cluster teardown is detected after 3 consecutive collection failures
- Port-forwards and the collection process stop gracefully
- Works with `test-e2e` (which tears down the cluster), `test-experimental-e2e`, and custom commands

**Automated Workflow:**
- The test runs until it completes or is interrupted manually (Ctrl+C)
- Profiling setup and teardown are handled automatically

**General:**
- Each heap profile is ~11-150 KB, depending on memory usage
- Each CPU profile is ~4-40 KB, depending on activity
- Analysis requires all profile files to be present
- Port forwarding targets deployments rather than individual pods (survives pod restarts)
- Reports are generated in markdown format for easy viewing
- Empty profile files are automatically cleaned up
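
`kubectl port-forward deploy/NAME` forwards to one pod selected when the command starts. One way to keep forwarding across pod replacements is to rerun the command whenever it exits, as in this hypothetical sketch (not necessarily how the scripts achieve it); the deployment name in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
# supervise MAX_RESTARTS DELAY CMD...: run CMD, and rerun it each time it exits,
# until MAX_RESTARTS restarts have been used up.
supervise() {
  local max=$1 delay=$2 restarts=0
  shift 2
  while :; do
    "$@" || true                      # a dropped forward exits nonzero; keep going
    if (( restarts >= max )); then
      echo "giving up after $restarts restarts" >&2
      return 1
    fi
    restarts=$((restarts + 1))
    echo "restarting ($restarts/$max): $*" >&2
    sleep "$delay"
  done
}

# Example (deployment name is a placeholder):
# supervise 20 2 kubectl port-forward -n olmv1-system deploy/catalogd-controller-manager 6061:6060 &
```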
