Component
Harnesses
Desired use case or feature
The FMA benchmarking doc (llm-d-fast-model-actuation PR #454) defines new metrics that the nop harness should collect for FMA actuation paths:
- Hot_hit_rate: Fraction of server-requesting Pods satisfied by waking a sleeping vLLM instance
- Warm_hit_rate: Fraction satisfied by an existing launcher pod (no new launcher creation needed)
- T_wake: Hot-start timing (requester creation to ready)
- T_instance_create: Warm-start timing (launcher receiving create request to DPC readiness relay)
- T_cold_launcher: Cold-start-with-launcher timing (launcher pod creation to DPC readiness relay)
The current fma_functions.py classifies actuation paths using a pod-name heuristic (TODO: Improve the warm/luke_warm check). This needs to be replaced with timestamp-based classification.
Proposed solution
This work is a single combined PR covering Phase 1 (classification + hit rates) and Phase 2 (per-path timing). It opens after the FMA repo logging prerequisites (#495, #497) are merged.
Phase 1: Classification and hit rates
1. Rename FMAActuationCondition.T_LUKE_WARM to T_COLD_LAUNCHER (aligns with updated FMA terminology)
2. Store launcher_creation_timestamp in FMALauncherInfo
3. Replace pod-name heuristic with timestamp comparison:
   - HOT: sleep/wake metrics indicate a wake event
   - WARM: launcher pod creationTimestamp < requester pod creationTimestamp (pre-existing launcher)
   - COLD_LAUNCHER: launcher created after requester (DPC had to create a new launcher)
4. Compute hot_hit_rate, warm_hit_rate, cold_launcher_rate per iteration in FMAMetricsIteration
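The classification rules and hit rates above could be sketched roughly as follows. This is a minimal illustration and not the actual fma_functions.py code: ActuationPath, parse_k8s_timestamp, classify_actuation_path, compute_hit_rates, and the woke_sleeping_instance flag are hypothetical names, and the sketch assumes the wake event has already been derived from the sleep/wake metrics.

```python
from collections import Counter
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List


class ActuationPath(Enum):
    # Hypothetical stand-in for FMAActuationCondition.
    HOT = "hot"
    WARM = "warm"
    COLD_LAUNCHER = "cold_launcher"


def parse_k8s_timestamp(value: str) -> datetime:
    """Parse a Kubernetes creationTimestamp such as 2024-05-01T12:00:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def classify_actuation_path(
    requester_created: datetime,
    launcher_created: datetime,
    woke_sleeping_instance: bool,
) -> ActuationPath:
    """Classify one server-requesting Pod by timestamps, not pod names."""
    if woke_sleeping_instance:
        return ActuationPath.HOT
    # A launcher that strictly predates the requester was pre-existing.
    if launcher_created < requester_created:
        return ActuationPath.WARM
    # Otherwise the launcher was created for this requester. Equal
    # timestamps (one-second resolution) also land here in this sketch.
    return ActuationPath.COLD_LAUNCHER


def compute_hit_rates(paths: List[ActuationPath]) -> Dict[str, float]:
    """Return the per-iteration fraction of requesters on each path."""
    total = len(paths)
    counts = Counter(paths)

    def fraction(path: ActuationPath) -> float:
        return counts[path] / total if total else 0.0

    return {
        "hot_hit_rate": fraction(ActuationPath.HOT),
        "warm_hit_rate": fraction(ActuationPath.WARM),
        "cold_launcher_rate": fraction(ActuationPath.COLD_LAUNCHER),
    }
```

Because creationTimestamp has one-second resolution, a launcher and requester created within the same second compare as equal; the sketch treats that case as COLD_LAUNCHER, which follows from the strict "<" in the WARM rule.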
Phase 2: Per-path timing (upper bound via Kube timestamps)
5. Add t_wake, t_instance_create, t_cold_launcher fields to FMALauncherInfo
6. Compute upper-bound timing:
   - T_wake: requester_ready - requester_creation (hot path)
   - T_instance_create: requester_ready - requester_creation (warm path)
   - T_cold_launcher: requester_ready - launcher_creation (cold path)
These are upper bounds because they include kubelet polling delay. Tighter measurements via DPC log parsing are a future follow-up once the #495/#497 logs are deployed and available on test clusters.
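The upper-bound formulas above amount to picking a start timestamp per path and subtracting it from the requester's ready time. A minimal sketch, assuming the three timestamps are already available as datetime objects; upper_bound_timing and the string path labels are hypothetical and not part of the existing harness:

```python
from datetime import datetime, timezone
from typing import Tuple

# Maps each path label to the FMALauncherInfo field named in the issue.
TIMING_FIELD = {
    "hot": "t_wake",
    "warm": "t_instance_create",
    "cold_launcher": "t_cold_launcher",
}


def upper_bound_timing(
    path: str,
    requester_creation: datetime,
    requester_ready: datetime,
    launcher_creation: datetime,
) -> Tuple[str, float]:
    """Return (field name, seconds) for the timing metric of one path."""
    if path not in TIMING_FIELD:
        raise ValueError(f"unknown actuation path: {path!r}")
    # Hot and warm paths start at requester creation; the cold path
    # starts at launcher creation.
    start = launcher_creation if path == "cold_launcher" else requester_creation
    return TIMING_FIELD[path], (requester_ready - start).total_seconds()


if __name__ == "__main__":
    created = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
    launcher = datetime(2024, 5, 1, 12, 0, 2, tzinfo=timezone.utc)
    ready = datetime(2024, 5, 1, 12, 0, 30, tzinfo=timezone.utc)
    print(upper_bound_timing("cold_launcher", created, ready, launcher))
```

Only the metric that matches the classified path is produced, which lines up with the success criterion that per-launcher info includes the relevant timing metric.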
Success criteria:
- JSON output includes actuation_condition with values T_cold_launcher, T_warm, or T_hot
- Each iteration reports hot_hit_rate, warm_hit_rate, cold_launcher_rate
- Per-launcher info includes the relevant timing metric (t_wake, t_instance_create, or t_cold_launcher)
- Classification is verified against each actuation path config (hot, warm, cold with launcher)
Alternatives
- Pod-name heuristic (current approach): fragile, breaks if naming conventions change
- Separate Phase 1 and Phase 2 into two PRs: unnecessary complexity for reviewers given the shared data structures
Additional context or screenshots