Skip to content

[Feature]: FMA actuation path classification, hit rates, and per-path timing (Phases 1-2) #1422

@rubambiza

Description

@rubambiza

Component

Harnesses

Desired use case or feature

The FMA benchmarking doc (llm-d-fast-model-actuation PR #454) defines new metrics that the nop harness should collect for FMA actuation paths:

  • Hot_hit_rate: Fraction of server-requesting Pods satisfied by waking a sleeping vLLM instance
  • Warm_hit_rate: Fraction satisfied by an existing launcher pod (no new launcher creation needed)
  • T_wake: Hot-start timing (requester creation to ready)
  • T_instance_create: Warm-start timing (launcher receiving create request to DPC readiness relay)
  • T_cold_launcher: Cold-start-with-launcher timing (launcher pod creation to DPC readiness relay)

The current fma_functions.py classifies actuation paths using a pod-name heuristic (TODO: Improve the warm/luke_warm check). This needs to be replaced with timestamp-based classification.

Proposed solution

This work is a single combined PR covering Phase 1 (classification + hit rates) and Phase 2 (per-path timing). It opens after the FMA repo logging prerequisites (#495, #497) are merged.

Phase 1: Classification and hit rates

  1. Rename FMAActuationCondition.T_LUKE_WARM to T_COLD_LAUNCHER (aligns with updated FMA terminology)
  2. Store launcher_creation_timestamp in FMALauncherInfo
  3. Replace pod-name heuristic with timestamp comparison:
    • HOT: sleep/wake metrics indicate a wake event
    • WARM: launcher pod creationTimestamp < requester pod creationTimestamp (pre-existing launcher)
    • COLD_LAUNCHER: launcher created after requester (DPC had to create a new launcher)
  4. Compute hot_hit_rate, warm_hit_rate, cold_launcher_rate per iteration in FMAMetricsIteration

Phase 2: Per-path timing (upper bound via Kube timestamps)
5. Add t_wake, t_instance_create, t_cold_launcher fields to FMALauncherInfo
6. Compute upper-bound timing:

  • T_wake: requester_ready - requester_creation (hot path)
  • T_instance_create: requester_ready - requester_creation (warm path)
  • T_cold_launcher: requester_ready - launcher_creation (cold path)

These are upper bounds (include kubelet polling delay). Tighter measurements via DPC log parsing are a future follow-up once #495/#497 logs are deployed and available on test clusters.

Success criteria:

  • JSON output includes actuation_condition with values T_cold_launcher, T_warm, or T_hot
  • Each iteration reports hot_hit_rate, warm_hit_rate, cold_launcher_rate
  • Per-launcher info includes the relevant timing metric (t_wake, t_instance_create, or t_cold_launcher)
  • Classification is verified against each actuation path config (hot, warm, cold with launcher)

Alternatives

  • Pod-name heuristic (current approach): fragile, breaks if naming conventions change
  • Separate Phase 1 and Phase 2 into two PRs: unnecessary complexity for reviewers given the shared data structures

Additional context or screenshots

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions