
Refactor: Externalize Scheduler's saturation logic and criticality-based service differentiation #805

Merged

LukeAVanDrie
Contributor

@LukeAVanDrie LukeAVanDrie commented May 8, 2025

This commit refactors the request processing pipeline, externalizing saturation detection and criticality-based service differentiation from the Scheduler. These responsibilities are now primarily managed by the RequestControl.Director.

This change is a preparatory step for the introduction of a new Flow Controller component, which will eventually absorb these admission control duties.

Diff base is: #808 (split out for easier reviewing)
Related to: #674

Key changes include:

  • Introduced a PreDispatch method on RequestControl.Director. It uses the SaturationDetector for admission control of non-critical requests and checks request criticality to decide whether the saturation check is bypassed.
  • The saturation detection logic for dropping non-critical requests is intentionally preserved within the Director at this stage. This keeps the option to bypass the future Flow Controller component while it matures, so the existing saturation and sheddable-request behavior can be maintained as a fallback.
  • Updated main.go to instantiate the SaturationDetector and wire it into the request handling flow.
  • Updated director_test.go to align with the new component responsibilities, adding coverage where necessary.

Missing from this PR:

  • Simplifying the Scheduler to focus solely on preference-based filtering and pod selection for requests that have already been admitted by the Director.
  • Removing the SheddableRequestFilter and the distinct critical/sheddable filter paths from the Scheduler's internal logic so that the Scheduler only applies a single, unified preference filter chain to all incoming requests.

I did not include the above in this PR due to high activity in those files. I will send a follow-up PR to address them. In the meantime, the saturation check happens twice: once in the Director, and then a second, redundant time in the Scheduler. This wastes some compute but has no effect on behavior.

This refactoring leads to a cleaner architecture, making the Scheduler a more focused component and centralizing initial admission control logic, while paving the way for the future Flow Controller.

This is aligned with the direction in 0683-epp-architecture-proposal and is a no-op in terms of EPP behavior.

netlify bot commented May 8, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 22d4be0
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/683e0f86773b950008708b49
😎 Deploy Preview https://deploy-preview-805--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 8, 2025
@k8s-ci-robot k8s-ci-robot requested review from liu-cong and robscott May 8, 2025 20:26
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 8, 2025
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 8, 2025
@ahg-g
Contributor

ahg-g commented May 8, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 8, 2025
@LukeAVanDrie
Contributor Author

LukeAVanDrie commented May 8, 2025

This change should be a no-op. @liu-cong, I will leave it to your discretion whether this needs proper regression testing.

@LukeAVanDrie
Contributor Author

LukeAVanDrie commented May 8, 2025

I split the addition of the saturation detector subdirectory out into a separate PR (#808) to be submitted before this one. That code is unused until this PR wires it up.

@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 112b943 to 48cc9a0 Compare May 8, 2025 20:51
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch 3 times, most recently from a3d9090 to 9d273fa Compare May 9, 2025 02:49
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 9d273fa to 83486ac Compare May 9, 2025 03:26
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 83486ac to 4a7de3f Compare May 13, 2025 02:11
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 13, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 4a7de3f to 44a11af Compare May 16, 2025 00:53
@k8s-ci-robot k8s-ci-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 16, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 40124ce to 228e58c Compare May 29, 2025 21:02
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 228e58c to 5625947 Compare May 29, 2025 21:20
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch 2 times, most recently from 119dda8 to da6537e Compare June 2, 2025 16:42
@liu-cong
Contributor

liu-cong commented Jun 2, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2025
@ahg-g
Contributor

ahg-g commented Jun 2, 2025

/assign

I will look at this today.

Contributor

@ahg-g ahg-g left a comment

This is great; one nit.

Please send a follow-up PR to remove the logic in the plugins that currently does the shedding.

This commit refactors the request processing pipeline, externalizing
saturation detection and criticality-based service differentiation
from the Scheduler. These responsibilities are now primarily managed by
the RequestControl.Director.

This change is a preparatory step for the introduction of a new
Flow Controller component, which will eventually absorb these admission
control duties.

Key changes include:

- Introduced a `PreDispatch` method on `RequestControl.Director`. It
  uses the `SaturationDetector` for admission control of
  non-critical requests and checks request criticality to decide
  whether saturation checks are bypassed.
- The saturation detection logic for dropping non-critical requests
  is intentionally preserved within the `Director` at this stage.
  This allows the option to bypass the future Flow Controller
  component during its maturation, ensuring the existing saturation
  and sheddable request behavior can be maintained as a fallback.
- Updated `main.go` to instantiate the `SaturationDetector`, wiring it
  into the request handling flow.
- Updated `director_test.go` to align with the new component
  responsibilities, adding coverage where necessary.

This refactoring leads to a cleaner architecture, making the `Scheduler`
a more focused component and centralizing initial admission control
logic while paving the way for the future Flow Controller.

This is aligned with the direction in `0683-epp-architecture-proposal`
and should be nearly a no-op in terms of EPP behavior.
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from da6537e to 22d4be0 Compare June 2, 2025 20:54
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2025
@LukeAVanDrie LukeAVanDrie requested a review from ahg-g June 2, 2025 20:55
@ahg-g
Contributor

ahg-g commented Jun 2, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2025
@k8s-ci-robot k8s-ci-robot merged commit 1d6a81f into kubernetes-sigs:main Jun 2, 2025
8 checks passed
LukeAVanDrie added a commit to LukeAVanDrie/gateway-api-inference-extension that referenced this pull request Jun 2, 2025
Admission control/capacity management is now handled in
`requestcontrol.Director.PreDispatch` (and soon to be absorbed into the
new Flow Controller). This should no longer be a responsibility of the
scheduling framework, and kubernetes-sigs#805 already applies this check
before the scheduling layer is invoked.

This is not a no-op change. Previously, the `SheddableCapacityFilter`,
in addition to dropping sheddable requests when at capacity, would also
strictly filter the pods that the rest of the scheduling plugins would
consider as input. This change removes that strict filtering so all pods
are now considered so long as the system is not considered saturated.
This means sheddable requests now follow the same scheduling path as
critical requests provided they are not dropped by the saturation
detection check in `PreDispatch`.
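The behavioral difference described in this commit message can be illustrated with a small Go sketch. It is an approximation, not the removed plugin: `Pod`, `hasCapacity`, the queue-depth and KV-cache thresholds, and both helper functions are hypothetical, and the real filter's capacity criteria may differ.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified view of one model server's load.
type Pod struct {
	Name        string
	QueueDepth  int
	KVCacheUtil float64 // fraction in [0, 1]
}

// hasCapacity is an assumed per-pod capacity test; the thresholds are
// illustrative and not taken from SheddableCapacityFilter.
func hasCapacity(p Pod, maxQueue int, maxKV float64) bool {
	return p.QueueDepth <= maxQueue && p.KVCacheUtil <= maxKV
}

// oldSheddableCandidates approximates the removed filter: only pods with
// capacity survive, and the request is dropped (ok == false) if none do.
func oldSheddableCandidates(pods []Pod, maxQueue int, maxKV float64) (candidates []Pod, ok bool) {
	for _, p := range pods {
		if hasCapacity(p, maxQueue, maxKV) {
			candidates = append(candidates, p)
		}
	}
	return candidates, len(candidates) > 0
}

// newSheddableCandidates reflects the follow-up: once PreDispatch has admitted
// the request, every pod reaches the remaining scheduling plugins, just as
// for a critical request.
func newSheddableCandidates(pods []Pod) []Pod {
	return pods
}

func main() {
	pods := []Pod{
		{Name: "pod-a", QueueDepth: 1, KVCacheUtil: 0.20},
		{Name: "pod-b", QueueDepth: 10, KVCacheUtil: 0.95},
	}
	old, ok := oldSheddableCandidates(pods, 5, 0.8)
	fmt.Println(len(old), ok)                      // 1 true
	fmt.Println(len(newSheddableCandidates(pods))) // 2
}
```

In the sketch, `pod-b` is invisible to a sheddable request on the old path but becomes a candidate on the new one, which is why the commit notes this change is not a no-op.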
irar2 pushed a commit to irar2/gateway-api-inference-extension that referenced this pull request Jun 3, 2025
shmuelk pushed a commit to shmuelk/gateway-api-inference-extension that referenced this pull request Jun 3, 2025
LukeAVanDrie added a commit to LukeAVanDrie/gateway-api-inference-extension that referenced this pull request Jun 3, 2025
LukeAVanDrie added a commit to LukeAVanDrie/gateway-api-inference-extension that referenced this pull request Jun 4, 2025
k8s-ci-robot pushed a commit that referenced this pull request Jun 4, 2025
shmuelk pushed a commit to shmuelk/gateway-api-inference-extension that referenced this pull request Jun 9, 2025
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
5 participants