Add KEP for DRA: Extended Resource #5136

Open - wants to merge 54 commits into base: master

Conversation


@yliaog yliaog commented Feb 5, 2025

  • One-line PR description:
    Add new KEP for supporting extended resource requests in DRA

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Feb 5, 2025
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 5, 2025
@yliaog
Author

yliaog commented Feb 5, 2025

/assign @johnbelamaric

Member

@johnbelamaric johnbelamaric left a comment


Awesome, thanks @yliaog

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 6, 2025
@pohly
Contributor

pohly commented Feb 6, 2025

/cc

@yliaog yliaog force-pushed the master branch 11 times, most recently from 7ccd621 to a1d3c16 on February 6, 2025 at 23:30
@yliaog
Author

yliaog commented May 26, 2025

ResourceClaims define how pods should be scheduled and are inputs to scheduling, so the scheduler should not be involved in creating them.

AFAIK the webhooks are designed for modifying pod definitions, so I don't understand why this alternative wasn't chosen. Such a ResourceClaim would have a clear lifetime and could be used for allocation like any other claim, without building in any exceptions.

I guess there will be a similar problem when we start modeling mem and cpu, so having a new type of ResourceClaim that implements the semantics of optional allocation (when the DRA resource is chosen) is something that is needed anyway.

/cc @sanposhiho @macsko @wojtek-t

The ResourceClaim created in the scheduler as proposed in this KEP is solely for recording the allocation results, such that kubelet can reference it during actuation.

Currently a Pod can reference a ResourceClaim template, and the resourceclaim controller then creates that ResourceClaim. What is proposed in this KEP is similar to that flow, in that the Pod has extended resource requests in its Spec and a controller (in this case, the scheduler) creates the ResourceClaim. The reason for using the scheduler (instead of the resourceclaim controller) to create the claim is that only the scheduler has all the information needed to create the claim's requests, specifically at the dynamicresources plugin's Filter phase.

mem and cpu can be modeled the same way as proposed in this KEP for extended resources, as they are all of the same type (a string mapped to a quantity, e.g. example.com/gpu: 1, cpu: 1, mem: 1G).
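For illustration, this is the kind of pod the flow above starts from - only a plain extended resource request, with no claim or template reference (pod and image names are hypothetical; the resource name example.com/gpu is the one used throughout this thread):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: app
    image: registry.example/app:v1   # hypothetical image
    resources:
      # Extended resources must specify equal requests and limits.
      requests:
        example.com/gpu: 1
      limits:
        example.com/gpu: 1
```
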

A webhook is not a good choice because:

  1. it would require separate user configuration, installation, and operation of the webhook
  2. it cannot dynamically allocate DRA resources or device plugin resources for the same extended resource requests

@dom4ha
Member

dom4ha commented May 26, 2025

The ResourceClaim created in the scheduler as proposed in this KEP is solely for recording the allocation results, such that kubelet can reference it during actuation.

I understand that, but still, that does not sound right. ResourceClaims define the scheduling intent and hold the result of that intent. I think we should still be able to define an intent that says: "Allocate an extended resource backed by a DRA plugin unless one backed by a device plugin was allocated". No exceptions would be needed other than optionally not allocating such a claim if the extended resource was allocated. So both the DRA and extended resource scheduler plugins need to be aware of each other, which sounds reasonable considering the feature integrates both concepts.

Currently a Pod can reference a ResourceClaim template, and the resourceclaim controller then creates that ResourceClaim. What is proposed in this KEP is similar to that flow, in that the Pod has extended resource requests in its Spec and a controller (in this case, the scheduler) creates the ResourceClaim.

The resource claim controller seems more suitable for creating such a ResourceClaim then, as it's responsible for preparing the "scheduling intent" based on the pod definition. Creating objects in the scheduler may unnecessarily complicate many things that are currently hard to predict, so it sounds like asking for trouble.

@yliaog
Author

yliaog commented May 26, 2025

The ResourceClaim created in the scheduler as proposed in this KEP is solely for recording the allocation results, such that kubelet can reference it during actuation.

I understand that, but still, that does not sound right. ResourceClaims define the scheduling intent and hold the result of that intent. I think we should still be able to define an intent that says: "Allocate an extended resource backed by a DRA plugin unless one backed by a device plugin was allocated". No exceptions would be needed other than optionally not allocating such a claim if the extended resource was allocated. So both the DRA and extended resource scheduler plugins need to be aware of each other, which sounds reasonable considering the feature integrates both concepts.

The intent is given by the extended resource in spec.resources.requests (e.g. example.com/gpu: 1); there is no need to create a claim to express such intent. The scheduler can act on that intent (example.com/gpu: 1) to make the allocation decision: it could be the noderesources plugin that satisfies the intent (in that case, the resources are provided by node.status.capacity), or it could be the dynamicresources plugin that satisfies the intent (in that case, the resources are provided by DRA resource slices).

In short, the intent is clearly specified in spec.resources.requests; a ResourceClaim is not necessary for the purpose of specifying the intent. Instead, it is created for the purpose of recording the allocation result.

There is no extended-resource plugin; instead, extended resources are handled by the noderesources plugin, similar to cpu/mem resources.

Currently a Pod can reference a ResourceClaim template, and the resourceclaim controller then creates that ResourceClaim. What is proposed in this KEP is similar to that flow, in that the Pod has extended resource requests in its Spec and a controller (in this case, the scheduler) creates the ResourceClaim.

The resource claim controller seems more suitable for creating such a ResourceClaim then, as it's responsible for preparing the "scheduling intent" based on the pod definition. Creating objects in the scheduler may unnecessarily complicate many things that are currently hard to predict, so it sounds like asking for trouble.

As mentioned above, there is no need to prepare the "scheduling intent", as the intent is already well specified by pod.spec.resources.requests (e.g. example.com/gpu: 1).

This is not the first time objects are created in the scheduler; currently the scheduler creates Binding.

Creating this ResourceClaim won't create another 'scheduling intent', as it is not associated with any pod.spec. Hence there is no circular dependency (i.e. there is no case where the scheduler creates the claim and then depends on that claim to act). It is logically very clear that the scheduler creates the claim, and kubelet consumes the claim for actuation.

@dom4ha
Member

dom4ha commented May 26, 2025

This is not the first time objects are created in the scheduler; currently the scheduler creates Binding.

I'm not aware of any object that the scheduler creates. The way scheduling is documented to users is that it's a process of assigning Pods to Nodes and ResourceClaims to ResourceSlices, which requires updating existing objects, but not creating them. A decision was taken that DRA allocation is performed by updating the status of ResourceClaim objects, because it was assumed that such an object must always exist. The alternative was to have separate objects dedicated to holding allocations, but the scheduler would have to create them, so there would be a problem with their garbage collection etc., which is exactly what is proposed here.

Note also that there are external schedulers (e.g. Kueue or autoscaling) that may start using the resource nomination concept to instruct the scheduler how to schedule (bind) pods. This means they would have to create the missing ResourceClaim when needed, and the scheduler would have to garbage collect it when it changes that decision.

Since we have two design options and one of them is aligned with the decision that the allocation is part of a preexisting ResourceClaim object, I don't see a reason why we'd choose the different option here.

@pohly
Contributor

pohly commented May 27, 2025

This means that they would have to create the missing ResourceClaim

Why? Either they use the extended resources API, in which case they don't create ResourceClaims, or they use ResourceClaims, in which case the scheduler needs to honor that decision and doesn't need to garbage collect.

I don't see a reason why we'd choose the different option here.

Because what you are proposing fails to satisfy one important motivation for this KEP, "Enable application developers and operators to transition to DRA gradually at their own pace." The intended usage is that admins convert nodes from device plugins to DRA gradually, instead of having to take down the entire cluster, convert to DRA, then start scheduling workloads again.

If the "ResourceClaim for extended resources" gets created in advance, the pod is locked to being scheduled to nodes which use DRA. The ResourceClaim controller would need to be aware of resource utilization (available resources, running pods) to make a smart decision upfront and then react to scheduling failures by revising that decision. That sounds very complex to me and something that is better handled during scheduling itself.

@pohly
Contributor

pohly commented May 27, 2025

Creating objects in the scheduler may unnecessarily complicate many things that are currently hard to predict, so it sounds like asking for trouble.

That's exactly the problem when creating the ResourceClaim in the kube-controller-manager, because there the controller has to make predictions. In the scheduler it's not a prediction; it's based on analysis of the current state of the cluster at the time of scheduling.

Or did you mean "predict future changes around the DRA design"? I'm not worried about that; the design of this KEP seems consistent to me. The ResourceClaim has two purposes: expressing user intent and communicating allocations to the kubelet and other components which need to track resource usage (including the scheduler itself). This KEP only uses the second half, but that seems fine. It is normal in Kubernetes that API objects are created automatically to enact some other user-facing API.

I understand that you are worried about the complexity that this adds to the scheduler, but IMHO that's still the best solution. The complexity doesn't go away by moving it somewhere else...

@dom4ha
Member

dom4ha commented May 27, 2025

Because what you are proposing fails to satisfy one important motivation for this KEP, "Enable application developers and operators to transition to DRA gradually at their own pace." The intended usage is that admins convert nodes from device plugins to DRA gradually, instead of having to take down the entire cluster, convert to DRA, then start scheduling workloads again.

It still satisfies that goal. There are two equivalent approaches we should be choosing from:

  1. Create the ResourceClaim in advance and annotate it "allocate unless the built-in or extended resources satisfy it"
  2. Request allocation of standard resources and create the ResourceClaim if the built-in or extended resources do not satisfy it

The second option requires adding and removing the object whenever the scheduling decision changes. By scheduling decision I mean reserving resources, which is reflected in the api-server by placing ReservedFor or NominatedNodeName in the allocatable object. Resource nomination concepts are not used extensively yet, but will be used more and more, so operating on a preexisting object (even if it's sometimes not used because built-in resources satisfy it) would be much simpler.

FYI @x13n @mwielgus

@wojtek-t
Member

Jumping late to the discussion - I tend to agree with @dom4ha, but really only the last comment explains the real motivation.

I think that GC and overall lifecycle are not a compelling reason - as @pohly wrote, moving the logic from one place to the other doesn't necessarily reduce the complexity. (And it should be possible to use an ownerRef to make the lifecycle management not very hard.)

But the nomination concept can be used to communicate between different schedulers, and if changing decisions requires creation/deletion of additional objects, we may in fact go back and forth as decisions change; that doesn't sound very compelling (xref: #5287 )

I think the pattern that Dominik proposed might not have been clear from the beginning. IIUC, what he is proposing is that there is no need to predict, at RC creation time, whether a node with a DRA driver or a device plugin will be chosen. Instead, the proposal is to introduce an additional bit of information in the ResourceClaim - let's temporarily call it may-be-satisfied-by-extended-resource: true (defaults to false).
If set, the scheduler internally figures out whether it should satisfy this claim or instead use the extended resource, depending on what a given node currently supports. It then reflects that appropriately in the status - so we probably need another satisfied-by-extended-resource: true field in the status that it can set.
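Spelled out, the pre-created claim in this alternative would look roughly like this (both field names are the temporary placeholders from the comment above; the claim name and device class are hypothetical):

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-pod-claim        # hypothetical, created up front by a controller
spec:
  may-be-satisfied-by-extended-resource: true   # placeholder name from this thread
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com          # hypothetical class
status:
  satisfied-by-extended-resource: true  # placeholder; set by the scheduler when
                                        # the node's device-plugin resource was
                                        # used instead of a DRA allocation
```
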

I agree this approach is not perfect either - but I would like to better understand its drawbacks if we don't want to proceed with it.

@pohly
Contributor

pohly commented May 27, 2025

I agree this approach is not perfect either - but I would like to better understand its drawbacks if we don't want to proceed with it.

My main concern is that it depends on extending the ResourceClaim API in non-trivial ways. This new API will be visible to the user, which then raises the question of how they should or shouldn't be allowed to use it.

My other concern is that the controller has two choices:

  • Always create such a ResourceClaim for all pods using extended resources, unconditionally. That's consistent with may-be-satisfied-by-extended-resource: true because it is uncertain what the cluster configuration will be at the time the pod gets scheduled. This is clearly not desirable in a cluster where no DRA driver is installed.
  • Create them conditionally based on DeviceClass settings and reconcile because DeviceClasses can come and go. This adds more complexity and potential races.

Either way, the scheduler still has to check for "do I need a ResourceClaim for this extended resource" and potentially wait, otherwise scheduling races with the ResourceClaim creation.

This seems like a lot of additional effort and complexity to avoid a Create call in the scheduler in a place where it currently already does an UpdateStatus. This simply doesn't seem worth it to me.

@dom4ha
Member

dom4ha commented May 27, 2025

Either way, the scheduler still has to check for "do I need a ResourceClaim for this extended resource" and potentially wait, otherwise scheduling races with the ResourceClaim creation.

Both options can be combined. The controller creates a ResourceClaim with may-be-satisfied-by-extended-resource: true if there is a DeviceClass with a matching extendedResourceName. The scheduler just needs to check that the DeviceClass hasn't disappeared in the meantime.
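As a sketch of that mapping, assuming a DeviceClass field along the lines of the extendedResourceName mentioned above (the API version, field placement, and driver name are assumptions, not settled API):

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  selectors:
  - cel:
      expression: device.driver == "gpu.example.com"   # hypothetical driver name
  # Marks this class as able to satisfy the extended resource
  # example.com/gpu (field name as used in this discussion):
  extendedResourceName: example.com/gpu
```
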

@pohly
Contributor

pohly commented May 27, 2025

@dom4ha: that is my second option ("Create them conditionally based on DeviceClass settings...").

The scheduler just needs to check that the DeviceClass hasn't disappeared in the meantime.

"Just" downplays the complexity involved in this check. How does the scheduler know which DeviceClasses the ResourceClaim was created from? How can it be sure that the DeviceClass(es) haven't been replaced or modified since then, if that influences the existence or content of the ResourceClaim?

All that the scheduler gets out of this is that it doesn't need to create the ResourceClaim, which might not even be needed.

@dom4ha
Member

dom4ha commented May 27, 2025

"Just" downplays the complexity involved in this check. How does the scheduler know which DeviceClasses the ResourceClaim was created from? How can it be sure that the DeviceClass(es) haven't been replaced or modified since then, if that influences the existence or content of the ResourceClaim?

Isn't the DeviceClass name a part of such a special ResourceClaim? Since it is, the DeviceClass's current state should determine which extended resource (or other built-in resource) allocation can alternatively satisfy the claim (leaving it unassigned). It's expected that the extended resource or built-in resource is really specified by the pod, but if there's any mismatch, such a ResourceClaim can be ignored and only the extended resource from the PodSpec allocated.

All that the scheduler gets out of this is that it doesn't need to create the ResourceClaim, which might not even be needed.

Yes, these two approaches should be more or less equivalent, but IMO they make a difference for scheduling.

@dom4ha
Member

dom4ha commented May 27, 2025

I agree this approach is not perfect either - but I would like to better understand its drawbacks if we don't want to proceed with it.

It's hard to come up with a perfect solution if we need to mix two different concepts together. One of them will always be counterintuitive:

  • an optionally allocated ResourceClaim vs
  • a ResourceClaim appearing as a side effect of scheduling.

As the KEP says, the solution may stay with us for longer, so we cannot expect to be able to clean it up soon. I would also like to understand the drawbacks of the two options before we make a decision.

@pohly
Contributor

pohly commented May 27, 2025

Isn't the DeviceClass name a part of such a special ResourceClaim?

Not as described in this KEP. The KEP describes requests, but not how they relate to DeviceClasses, because it doesn't matter. This mapping, and perhaps other fields like the UID and Generation of the referenced DeviceClasses, would have to be added. This brings me back to "This new API will be visible to the user, which then raises the question how they should or shouldn't be allowed to use it.".

@dom4ha
Member

dom4ha commented May 27, 2025

Isn't the DeviceClass name a part of such a special ResourceClaim?

Not as described in this KEP. The KEP describes requests, but not how they relate to DeviceClasses, because it doesn't matter. This mapping, and perhaps other fields like the UID and Generation of the referenced DeviceClasses, would have to be added. This brings me back to "This new API will be visible to the user, which then raises the question how they should or shouldn't be allowed to use it.".

It depends on how you define "not visible to the user". It seems to me that both are not visible, in the sense that the user does not need to do anything to start using DRA, but:

  1. sometimes a ResourceClaim is created without a spec but with an allocation
  2. a ResourceClaim is always created, specifies what it needs, and is scheduled like any other ResourceClaim, but sometimes it may not be allocated (since the regular resource was allocated instead)

I think the second option is more verbose for users who try to debug what happened during scheduling.

@pohly
Contributor

pohly commented May 27, 2025

sometimes a ResourceClaim is created without a spec but with an allocation

Not quite. The special ResourceClaim in this KEP is a fully formed, valid ResourceClaim, including a spec.

That aside, my concern is that if the new fields get added to the ResourceClaimSpec, users may be tempted to set them when using DRA "normally". Do you envision them in the spec or in the status?

I think that the second option is more verbose for those users who try to debug what happened during scheduling.

With the current proposal, they get that from the pod status. I don't see how this "optionally allocated ResourceClaim" improves upon that.

@yliaog
Author

yliaog commented May 27, 2025

Isn't the DeviceClass name a part of such a special ResourceClaim?

Not as described in this KEP. The KEP describes requests, but not how they relate to DeviceClasses, because it doesn't matter. This mapping, and perhaps other fields like the UID and Generation of the referenced DeviceClasses, would have to be added. This brings me back to "This new API will be visible to the user, which then raises the question how they should or shouldn't be allowed to use it.".

It depends on how you define "not visible to the user". It seems to me that both are not visible, in the sense that the user does not need to do anything to start using DRA, but:

  1. sometimes a ResourceClaim is created without a spec but with an allocation
  2. a ResourceClaim is always created, specifies what it needs, and is scheduled like any other ResourceClaim, but sometimes it may not be allocated (since the regular resource was allocated instead)

I think the second option is more verbose for users who try to debug what happened during scheduling.

As discussed earlier, this KEP does not need to use the claim to specify intent, as the intent is already specified by pod.spec.resources.requests (e.g. example.com/gpu: 1). The claim is created to hold the allocation results, which are then consumed by kubelet.

The proposed alternative, IIUC, creates the claim to specify the intent. In that case there are two intents, one specified in the ResourceClaim and the other given in pod.spec.resources.requests, and the scheduler then has to somehow understand that these two intents actually mean the same thing, which is not needed with the proposal in this KEP.

The other key difference between the two approaches is when the claim is created. This KEP proposes just-in-time creation, when it is absolutely needed, no more, no less. The alternative proposes creating the claim based on some static analysis of the cluster state (pod, device class, maybe node, etc.). As the scheduler has the most and best information (which it uses to make the scheduling/allocation decision), this lazy claim creation, pushed into the scheduler, is better (IMO) than shifting it earlier, when it may not be necessary.
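To make the just-in-time flow concrete, a rough sketch of what the scheduler-created claim could look like (all names, the API version, and the exact request shape are illustrative, not the KEP's final API):

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  generateName: gpu-pod-extended-resources-  # hypothetical naming scheme
  ownerReferences:                # owned by the pod, so normal GC applies
  - apiVersion: v1
    kind: Pod
    name: gpu-pod
    controller: true              # (uid omitted in this sketch)
spec:
  devices:
    requests:
    - name: container-0-request-0        # derived from the container's
      deviceClassName: gpu.example.com   # example.com/gpu: 1 request
status:
  allocation:                     # written by the scheduler, read by kubelet
    devices:
      results:
      - request: container-0-request-0
        driver: gpu.example.com
        pool: node-1
        device: gpu-0
```
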

The scheduler does create Binding objects today: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/binding-v1/

It is also a common pattern for some objects (e.g. ReplicaSet) to be created automatically by controllers (e.g. the Deployment controller).

With all that said, IIUC the alternative (resource nomination) is also under active discussion. IMO we should wait for it to be finalized and more mature. We could come back and reevaluate it before this KEP goes to Beta; by then I hope we will have more information on both (resource nomination and DRA extended resources) to make a more informed decision.

@pohly
Contributor

pohly commented May 28, 2025

With all that said, IIUC the alternative (resource nomination) is also under active discussion. IMO we should wait for it to be finalized and more mature. We could come back and reevaluate it before this KEP goes to Beta

I second that. The required API changes in the current KEP revision (pod status to record the final mapping of extended resources to DRA devices) are very likely to be also needed when moving the creation of the ResourceClaim, so we are not on a wrong path. Moving the creation may need additional API changes, but we can discuss those when needed.

@wojtek-t
Member

The scheduler does create Binding objects today: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/binding-v1/

Let's not mix the binding in here - a binding isn't really an object itself; we don't store any bindings in etcd or anything like that. What it translates to is setting the nodeName in a pod. That is not a good argument to use.

It is also a common pattern to create some objects (e..g replicaset) automaticaly by some controllers (deployment controller)

Right, but the necessary synchronization between different calls from a given component is something we should keep in mind, as it may visibly affect performance.
If we proceed with the proposed approach, then #5249 seems to be on the dependency path for its Beta.

I second that. The required API changes in the current KEP revision (pod status to record the final mapping of extended resources to DRA devices) are very likely to be also needed when moving the creation of the ResourceClaim, so we are not on a wrong path. Moving the creation may need additional API changes, but we can discuss those when needed.

I guess I personally buy this. I agree that no matter who creates that claim, it will have to be recorded in the pod status, and the existing API doesn't allow for that (in theory we could relax that so statuses may contain something from outside of the spec, but that doesn't sound like the best option). So we need a new field to reflect it in the status - which is part of this proposal.
For the sake of making progress, I think this approach works for Alpha - but I would like to have a deeper discussion before going to Beta - so basically adding an explicit Beta graduation criterion to revisit the decision of how the ResourceClaim is created.

But I'm not a decision maker here - so we'll see what @dom4ha says.

@pohly
Contributor

pohly commented May 28, 2025

If we proceed with the proposed approach, then #5249 seems to be on the dependency path for Beta of it.

I'm not so sure about that: the proposed Create call happens in PreBind and thus is already decoupled from the sequential pod scheduling loop. I think #5249 is about API calls in that loop.

@wojtek-t
Member

I'm not so sure about that: the proposed Create call happens in PreBind and thus is already decoupled from the sequential pod scheduling loop. I think #5249 is about API calls in that loop.

We want to unify the calls - and this KEP may actually introduce a dependency between different calls. It may not be needed for this KEP, but it may have consequences for how/what can be done there.

@dom4ha
Member

dom4ha commented May 28, 2025

If we proceed with the proposed approach, then #5249 seems to be on the dependency path for Beta of it.

I'm not so sure about that: the proposed Create call happens in PreBind and thus is already decoupled from the sequential pod scheduling loop. I think #5249 is about API calls in that loop.

It's more than that, as all API calls will be abstracted into some operation object and sent to an update queue first. Creating a ResourceClaim object instead of updating it does not break any assumptions here, although there will be two code changes that need to be merged in a non-obvious fashion.

Object creation and update have some subtle differences even for in-memory representation updates (in Reserve). It does not matter now, but it will matter in the future once we develop combinatorial algorithms which allocate and deallocate DRA resources all the time (insert vs update operations). But both options are doable, so I can't say that there are strong arguments for one option over the other.

I'm rather trying to discuss what is better in terms of a more straightforward approach with less hidden logic. In my mind the main problem the KEP needs to address is the case when 100% of nodes are DRA (the ResourceClaim will always be created), but users keep using the simpler extended resources semantics for various reasons (maybe forever). In that case, we don't have to optimize for the case where only half the nodes are DRA, but for how to translate an extended resource into a DRA ResourceClaim (maybe at some point the scheduler won't have the fit plugin anymore). I suspect we will work in the future on making the translation logic configurable and mutable over time. Then the question will be whether the translation logic should be determined (captured into the RC) at workload creation time and used consistently for the whole workload lifetime, or reassessed on each rescheduling.

Yes, we are still far from implementing workload awareness in the scheduler and pod rescheduling, so it's hard to use that as an argument, and I don't mind leaving the proposed approach for Alpha in that case, unless @sanposhiho or @macsko or others have other strong arguments in this discussion.

@pohly
Contributor

pohly commented May 28, 2025

It's more than that, as all api calls will be abstracted to some operation object and sent to update queue first. [...]

Thanks for the clarification.

In my mind the main problem the KEP needs to address is a case when 100% nodes are DRA

I noticed that the KEP currently only mentions "Enable cluster administrators to transition to DRA gradually at their own pace, possibly one node at a time." under motivation. This is a real problem in practice when you consider large clusters which need to do a live migration.

@yliaog: perhaps make this more obvious by adding "Efficiently support mixed clusters where some nodes use device plugins and some nodes use DRA drivers for the same hardware." as goal?

@dom4ha
Member

dom4ha commented May 28, 2025

I noticed that the KEP currently only mentions "Enable cluster administrators to transition to DRA gradually at their own pace, possibly one node at a time." under motivation. This is a real problem in practice when you consider large clusters which need to do a live migration.

It's clearly one of the goals (it's listed as one of the three motivations) and I have never suggested not addressing it. However, in my mind the cluster migration takes days/weeks, while transforming the simplified spec into ResourceClaims will be needed for months/years, so I'm asking which of the two use cases should drive the design options (assuming they should be equivalent and differ mostly in the special ResourceClaim's visibility in the migration scenario only).

So, once a cluster administrator has migrated all the nodes to DRA, should the scheduler still attempt to allocate the extended resource and even still run the relevant plugin? In the alternative option, the generated ResourceClaim would no longer need the annotation asking for it, and we'd have a regular ResourceClaim (constructed based on some potentially admin-configurable transformation logic ExtendedResource -> ResourceClaim).

If you think that the migration-period use case has priority (over the user spec migration) and it justifies the cost of introducing a dynamically created ResourceClaim, I accept that, as I may still be missing a good understanding of the priorities.

…e claim, and clarified support for mixed device plugin & DRA
@yliaog
Author

yliaog commented May 28, 2025

The following commit added criteria for graduation to Beta, and also clarified the support for mixed nodes.
ce117bc

@yliaog
Author

yliaog commented May 29, 2025

@sanposhiho or @macsko do you have any concerns about this KEP?

@yliaog
Author

yliaog commented Jun 1, 2025

@sanposhiho or @macsko @dom4ha friendly ping ...

@yliaog
Author

yliaog commented Jun 2, 2025

@mrunalp could you please take a look at this PR?

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/ui Categorizes an issue or PR as relevant to SIG UI. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.