Enhance and update documentation #351
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -2,8 +2,25 @@ | |||||
|
||||||
Changes since `v0.1.0`: | ||||||
|
||||||
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be | ||||||
retried after a transient error. | ||||||
### Features | ||||||
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue. | ||||||
- Add webhooks to validate and add defaults to all kueue APIs. | ||||||
Review comment: the webhooks validate and add defaults. How's the new wording?
||||||
- Support [codependent resources](/docs/concepts/cluster_queue.md#codepedent-resources) | ||||||
by assigning the same flavor to codependent resources in a pod set. | ||||||
- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/) | ||||||
in Workload pod sets. | ||||||
- Default requests to limits if requests are not set in a Workload pod set, to
match internal defaulting for k8s Pods.
Review comment: Maybe Set is a more accurate verb, wdyt?
Reply: Changed to set and added a link to k8s reference doc.
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of | ||||||
||||||
the system and the status of ClusterQueues. | ||||||
|
||||||
### Bug fixes | ||||||
|
||||||
- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from | ||||||
blocking other Workloads in a StrictFIFO ClusterQueue. | ||||||
Review comment: Not sure if this is a feature or a bug fix.
Reply: Done
||||||
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue. | ||||||
||||||
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be | ||||||
||||||
retried after a transient error. | ||||||
- Fixed requeuing of an out-of-date workload when Kueue fails to admit it.
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads | ||||||
||||||
were not removed from the ClusterQueue when removing the corresponding Queue. |
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -5,21 +5,27 @@ Kueue is a set of APIs and controller for [job](docs/concepts/workload.md) | |||||||||||||||||||||
a job should be [admitted](docs/concepts#admission) to start (as in pods can be | ||||||||||||||||||||||
created) and when it should stop (as in active pods should be deleted). | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Why use Kueue | ||||||||||||||||||||||
|
||||||||||||||||||||||
Kueue is a lean controller that you can install on top of a vanilla Kubernetes | ||||||||||||||||||||||
cluster without replacing any components. It is compatible with cloud | ||||||||||||||||||||||
environments where: | ||||||||||||||||||||||
- Nodes and other compute resources can be scaled up and down. | ||||||||||||||||||||||
- Compute resources are heterogeneous (in architecture, availability, price, etc.). | ||||||||||||||||||||||
Review comment (on lines +10 to +14): What does "heterogeneous" mean in this example?
Reply: they have different characteristics
Review comment: Does it mean it is not compatible with environments where architecture is homogeneous?
Reply: It is compatible, as homogeneous is just simpler. In kueue terms, if your resources are homogeneous, you need only one ResourceFlavor.
||||||||||||||||||||||
|
||||||||||||||||||||||
Kueue APIs allow you to express: | ||||||||||||||||||||||
- Quotas and policies for fair sharing among tenants. | ||||||||||||||||||||||
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
is fully utilized, run the [job](docs/concepts/workload.md) using a different
flavor.
Review comment: Sorry for marking the changes on only one line 😞
Reply: I added the subject. It's kueue that runs (or rather admits) the job in a different flavor. I think we don't need the link to workload again. It's in the first paragraph.
|
||||||||||||||||||||||
The main design principle for Kueue is to avoid duplicating mature functionality | ||||||||||||||||||||||
in [Kubernetes components](https://kubernetes.io/docs/concepts/overview/components/) | ||||||||||||||||||||||
and well-established third-party controllers. Autoscaling, pod-to-node scheduling and | ||||||||||||||||||||||
job lifecycle management are the responsibility of cluster-autoscaler, | ||||||||||||||||||||||
kube-scheduler and kube-controller-manager, respectively. Advanced | ||||||||||||||||||||||
admission control can be delegated to controllers such as [gatekeeper](https://github.com/open-policy-agent/gatekeeper). | ||||||||||||||||||||||
|
||||||||||||||||||||||
<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo --> | ||||||||||||||||||||||
Learn more by reading the design docs: | ||||||||||||||||||||||
- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch) | ||||||||||||||||||||||
to get access) discusses the API proposal and a high-level description of how it | ||||||||||||||||||||||
operates. | ||||||||||||||||||||||
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design) | ||||||||||||||||||||||
presents the detailed design of the controller. | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Installation | ||||||||||||||||||||||
|
||||||||||||||||||||||
**Requires Kubernetes 1.22 or newer**. | ||||||||||||||||||||||
|
@@ -52,6 +58,18 @@ Learn more about: | |||||||||||||||||||||
- Kueue [concepts](docs/concepts). | ||||||||||||||||||||||
- Common and advanced [tasks](docs/tasks). | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Architecture | ||||||||||||||||||||||
|
||||||||||||||||||||||
<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo --> | ||||||||||||||||||||||
|
||||||||||||||||||||||
Learn more about the architecture of Kueue in the design docs: | ||||||||||||||||||||||
||||||||||||||||||||||
|
||||||||||||||||||||||
- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch) | ||||||||||||||||||||||
to get access) discusses the API proposal and a high-level description of how it | ||||||||||||||||||||||
operates. | ||||||||||||||||||||||
Review comment: I would keep the
||||||||||||||||||||||
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design) | ||||||||||||||||||||||
||||||||||||||||||||||
presents the detailed design of the controller. | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Community, discussion, contribution, and support | ||||||||||||||||||||||
|
||||||||||||||||||||||
Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/). | ||||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -1,8 +1,9 @@ | ||||||||||||||||||||||||
# Cluster Queue | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
A `ClusterQueue` is a cluster-scoped object that governs a pool of resources | ||||||||||||||||||||||||
A ClusterQueue is a cluster-scoped object that governs a pool of resources | ||||||||||||||||||||||||
such as CPU, memory and hardware accelerators. A `ClusterQueue` defines: | ||||||||||||||||||||||||
||||||||||||||||||||||||
- The resource _flavors_ that it manages, with usage limits and order of consumption. | ||||||||||||||||||||||||
- The [resource _flavors_](#resourceflavor-object) that the ClusterQueue manages,
with usage limits and order of consumption.
Review comment: I'm assuming "it" refers to "the cluster" but please edit accordingly, but we must mention what "it" means.
Reply: the ClusterQueue. Done
- Fair sharing rules across the tenants of the cluster. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Only [cluster administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects. | ||||||||||||||||||||||||
|
@@ -35,6 +36,74 @@ This ClusterQueue admits [workloads](workload.md) if and only if: | |||||||||||||||||||||||
|
||||||||||||||||||||||||
You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/). | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
## Resources | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types) | ||||||||||||||||||||||||
(cpu, memory, GPUs, etc.). | ||||||||||||||||||||||||
||||||||||||||||||||||||
|
||||||||||||||||||||||||
For each resource, you can define quotas for multiple _flavors_. A | ||||||||||||||||||||||||
flavor represents different variations of a resource. The variations can be | ||||||||||||||||||||||||
defined in a [ResourceFlavor object](#resourceflavor-object). | ||||||||||||||||||||||||
||||||||||||||||||||||||
|
||||||||||||||||||||||||
In a process called [admission](.#admission), Kueue assigns
[Workload pod sets](workload.md#pod-sets) a flavor for each resource the pod set requests.
Review comment: Not sure if "it" refers to "Kueue" but we need to replace "it" with the object name.
Reply: the pod set. Done
||||||||||||||||||||||||
Kueue assigns the first flavor in the ClusterQueue's `.spec.resources[*].flavors` | ||||||||||||||||||||||||
list that has enough unused `min` quota in the ClusterQueue or the | ||||||||||||||||||||||||
ClusterQueue's [cohort](#cohort). | ||||||||||||||||||||||||
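
The first-fit rule described above can be sketched in a few lines of Python. This is only an illustration, not Kueue's implementation: the function name `assign_flavor`, the plain-number quantities, and the `usage` mapping are assumptions made for the example, and the sketch looks at a single ClusterQueue without any cohort.

```python
from typing import Optional


def assign_flavor(flavors: list[dict], usage: dict[str, float], request: float) -> Optional[str]:
    """Return the first flavor whose unused `min` quota covers the request.

    Hypothetical helper: `flavors` mirrors one entry of
    `.spec.resources[*].flavors`, with quotas given as plain numbers.
    """
    for flavor in flavors:  # list order is the order of preference
        unused = flavor["min"] - usage.get(flavor["name"], 0)
        if request <= unused:
            return flavor["name"]
    return None  # no flavor has enough unused quota


# Illustrative values only.
cpu_flavors = [{"name": "spot", "min": 18}, {"name": "on_demand", "min": 9}]
print(assign_flavor(cpu_flavors, {"spot": 16}, 4))  # spot has only 2 unused, so on_demand is chosen
```

Because the list is scanned in order, placing a cheaper flavor first makes it the preferred choice whenever it has room.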
|
||||||||||||||||||||||||
### Codepedent resources | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
It is possible that multiple resources are tied to the same flavors. This is | ||||||||||||||||||||||||
typical for `cpu` and `memory`, where the flavors are generally tied to a | ||||||||||||||||||||||||
machine family or availability guarantees. | ||||||||||||||||||||||||
||||||||||||||||||||||||
|
||||||||||||||||||||||||
If this is the case, the resources in the ClusterQueue must list the same | ||||||||||||||||||||||||
flavors in the same order. When two or more resources match their flavors, | ||||||||||||||||||||||||
they are said to be codependent. During admission, for each pod set in a | ||||||||||||||||||||||||
Workload, Kueue assigns the same flavor to the codependent resources that the | ||||||||||||||||||||||||
pod set requests. | ||||||||||||||||||||||||
||||||||||||||||||||||||
|
||||||||||||||||||||||||
An example of a ClusterQueue with codependent resources looks like the following: | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
```yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: cluster-total
spec:
  namespaceSelector: {}
  resources:
  - name: "cpu"
    flavors:
    - name: spot
      quota:
        min: 18
    - name: on_demand
      quota:
        min: 9
  - name: "memory"
    flavors:
    - name: spot
      quota:
        min: 72Gi
    - name: on_demand
      quota:
        min: 36Gi
  - name: "gpu"
    flavors:
    - name: vendor1
      quota:
        min: 10
    - name: vendor2
      quota:
        min: 10
```
|
||||||||||||||||||||||||
In the example above, `cpu` and `memory` are codependent resources, while `gpu` | ||||||||||||||||||||||||
is independent. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
If two resources are not codependent, they must not have any flavors in common. | ||||||||||||||||||||||||
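
The two rules above (codependent resources list the same flavors in the same order, and all other resources share no flavors) can be checked mechanically. The following Python sketch is a hypothetical validator written for illustration; `check_codependency` and the dictionary layout are assumptions, not part of Kueue.

```python
from itertools import combinations


def flavor_names(resource: dict) -> list[str]:
    """Ordered flavor names of one `.spec.resources[*]` entry."""
    return [flavor["name"] for flavor in resource["flavors"]]


def check_codependency(resources: list[dict]) -> list[tuple[str, str]]:
    """Return codependent resource pairs; raise ValueError on partial overlap."""
    codependent = []
    for a, b in combinations(resources, 2):
        names_a, names_b = flavor_names(a), flavor_names(b)
        if names_a == names_b:
            # Same flavors in the same order: the pair is codependent.
            codependent.append((a["name"], b["name"]))
        elif set(names_a) & set(names_b):
            # Shared flavors without an identical ordered list are not allowed.
            raise ValueError(f"{a['name']} and {b['name']} share flavors but are not codependent")
    return codependent


# Flavor names taken from the ClusterQueue example above; quotas omitted.
resources = [
    {"name": "cpu", "flavors": [{"name": "spot"}, {"name": "on_demand"}]},
    {"name": "memory", "flavors": [{"name": "spot"}, {"name": "on_demand"}]},
    {"name": "gpu", "flavors": [{"name": "vendor1"}, {"name": "vendor2"}]},
]
print(check_codependency(resources))  # [('cpu', 'memory')]
```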
|
||||||||||||||||||||||||
## Namespace selector | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
You can limit which namespaces can have workloads admitted in the ClusterQueue | ||||||||||||||||||||||||
|
@@ -81,7 +150,7 @@ Resources in a cluster are typically not homogeneous. Resources could differ in: | |||||||||||||||||||||||
- architecture (ex: x86 vs ARM CPUs) | ||||||||||||||||||||||||
- brands and models (ex: Radeon 7000 vs Nvidia A100 vs T4 GPUs) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
A `ResourceFlavor` is an object that represents these variations and allows you | ||||||||||||||||||||||||
A ResourceFlavor is an object that represents these variations and allows you | ||||||||||||||||||||||||
||||||||||||||||||||||||
to associate them with node labels and taints. | ||||||||||||||||||||||||
||||||||||||||||||||||||
|
||||||||||||||||||||||||
**Note**: If your cluster is homogeneous, you can use an [empty ResourceFlavor](#empty-resourceflavor) | ||||||||||||||||||||||||
|
@@ -102,13 +171,8 @@ taints: | |||||||||||||||||||||||
value: "true" | ||||||||||||||||||||||||
``` | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
You can use the `.metadata.name` to reference a flavor from a ClusterQueue in | ||||||||||||||||||||||||
the `.spec.resources[*].flavors[*].name` field. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
For each resource of each [pod set](workload.md#pod-sets) in a Workload, Kueue | ||||||||||||||||||||||||
assigns the first flavor in the `.spec.resources[*].flavors` | ||||||||||||||||||||||||
list that has enough unused quota in the ClusterQueue or the ClusterQueue's | ||||||||||||||||||||||||
[cohort](#cohort). | ||||||||||||||||||||||||
You can use the `.metadata.name` to reference a ResourceFlavor from a | ||||||||||||||||||||||||
ClusterQueue in the `.spec.resources[*].flavors[*].name` field. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
### ResourceFlavor labels | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
@@ -132,9 +196,9 @@ steps: | |||||||||||||||||||||||
didn't specify them already. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
For example, for a [batch/v1.Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/), | ||||||||||||||||||||||||
Kueue adds the labels to `.spec.template.spec.nodeSelector`. This guarantees | ||||||||||||||||||||||||
that the workload Pods run on the nodes associated to the flavor that Kueue | ||||||||||||||||||||||||
decided that the workload should use. | ||||||||||||||||||||||||
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods run on the nodes associated with the flavor
that Kueue assigned to the workload.
Reply (on the suggested change for lines +199 to +201): nice, I was having a hard time with this one :)
||||||||||||||||||||||||
|
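
As a rough illustration of the label injection described above, the Python sketch below merges flavor labels into a Job's `.spec.template.spec.nodeSelector`. It is not Kueue's code: the helper name `inject_node_selector` and the label keys are hypothetical, and the sketch assumes that keys the workload already set are left untouched.

```python
import copy


def inject_node_selector(job: dict, flavor_labels: dict[str, str]) -> dict:
    """Return a copy of the Job with flavor labels added to its nodeSelector."""
    patched = copy.deepcopy(job)  # leave the caller's object unchanged
    pod_spec = patched["spec"]["template"]["spec"]
    selector = pod_spec.setdefault("nodeSelector", {})
    for key, value in flavor_labels.items():
        # Assumption: a key already set by the workload wins over the flavor label.
        selector.setdefault(key, value)
    return patched


# Hypothetical Job and flavor labels.
job = {"spec": {"template": {"spec": {"nodeSelector": {"zone": "a"}}}}}
patched = inject_node_selector(job, {"instance-type": "spot", "zone": "b"})
print(patched["spec"]["template"]["spec"]["nodeSelector"])
```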
||||||||||||||||||||||||
### ResourceFlavor taints | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
@@ -143,8 +207,9 @@ with taints. | |||||||||||||||||||||||
|
||||||||||||||||||||||||
Taints on the ResourceFlavor work similarly to [node taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/). | ||||||||||||||||||||||||
For Kueue to admit a workload to use the ResourceFlavor, the PodSpecs in the | ||||||||||||||||||||||||
workload should have a toleration for it. As opposed to ResourceFlavor labels, | ||||||||||||||||||||||||
Kueue will not add tolerations for the flavor taints. | ||||||||||||||||||||||||
workload should have a toleration for it. As opposed to the behavior for | ||||||||||||||||||||||||
[ResourceFlavor labels](#resourceflavor-labels), Kueue will not add tolerations | ||||||||||||||||||||||||
||||||||||||||||||||||||
for the flavor taints. | ||||||||||||||||||||||||
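
The toleration requirement can be sketched as a small matching function. This Python example follows the general Kubernetes toleration matching rules (key, operator, value, effect) in simplified form; the function names and the `spot` taint key are hypothetical, and the sketch treats every flavor taint as blocking, which may differ from what Kueue actually checks.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified check of whether one toleration matches one taint."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key (normally used with operator Exists) matches every taint key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def can_use_flavor(flavor_taints: list[dict], tolerations: list[dict]) -> bool:
    """A pod spec can use the flavor only if every taint is tolerated."""
    return all(
        any(tolerates(toleration, taint) for toleration in tolerations)
        for taint in flavor_taints
    )


# Hypothetical taint on a ResourceFlavor and a matching toleration.
taint = {"key": "spot", "value": "true", "effect": "NoSchedule"}
toleration = {"key": "spot", "operator": "Equal", "value": "true", "effect": "NoSchedule"}
print(can_use_flavor([taint], [toleration]))  # True
print(can_use_flavor([taint], []))            # False
```

Since Kueue does not add tolerations itself, a workload without the matching toleration would simply not be admitted to that flavor.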
|
||||||||||||||||||||||||
### Empty ResourceFlavor | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
@@ -173,18 +238,18 @@ ClusterQueue. | |||||||||||||||||||||||
|
||||||||||||||||||||||||
### Flavors and borrowing semantics | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
When borrowing, Kueue satisfies the following semantics: | ||||||||||||||||||||||||
When borrowing, Kueue satisfies the following admission semantics: | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
- When assigning flavors, Kueue goes through the list of flavors in | ||||||||||||||||||||||||
`.spec.resources[*].flavors`. For each flavor, Kueue attempts to | ||||||||||||||||||||||||
fit the workload using the min quota of the ClusterQueue or the unused | ||||||||||||||||||||||||
min quota of other ClusterQueues in the cohort, up to the max quota of the | ||||||||||||||||||||||||
ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next | ||||||||||||||||||||||||
- When assigning flavors, Kueue goes through the list of flavors in the | ||||||||||||||||||||||||
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts | ||||||||||||||||||||||||
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the | ||||||||||||||||||||||||
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota | ||||||||||||||||||||||||
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds to evaluate the next
flavor in the list.
Reply (on the suggested change for lines +243 to 248): I completely reworked this section
||||||||||||||||||||||||
- Borrowing happens per-flavor. A ClusterQueue can only borrow quota of flavors | ||||||||||||||||||||||||
it defines. | ||||||||||||||||||||||||
- A ClusterQueue can only borrow quota of flavors it defines and it can only | ||||||||||||||||||||||||
borrow quota for one flavor. | ||||||||||||||||||||||||
Review comment (on lines +249 to +250): I wouldn't mention "and it can only borrow quota for one flavor" because that also applies in the no borrowing case as well.
||||||||||||||||||||||||
|
||||||||||||||||||||||||
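
The borrowing semantics listed above can be approximated with the following Python sketch. It is a simplified model under stated assumptions, not Kueue's algorithm: the field names `used` and `cohort_unused` are invented for the example, quantities are plain numbers, and a missing `max` is assumed to mean there is no borrowing cap.

```python
from typing import Optional


def assign_with_borrowing(flavors: list[dict], request: float) -> Optional[str]:
    """Pick the first flavor that fits using own and borrowed `min` quota."""
    for flavor in flavors:
        own_unused = max(flavor["min"] - flavor["used"], 0)
        # Unused `min` quota of the other ClusterQueues in the cohort can be borrowed.
        headroom = own_unused + flavor["cohort_unused"]
        if flavor.get("max") is not None:
            # Total usage in this ClusterQueue may not exceed its `max` quota.
            headroom = min(headroom, flavor["max"] - flavor["used"])
        if request <= headroom:
            return flavor["name"]
    return None


# Illustrative numbers only.
flavors = [
    {"name": "spot", "min": 18, "max": 20, "used": 16, "cohort_unused": 10},
    {"name": "on_demand", "min": 9, "max": None, "used": 0, "cohort_unused": 0},
]
print(assign_with_borrowing(flavors, 4))  # spot: 2 own + 2 borrowed, capped by max
print(assign_with_borrowing(flavors, 6))  # on_demand: spot is capped at 4 by its max
```

Each flavor is evaluated on its own, so a request is never split between `spot` and `on_demand`.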
### Example | ||||||||||||||||||||||||
### Borrowing example | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Assume you created the following two ClusterQueues: | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,17 @@ | ||||||
# Local Queue | ||||||
|
||||||
A `LocalQueue` is a namespaced object that groups closely related workloads | ||||||
belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md) | ||||||
from which resources are allocated to run its workloads. | ||||||
|
||||||
Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`. | ||||||
||||||
Tenants can discover which queues they can submit jobs to by listing the | ||||||
local queues in their namespace. The command looks similar to the following: | ||||||
||||||
|
||||||
```sh
kubectl get -n my-namespace localqueues
# Alternatively, use the alias `queue` or `queues`
kubectl get -n my-namespace queues
```
|
||||||
`queue` and `queues` are aliases for `localqueue`. |
This file was deleted.
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -39,6 +39,20 @@ to scrape metrics from kueue components, run the following command: | |||||
kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/prometheus.yaml | ||||||
``` | ||||||
|
||||||
### Uninstall | ||||||
|
||||||
To uninstall a released version of Kueue from your cluster, run the following command: | ||||||
|
||||||
```shell
VERSION=v0.1.1
kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
```
|
||||||
### Upgrading from 0.1 to 0.2 | ||||||
|
||||||
Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes. | ||||||
||||||
To install Kueue `0.2.y`, [uninstall](#uninstall) the older version first. | ||||||
|
||||||
## Install a custom-configured released version | ||||||
|
||||||
To install a custom-configured released version of Kueue in your cluster, execute the following steps: | ||||||
|
Review comment: Done, except we don't do ticks for API objects in k8s.io