Enhance and update documentation #351

Merged
merged 1 commit into from
Aug 24, 2022
21 changes: 19 additions & 2 deletions CHANGELOG/CHANGELOG-0.2.md
@@ -2,8 +2,25 @@

Changes since `v0.1.0`:

- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
retried after a transient error.
### Features
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
Contributor

Suggested change
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
- Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported.
- Rename `Queue` to `LocalQueue`.

Contributor Author

Done, except we don't do ticks for API objects in k8s.io

- Add webhooks to validate and add defaults to all kueue APIs.
Contributor

Suggested change
- Add webhooks to validate and add defaults to all kueue APIs.
- Add webhooks to validate, and add defaults to all kueue APIs.

Contributor Author

the webhooks validate and add defaults.

How's the new wording?

- Support [codependent resources](/docs/concepts/cluster_queue.md#codepedent-resources)
by assigning the same flavor to codependent resources in a pod set.
- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
in Workload pod sets.
- Default requests to limits if requests are not set in a Workload pod set, to
Contributor

Suggested change
- Default requests to limits if requests are not set in a Workload pod set, to
- Define default requests to limits if requests are not set in a Workload pod set, to

Contributor Author

Maybe Set is a more accurate verb:

Set requests to limits

wdyt?

Contributor Author

Changed to set and added a link to k8s reference doc.

match internal defaulting for k8s Pods.
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of
Contributor

Suggested change
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of
- Add [prometheus metrics](/docs/reference/metrics.md) to monitor health of

Contributor Author

Done

the system and the status of ClusterQueues.

### Bug fixes

- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
blocking other Workloads in a StrictFIFO ClusterQueue.
Contributor

Not sure if this is a feature or a bug fix.
If it's a bug fix, it would be good to reword this in the form of:
"Fixed bug that prevented workloads..."

Contributor Author

Done

- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
Contributor

Suggested change
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
- Fixed the number of pending workloads in a BestEffortFIFO ClusterQueue status.

Contributor Author

Done

- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
Contributor

Suggested change
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
- Fixed a bug in BestEffortFIFO ClusterQueue where a workload might not be

Contributor Author

Done

retried after a transient error.
- Fixed requeuing an out-of-date workload when failed to admit it.
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
Contributor

Suggested change
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
- Fixed a bug in BestEffortFIFO ClusterQueue where inadmissible workloads

Contributor Author

Done

were not removed from the ClusterQueue when removing the corresponding Queue.
34 changes: 26 additions & 8 deletions README.md
@@ -5,21 +5,27 @@ Kueue is a set of APIs and controller for [job](docs/concepts/workload.md)
a job should be [admitted](docs/concepts#admission) to start (as in pods can be
created) and when it should stop (as in active pods should be deleted).

## Why use Kueue

Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster without replacing any components. It is compatible with cloud
environments where:
- Nodes and other compute resources can be scaled up and down.
- Compute resources are heterogeneous (in architecture, availability, price, etc.).
Comment on lines +10 to +14
Contributor

Suggested change
Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster without replacing any components. It is compatible with cloud
environments where:
- Nodes and other compute resources can be scaled up and down.
- Compute resources are heterogeneous (in architecture, availability, price, etc.).
Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster. It does not replace any existing Kubernetes components. It is compatible with cloud
environments where:
- Compute resources are elastic and can be scaled up and down.
- Compute resources are heterogeneous (in architecture, availability, price, etc.).

Contributor Author

Done


What does "heterogeneous" mean in this example?

Contributor Author

they have different characteristics


Does it mean it is not compatible with environments where architecture is homogeneous?

Contributor Author

It is compatible, as homogeneous is just simpler. In kueue terms, if your resources are homogeneous, you need only one ResourceFlavor.


Kueue APIs allow you to express:
- Quotas and policies for fair sharing among tenants.
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
Contributor

Suggested change
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object) is fully utilized, you can use a different flavor. For more information, see the [Workload objects](docs/concepts/workload.md).

Contributor

Sorry for marking the changes on only one line 😞

Contributor Author (@alculquicondor, Aug 23, 2022)

I added the subject. It's kueue that runs (or rather admits) the job in a different flavor. I think we don't need the link to workload again. It's in the first paragraph.

is fully utilized, run the [job](docs/concepts/workload.md) using a different
flavor.

The main design principle for Kueue is to avoid duplicating mature functionality
in [Kubernetes components](https://kubernetes.io/docs/concepts/overview/components/)
and well-established third-party controllers. Autoscaling, pod-to-node scheduling and
job lifecycle management are the responsibility of cluster-autoscaler,
kube-scheduler and kube-controller-manager, respectively. Advanced
admission control can be delegated to controllers such as [gatekeeper](https://github.com/open-policy-agent/gatekeeper).

<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->
Learn more by reading the design docs:
- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
to get access) discusses the API proposal and a high-level description of how it
operates.
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
presents the detailed design of the controller.

## Installation

**Requires Kubernetes 1.22 or newer**.
@@ -52,6 +58,18 @@ Learn more about:
- Kueue [concepts](docs/concepts).
- Common and advanced [tasks](docs/tasks).

## Architecture

<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->

Learn more about the architecture of Kueue in the design docs:
Contributor

Suggested change
Learn more about the architecture of Kueue in the design docs:
Learn more about the architecture of Kueue with the following design docs:

Contributor Author

Done


- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
to get access) discusses the API proposal and a high-level description of how it
operates.
Contributor

Suggested change
operates.
- [kueue-apis](https://bit.ly/kueue-apis) discusses the API proposal and a high-level description of how it
operates. Join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
to get document access.

Contributor

I would keep the bit.ly/ prefix, sometimes people highlight and want to copy the short link

- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
Contributor

Suggested change
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
- [kueue-controller-design](https://bit.ly/kueue-controller-design)

presents the detailed design of the controller.

## Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/).
11 changes: 6 additions & 5 deletions docs/concepts/README.md
@@ -10,7 +10,7 @@ abstractions that Kueue uses to represent your cluster and workloads.
A cluster-scoped resource that governs a pool of resources, defining usage
limits and fair sharing rules.

### [Queue](queue.md)
### [Local Queue](local_queue.md)

A namespaced resource that groups closely related workloads belonging to a
single tenant.
@@ -30,11 +30,12 @@ models, etc.

### Admission

The process of admitting a workload to start (pods to be created). A workload
The process of admitting a Workload to start (pods to be created). A Workload
is admitted by a ClusterQueue according to the available resources and gets
resource flavors assigned for each requested resource. Sometimes referred to
as _workload scheduling_ or _job scheduling_ (not to be confused with
[pod scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)).
resource flavors assigned for each requested resource.

Sometimes referred to as _workload scheduling_ or _job scheduling_
(not to be confused with [pod scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)).

### [Cohort](cluster_queue.md#cohort)

113 changes: 89 additions & 24 deletions docs/concepts/cluster_queue.md
@@ -1,8 +1,9 @@
# Cluster Queue

A `ClusterQueue` is a cluster-scoped object that governs a pool of resources
A ClusterQueue is a cluster-scoped object that governs a pool of resources
such as CPU, memory and hardware accelerators. A `ClusterQueue` defines:
Contributor

Suggested change
such as CPU, memory and hardware accelerators. A `ClusterQueue` defines:
such as CPU, memory, and hardware accelerators. A `ClusterQueue` defines:

Contributor Author

Done

- The resource _flavors_ that it manages, with usage limits and order of consumption.
- The [resource _flavors_](#resourceflavor-object) that it manages, with usage
Contributor

Suggested change
- The [resource _flavors_](#resourceflavor-object) that it manages, with usage
- The [resource _flavors_](#resourceflavor-object) that the cluster manages, with usage

I'm assuming "it" refers to "the cluster"; please edit accordingly. Either way, we must state what "it" refers to.

Contributor Author

the ClusterQueue. Done

limits and order of consumption.
- Fair sharing rules across the tenants of the cluster.

Only [cluster administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.
@@ -35,6 +36,74 @@ This ClusterQueue admits [workloads](workload.md) if and only if:

You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/).

## Resources

In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types)
(cpu, memory, GPUs, etc.).
Contributor

Suggested change
(cpu, memory, GPUs, etc.).
(CPU, memory, GPUs, etc.).


For each resource, you can define quotas for multiple _flavors_. A
flavor represents different variations of a resource. The variations can be
defined in a [ResourceFlavor object](#resourceflavor-object).
Comment on lines +44 to +46
Contributor

Suggested change
For each resource, you can define quotas for multiple _flavors_. A
flavor represents different variations of a resource. The variations can be
defined in a [ResourceFlavor object](#resourceflavor-object).
For each resource, you can define quotas for multiple _flavors_. Flavors represent different variations of a resource (for example, different GPU models). A flavor can be defined using a [ResourceFlavor object](#resourceflavor-object).


In a process called [admission](.#admission), Kueue assigns
[Workload pod sets](workload.md#pod-sets) a flavor for each resource it requests.
Contributor

Suggested change
[Workload pod sets](workload.md#pod-sets) a flavor for each resource it requests.
to the [Workload pod sets](workload.md#pod-sets) a flavor for each resource Kueue requests.

Not sure if "it" refers to "Kueue" but we need to replace "it" with the object name.

Contributor Author

the pod set. Done

Kueue assigns the first flavor in the ClusterQueue's `.spec.resources[*].flavors`
list that has enough unused `min` quota in the ClusterQueue or the
ClusterQueue's [cohort](#cohort).
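
The assignment rule above can be sketched in a few lines of Python. This is an illustrative model only, not Kueue's implementation: the `Flavor` dataclass, its `min_quota` and `used` fields, and `first_fitting_flavor` are hypothetical names, and borrowing from the cohort is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Flavor:
    """Hypothetical model of one entry in .spec.resources[*].flavors."""
    name: str
    min_quota: int  # the flavor's `min` quota, in resource units
    used: int = 0   # quota already taken by admitted workloads


def first_fitting_flavor(flavors: Sequence[Flavor], request: int) -> Optional[str]:
    """Return the first flavor, in list order, whose unused `min` quota
    covers the request. Return None if no flavor fits."""
    for flavor in flavors:
        if flavor.min_quota - flavor.used >= request:
            return flavor.name
    return None


# Quotas borrowed from the cpu example in this document; `used` is made up.
cpu_flavors = [Flavor("spot", min_quota=18, used=16), Flavor("on_demand", min_quota=9)]
print(first_fitting_flavor(cpu_flavors, request=4))  # prints on_demand
```

Because the list is scanned in order, placing the preferred flavor first is what expresses the order of consumption.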

### Codepedent resources

It is possible that multiple resources are tied to the same flavors. This is
typical for `cpu` and `memory`, where the flavors are generally tied to a
machine family or availability guarantees.
Contributor

Suggested change
machine family or availability guarantees.
machine family or availability policies. When two or more resources match their flavors,
they are said to be codependent resources.

Contributor

Suggested change
machine family or availability guarantees.
machine family or provisioning model (e.g., spot vs standard).


If this is the case, the resources in the ClusterQueue must list the same
flavors in the same order. When two or more resources match their flavors,
they are said to be codependent. During admission, for each pod set in a
Workload, Kueue assigns the same flavor to the codependent resources that the
pod set requests.
Comment on lines +60 to +64
Contributor

Suggested change
If this is the case, the resources in the ClusterQueue must list the same
flavors in the same order. When two or more resources match their flavors,
they are said to be codependent. During admission, for each pod set in a
Workload, Kueue assigns the same flavor to the codependent resources that the
pod set requests.
To manage codependent resources, list the resources in the ClusterQueue in the same
flavors order. During admission, for each pod set in a
Workload, Kueue assigns the same flavor to the codependent resources that the
pod set requests.

Contributor Author

Done


An example of a ClusterQueue with codependent resources looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: cluster-total
spec:
  namespaceSelector: {}
  resources:
  - name: "cpu"
    flavors:
    - name: spot
      quota:
        min: 18
    - name: on_demand
      quota:
        min: 9
  - name: "memory"
    flavors:
    - name: spot
      quota:
        min: 72Gi
    - name: on_demand
      quota:
        min: 36Gi
  - name: "gpu"
    flavors:
    - name: vendor1
      quota:
        min: 10
    - name: vendor2
      quota:
        min: 10
```

In the example above, `cpu` and `memory` are codependent resources, while `gpu`
is independent.

If two resources are not codependent, they must not have any flavors in common.
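
The two rules in this section (codependent resources list the same flavors in the same order, and all other resources share no flavors) can be checked mechanically. The following Python sketch is illustrative only: `check_flavor_lists` is a hypothetical helper, not part of Kueue, and it takes a plain mapping from resource name to flavor names instead of a real ClusterQueue object.

```python
from itertools import combinations
from typing import Dict, List


def check_flavor_lists(resources: Dict[str, List[str]]) -> List[str]:
    """Return human-readable violations; an empty list means valid.

    Two resources are valid together if their flavor lists are identical
    (codependent) or have no flavor in common (independent).
    """
    problems = []
    for (name_a, flavors_a), (name_b, flavors_b) in combinations(resources.items(), 2):
        if flavors_a == flavors_b:
            continue  # codependent: same flavors, same order
        shared = set(flavors_a) & set(flavors_b)
        if shared:
            problems.append(
                f"{name_a} and {name_b} share flavors {sorted(shared)} "
                "but do not list the same flavors in the same order"
            )
    return problems


# Flavor lists taken from the ClusterQueue example in this section.
example = {
    "cpu": ["spot", "on_demand"],
    "memory": ["spot", "on_demand"],
    "gpu": ["vendor1", "vendor2"],
}
print(check_flavor_lists(example))  # prints []
```

Swapping the order of the `memory` flavors would produce one violation, since `cpu` and `memory` would then share flavors without being codependent.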

## Namespace selector

You can limit which namespaces can have workloads admitted in the ClusterQueue
@@ -81,7 +150,7 @@ Resources in a cluster are typically not homogeneous. Resources could differ in:
- architecture (ex: x86 vs ARM CPUs)
- brands and models (ex: Radeon 7000 vs Nvidia A100 vs T4 GPUs)

A `ResourceFlavor` is an object that represents these variations and allows you
A ResourceFlavor is an object that represents these variations and allows you
Contributor

Suggested change
A ResourceFlavor is an object that represents these variations and allows you
A ResourceFlavor is an object that represents these resource variations and allows you

Contributor Author

Done

to associate them with node labels and taints.
Comment on lines +153 to 154
Contributor

Suggested change
A ResourceFlavor is an object that represents these variations and allows you
to associate them with node labels and taints.
A ResourceFlavor is an object that represents these variations and allows you
to associate them with the labels and taints of the nodes that offer those resource flavors.


**Note**: If your cluster is homogeneous, you can use an [empty ResourceFlavor](#empty-resourceflavor)
@@ -102,13 +171,8 @@ taints:
value: "true"
```

You can use the `.metadata.name` to reference a flavor from a ClusterQueue in
the `.spec.resources[*].flavors[*].name` field.

For each resource of each [pod set](workload.md#pod-sets) in a Workload, Kueue
assigns the first flavor in the `.spec.resources[*].flavors`
list that has enough unused quota in the ClusterQueue or the ClusterQueue's
[cohort](#cohort).
You can use the `.metadata.name` to reference a ResourceFlavor from a
ClusterQueue in the `.spec.resources[*].flavors[*].name` field.

### ResourceFlavor labels

@@ -132,9 +196,9 @@ steps:
didn't specify them already.

For example, for a [batch/v1.Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/),
Kueue adds the labels to `.spec.template.spec.nodeSelector`. This guarantees
that the workload Pods run on the nodes associated to the flavor that Kueue
decided that the workload should use.
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods run on the nodes associated to the flavor
that Kueue decided that the workload should use.
Comment on lines +199 to +201
Contributor

Suggested change
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods run on the nodes associated to the flavor
that Kueue decided that the workload should use.
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods can only be scheduled on the nodes targeted by the flavor
that Kueue assigned to the workload.

Contributor Author

nice, I was having a hard time with this one :)


### ResourceFlavor taints

@@ -143,8 +207,9 @@ with taints.

Taints on the ResourceFlavor work similarly to [node taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
For Kueue to admit a workload to use the ResourceFlavor, the PodSpecs in the
workload should have a toleration for it. As opposed to ResourceFlavor labels,
Kueue will not add tolerations for the flavor taints.
workload should have a toleration for it. As opposed to the behavior for
[ResourceFlavor labels](#resourceflavor-labels), Kueue will not add tolerations
Contributor

Suggested change
[ResourceFlavor labels](#resourceflavor-labels), Kueue will not add tolerations
[ResourceFlavor labels](#resourceflavor-labels), Kueue does not add tolerations

Contributor Author

Done

for the flavor taints.
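
As a rough illustration of the check described above, the Python sketch below tests whether a pod spec's tolerations cover every taint on a flavor. It is a simplified model under stated assumptions: `tolerates` and `flavor_is_usable` are hypothetical names, taints and tolerations are plain dicts, the `spot` taint key is made up, and the matching follows the general Kubernetes rule (same key and effect, with operator `Exists` or an equal value). Details such as `tolerationSeconds` are ignored, and Kueue's real logic may differ.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes-style match of one toleration against one taint."""
    # An empty toleration effect matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return key == "" or key == taint.get("key")
    return key == taint.get("key") and toleration.get("value", "") == taint.get("value", "")


def flavor_is_usable(taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """True if every flavor taint is tolerated by at least one toleration."""
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in taints)


# Hypothetical taint and toleration, for illustration only.
spot_taints = [{"key": "spot", "value": "true", "effect": "NoSchedule"}]
pod_tolerations = [{"key": "spot", "operator": "Exists", "effect": "NoSchedule"}]
print(flavor_is_usable(spot_taints, pod_tolerations))  # prints True
print(flavor_is_usable(spot_taints, []))               # prints False
```

The second call shows the consequence of Kueue not adding tolerations: a workload that declares none cannot use a tainted flavor.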

### Empty ResourceFlavor

@@ -173,18 +238,18 @@ ClusterQueue.

### Flavors and borrowing semantics

When borrowing, Kueue satisfies the following semantics:
When borrowing, Kueue satisfies the following admission semantics:

- When assigning flavors, Kueue goes through the list of flavors in
`.spec.resources[*].flavors`. For each flavor, Kueue attempts to
fit the workload using the min quota of the ClusterQueue or the unused
min quota of other ClusterQueues in the cohort, up to the max quota of the
ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
- When assigning flavors, Kueue goes through the list of flavors in the
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
flavor in the list.
Comment on lines +243 to 248
Contributor

Suggested change
- When assigning flavors, Kueue goes through the list of flavors in the
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
flavor in the list.
- When assigning flavors, Kueue goes through the list of flavors in the
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
unused `min` quota of other ClusterQueues in the cohort. If the workload doesn't fit, Kueue proceeds evaluating the next flavor in the list. Kueue attempts to fit workloads until the `max` quota
of the ClusterQueue is reached.

Contributor Author

I completely reworked this section

- Borrowing happens per-flavor. A ClusterQueue can only borrow quota of flavors
it defines.
- A ClusterQueue can only borrow quota of flavors it defines and it can only
borrow quota for one flavor.
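
The per-flavor fit check in the list above can be sketched in Python as follows. This is a minimal model under stated assumptions, not Kueue's code: `FlavorQuota` and `fits` are hypothetical names, quantities are plain integers, and the ClusterQueue being evaluated is assumed to be the only borrower in its cohort.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FlavorQuota:
    """Hypothetical view of one flavor's quota in one ClusterQueue."""
    min_quota: int
    used: int = 0
    max_quota: Optional[int] = None  # None models "no max set"


def fits(own: FlavorQuota, cohort_peers: Sequence[FlavorQuota], request: int) -> bool:
    """Check whether a request fits one flavor, borrowing from the cohort if needed."""
    total = own.used + request
    if own.max_quota is not None and total > own.max_quota:
        return False  # never exceed the ClusterQueue's own max
    if total <= own.min_quota:
        return True  # fits without borrowing
    # Borrow only what the other ClusterQueues in the cohort leave unused.
    lendable = sum(max(peer.min_quota - peer.used, 0) for peer in cohort_peers)
    return total - own.min_quota <= lendable


own = FlavorQuota(min_quota=9, used=8, max_quota=12)
peers = [FlavorQuota(min_quota=18, used=15)]
print(fits(own, peers, request=3))  # prints True: borrows 2 of the 3 unused units
print(fits(own, peers, request=5))  # prints False: 13 would exceed max of 12
```

When `fits` returns False for a flavor, the caller would move on to the next flavor in the list, as the first bullet describes.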
Comment on lines +249 to +250
Contributor

Suggested change
- A ClusterQueue can only borrow quota of flavors it defines and it can only
borrow quota for one flavor.
- A ClusterQueue can only borrow quota of flavors that ClusterQueue defines. ClusterQueue can only
borrow quota for one flavor.

Contributor

I wouldn't mention "and it can only borrow quota for one flavor" because that also applies in the no-borrowing case.


### Example
### Borrowing example

Assume you created the following two ClusterQueues:

17 changes: 17 additions & 0 deletions docs/concepts/local_queue.md
@@ -0,0 +1,17 @@
# Local Queue

A `LocalQueue` is a namespaced object that groups closely related workloads
belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md)
from which resources are allocated to run its workloads.

Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
Contributor

Suggested change
Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
Users submit jobs to a `LocalQueue`, instead of to a `ClusterQueue` directly.

Contributor Author

Done

Tenants can discover which queues they can submit jobs to by listing the
local queues in their namespace. The command looks similar to the following:
Contributor

Suggested change
local queues in their namespace. The command looks similar to the following:
local queues in their namespace. The command is similar to the following:

Contributor Author

Done


```sh
kubectl get -n my-namespace localqueues
# Alternatively, use the alias `queue` or `queues`
kubectl get -n my-namespace queues
```

`queue` and `queues` are aliases for `localqueue`.
9 changes: 0 additions & 9 deletions docs/concepts/queue.md

This file was deleted.

13 changes: 11 additions & 2 deletions docs/concepts/workload.md
@@ -23,6 +23,7 @@ metadata:
name: sample-job
namespace: default
spec:
queueName: user-queue
podSets:
- count: 3
name: main
@@ -36,9 +37,13 @@ spec:
cpu: "1"
memory: 200Mi
restartPolicy: Never
queueName: user-queue
```

## Queue name

To indicate in which [LocalQueue](local_queue.md) you want your Workload to be
enqueued, set the name of the LocalQueue in the `.spec.queueName` field.

## Pod sets

A Workload might be composed of multiple Pods with different pod specs.
@@ -63,4 +68,8 @@ of the Job's pod template.

As described previously, Kueue has built-in support for workloads created with
the Job API. But any custom workload API can integrate with Kueue by
creating a corresponding Workload object for it.
creating a corresponding Workload object for it.

## What's next

- Learn how to [run jobs](/docs/tasks/run_jobs.md).
14 changes: 14 additions & 0 deletions docs/setup/install.md
@@ -39,6 +39,20 @@ to scrape metrics from kueue components, run the following command:
kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/prometheus.yaml
```

### Uninstall

To uninstall a released version of Kueue from your cluster, run the following command:

```shell
VERSION=v0.1.1
kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
```

### Upgrading from 0.1 to 0.2

Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
Contributor

Suggested change
Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
Upgrading from `0.1.x` to `0.2.y` is not supported because of breaking API changes.

Contributor Author

Done

To install Kueue `0.2.y`, [uninstall](#uninstall) the older version first.

## Install a custom-configured released version

To install a custom-configured released version of Kueue in your cluster, execute the following steps:
8 changes: 4 additions & 4 deletions docs/tasks/administer_cluster_quotas.md
@@ -93,28 +93,28 @@ kubectl apply -f default-flavor.yaml
The `.metadata.name` matches the `.spec.resources[*].flavors[0].resourceFlavor`
field in the ClusterQueue.

### 3. Create [Queues](/docs/concepts/queue.md)
### 3. Create [LocalQueues](/docs/concepts/local_queue.md)

Users cannot directly send [workloads](/docs/concepts/workload.md) to
ClusterQueues. Instead, users need to send their workloads to a Queue in their
namespace.
Thus, for the queuing system to be complete, you need to create a Queue in
each namespace that needs access to the ClusterQueue.

Write the manifest for the Queue. It should look similar to the following:
Write the manifest for the LocalQueue. It should look similar to the following:

```yaml
# default-user-queue.yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: Queue
kind: LocalQueue
metadata:
  namespace: default
  name: user-queue
spec:
  clusterQueue: cluster-total
```

To create the Queue, run the following command:
To create the LocalQueue, run the following command:

```shell
kubectl apply -f default-user-queue.yaml
2 changes: 2 additions & 0 deletions docs/tasks/run_jobs.md
@@ -18,6 +18,8 @@ Make sure the following conditions are met:
Run the following command to list the Queues available in your namespace.

```shell
kubectl -n default get localqueues
# Or use the 'queues' alias.
kubectl -n default get queues
```
