Enhance and update documentation
Change-Id: I25de039eea653e95a712f9d8450f14e77f16452f
alculquicondor committed Aug 24, 2022
1 parent f98587f commit cb04515
Showing 10 changed files with 188 additions and 54 deletions.
21 changes: 19 additions & 2 deletions CHANGELOG/CHANGELOG-0.2.md
Original file line number Diff line number Diff line change
@@ -2,8 +2,25 @@

Changes since `v0.1.0`:

- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
retried after a transient error.
### Features
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported, and Queue is now named LocalQueue.
- Added webhooks that validate and set defaults for all Kueue APIs.
- Support [codependent resources](/docs/concepts/cluster_queue.md#codepedent-resources)
by assigning the same flavor to codependent resources in a pod set.
- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
in Workload pod sets.
- Default requests to limits if requests are not set in a Workload pod set, to
match internal defaulting for k8s Pods.
- Added [Prometheus metrics](/docs/reference/metrics.md) to monitor the health of
the system and the status of ClusterQueues.

### Bug fixes

- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
blocking other Workloads in a StrictFIFO ClusterQueue.
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
retried after a transient error.
- Fixed requeuing of an out-of-date workload after a failed admission attempt.
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
were not removed from the ClusterQueue when removing the corresponding Queue.
34 changes: 26 additions & 8 deletions README.md
@@ -5,21 +5,27 @@ Kueue is a set of APIs and controller for [job](docs/concepts/workload.md)
a job should be [admitted](docs/concepts#admission) to start (as in pods can be
created) and when it should stop (as in active pods should be deleted).

## Why use Kueue

Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster without replacing any components. It is compatible with cloud
environments where:
- Nodes and other compute resources can be scaled up and down.
- Compute resources are heterogeneous (in architecture, availability, price, etc.).

Kueue APIs allow you to express:
- Quotas and policies for fair sharing among tenants.
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
is fully utilized, run the [job](docs/concepts/workload.md) using a different
flavor.

The main design principle for Kueue is to avoid duplicating mature functionality
in [Kubernetes components](https://kubernetes.io/docs/concepts/overview/components/)
and well-established third-party controllers. Autoscaling, pod-to-node scheduling and
job lifecycle management are the responsibility of cluster-autoscaler,
kube-scheduler and kube-controller-manager, respectively. Advanced
admission control can be delegated to controllers such as [gatekeeper](https://github.com/open-policy-agent/gatekeeper).

<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->
Learn more by reading the design docs:
- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
to get access) discusses the API proposal and a high-level description of how it
operates.
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
presents the detailed design of the controller.

## Installation

**Requires Kubernetes 1.22 or newer**.
@@ -52,6 +58,18 @@ Learn more about:
- Kueue [concepts](docs/concepts).
- Common and advanced [tasks](docs/tasks).

## Architecture

<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->

Learn more about the architecture of Kueue in the design docs:

- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
to get access) discusses the API proposal and a high-level description of how it
operates.
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
presents the detailed design of the controller.

## Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/).
11 changes: 6 additions & 5 deletions docs/concepts/README.md
@@ -10,7 +10,7 @@ abstractions that Kueue uses to represent your cluster and workloads.
A cluster-scoped resource that governs a pool of resources, defining usage
limits and fair sharing rules.

### [Queue](queue.md)
### [Local Queue](local_queue.md)

A namespaced resource that groups closely related workloads belonging to a
single tenant.
@@ -30,11 +30,12 @@ models, etc.

### Admission

The process of admitting a workload to start (pods to be created). A workload
The process of admitting a Workload to start (pods to be created). A Workload
is admitted by a ClusterQueue according to the available resources and gets
resource flavors assigned for each requested resource. Sometimes referred to
as _workload scheduling_ or _job scheduling_ (not to be confused with
[pod scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)).
resource flavors assigned for each requested resource.

Sometimes referred to as _workload scheduling_ or _job scheduling_
(not to be confused with [pod scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)).

### [Cohort](cluster_queue.md#cohort)

113 changes: 89 additions & 24 deletions docs/concepts/cluster_queue.md
@@ -1,8 +1,9 @@
# Cluster Queue

A `ClusterQueue` is a cluster-scoped object that governs a pool of resources
A ClusterQueue is a cluster-scoped object that governs a pool of resources
such as CPU, memory and hardware accelerators. A `ClusterQueue` defines:
- The resource _flavors_ that it manages, with usage limits and order of consumption.
- The [resource _flavors_](#resourceflavor-object) that it manages, with usage
limits and order of consumption.
- Fair sharing rules across the tenants of the cluster.

Only [cluster administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.
@@ -35,6 +36,74 @@ This ClusterQueue admits [workloads](workload.md) if and only if:
You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/).

## Resources

In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types)
(cpu, memory, GPUs, etc.).

For each resource, you can define quotas for multiple _flavors_. A
flavor represents a variation of a resource. The variations can be
defined in a [ResourceFlavor object](#resourceflavor-object).

In a process called [admission](.#admission), Kueue assigns each
[Workload pod set](workload.md#pod-sets) a flavor for each resource that the pod set requests.
Kueue assigns the first flavor in the ClusterQueue's `.spec.resources[*].flavors`
list that has enough unused `min` quota in the ClusterQueue or the
ClusterQueue's [cohort](#cohort).
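
The first-fit rule described above can be sketched in a few lines of Python. This is only an illustration, not Kueue's implementation: the `used` field and the `assign_flavor` helper are hypothetical names, quantities are plain integers, and cohort borrowing is ignored.

```python
from typing import Optional


def assign_flavor(flavors: list, request: int) -> Optional[str]:
    """Return the first flavor whose unused `min` quota covers the request."""
    for flavor in flavors:  # list order expresses preference
        unused = flavor["min"] - flavor["used"]  # `used` is a hypothetical field
        if request <= unused:
            return flavor["name"]
    return None  # no flavor fits, so the pod set has to wait


cpu_flavors = [
    {"name": "spot", "min": 18, "used": 16},
    {"name": "on_demand", "min": 9, "used": 0},
]
print(assign_flavor(cpu_flavors, 4))
```

With these made-up numbers, a request of 4 CPUs skips `spot` (only 2 unused) and lands on `on_demand`.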

### Codepedent resources

It is possible that multiple resources are tied to the same flavors. This is
typical for `cpu` and `memory`, where the flavors are generally tied to a
machine family or availability guarantees.

If this is the case, the resources in the ClusterQueue must list the same
flavors in the same order. When two or more resources match their flavors,
they are said to be codependent. During admission, for each pod set in a
Workload, Kueue assigns the same flavor to the codependent resources that the
pod set requests.

An example of a ClusterQueue with codependent resources looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1alpha2
kind: ClusterQueue
metadata:
  name: cluster-total
spec:
  namespaceSelector: {}
  resources:
  - name: "cpu"
    flavors:
    - name: spot
      quota:
        min: 18
    - name: on_demand
      quota:
        min: 9
  - name: "memory"
    flavors:
    - name: spot
      quota:
        min: 72Gi
    - name: on_demand
      quota:
        min: 36Gi
  - name: "gpu"
    flavors:
    - name: vendor1
      quota:
        min: 10
    - name: vendor2
      quota:
        min: 10
```

In the example above, `cpu` and `memory` are codependent resources, while `gpu`
is independent.

If two resources are not codependent, they must not have any flavors in common.
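
The two rules above (codependent resources list the same flavors in the same order; all other resources share no flavors) can be checked mechanically. The following Python sketch is a hypothetical validator written for illustration; it is not the validation webhook that Kueue ships, and `validate_resources` is an invented name.

```python
def flavor_names(resource: dict) -> list:
    """Flavor names of a resource, in the order they are listed."""
    return [flavor["name"] for flavor in resource["flavors"]]


def validate_resources(resources: list) -> list:
    """Return one error message per pair of resources that breaks the rules."""
    errors = []
    for i, first in enumerate(resources):
        for second in resources[i + 1:]:
            a, b = flavor_names(first), flavor_names(second)
            if a == b:
                continue  # same flavors, same order: codependent
            if set(a) & set(b):
                errors.append(
                    f"{first['name']} and {second['name']} share flavors "
                    "but do not list the same flavors in the same order"
                )
    return errors


resources = [
    {"name": "cpu", "flavors": [{"name": "spot"}, {"name": "on_demand"}]},
    {"name": "memory", "flavors": [{"name": "spot"}, {"name": "on_demand"}]},
    {"name": "gpu", "flavors": [{"name": "vendor1"}, {"name": "vendor2"}]},
]
print(validate_resources(resources))
```

The input mirrors the flavor lists of the ClusterQueue example, so the validator reports no errors for it.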

## Namespace selector

You can limit which namespaces can have workloads admitted in the ClusterQueue
@@ -81,7 +150,7 @@ Resources in a cluster are typically not homogeneous. Resources could differ in:
- architecture (ex: x86 vs ARM CPUs)
- brands and models (ex: Radeon 7000 vs Nvidia A100 vs T4 GPUs)

A `ResourceFlavor` is an object that represents these variations and allows you
A ResourceFlavor is an object that represents these variations and allows you
to associate them with node labels and taints.

**Note**: If your cluster is homogeneous, you can use an [empty ResourceFlavor](#empty-resourceflavor)
@@ -102,13 +171,8 @@ taints:
value: "true"
```

You can use the `.metadata.name` to reference a flavor from a ClusterQueue in
the `.spec.resources[*].flavors[*].name` field.

For each resource of each [pod set](workload.md#pod-sets) in a Workload, Kueue
assigns the first flavor in the `.spec.resources[*].flavors`
list that has enough unused quota in the ClusterQueue or the ClusterQueue's
[cohort](#cohort).
You can use the `.metadata.name` to reference a ResourceFlavor from a
ClusterQueue in the `.spec.resources[*].flavors[*].name` field.

### ResourceFlavor labels

@@ -132,9 +196,9 @@ steps:
didn't specify them already.

For example, for a [batch/v1.Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/),
Kueue adds the labels to `.spec.template.spec.nodeSelector`. This guarantees
that the workload Pods run on the nodes associated to the flavor that Kueue
decided that the workload should use.
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods run on the nodes associated with the flavor
that Kueue selected for the workload.
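
As a rough illustration of the label injection step, the Python sketch below merges a flavor's labels into a node selector without overwriting keys that the workload already set. The helper name and the label keys are hypothetical, and the sketch assumes that any overlapping keys were already checked for compatibility in an earlier step.

```python
def inject_node_selector(node_selector: dict, flavor_labels: dict) -> dict:
    """Return a new nodeSelector that also carries the flavor's labels."""
    merged = dict(node_selector)  # copy, so the caller's dict is not mutated
    for key, value in flavor_labels.items():
        merged.setdefault(key, value)  # keep values the workload already set
    return merged


job_selector = {"disktype": "ssd"}          # hypothetical selector set by the user
flavor_labels = {"instance-type": "spot"}   # hypothetical ResourceFlavor labels
print(inject_node_selector(job_selector, flavor_labels))
```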

### ResourceFlavor taints

@@ -143,8 +207,9 @@ with taints.

Taints on the ResourceFlavor work similarly to [node taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
For Kueue to admit a workload to use the ResourceFlavor, the PodSpecs in the
workload should have a toleration for it. As opposed to ResourceFlavor labels,
Kueue will not add tolerations for the flavor taints.
workload should have a toleration for it. As opposed to the behavior for
[ResourceFlavor labels](#resourceflavor-labels), Kueue will not add tolerations
for the flavor taints.
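
To make the toleration requirement concrete, here is a simplified Python sketch of the matching rules that Kubernetes documents for taints and tolerations, applied to a flavor's taints. It is an approximation under stated assumptions: every taint is treated as blocking regardless of its effect, the taint key `spot` is invented for the example, and the function names are hypothetical rather than part of Kueue.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified version of the Kubernetes toleration matching rules."""
    # A toleration without an effect matches taints with any effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key combined with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def flavor_is_usable(flavor_taints: list, tolerations: list) -> bool:
    """A flavor is usable only if every one of its taints is tolerated."""
    return all(
        any(tolerates(toleration, taint) for toleration in tolerations)
        for taint in flavor_taints
    )


taints = [{"key": "spot", "value": "true", "effect": "NoSchedule"}]
print(flavor_is_usable(taints, []))
print(flavor_is_usable(taints, [{"key": "spot", "operator": "Exists"}]))
```

Because Kueue does not add tolerations on its own, a workload with an empty toleration list is rejected for this flavor in the sketch.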

### Empty ResourceFlavor

@@ -173,18 +238,18 @@ ClusterQueue.

### Flavors and borrowing semantics

When borrowing, Kueue satisfies the following semantics:
When borrowing, Kueue satisfies the following admission semantics:

- When assigning flavors, Kueue goes through the list of flavors in
`.spec.resources[*].flavors`. For each flavor, Kueue attempts to
fit the workload using the min quota of the ClusterQueue or the unused
min quota of other ClusterQueues in the cohort, up to the max quota of the
ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
- When assigning flavors, Kueue goes through the list of flavors in the
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
of the ClusterQueue. If the Workload doesn't fit, Kueue proceeds to evaluate the next
flavor in the list.
- Borrowing happens per-flavor. A ClusterQueue can only borrow quota of flavors
it defines.
- A ClusterQueue can only borrow quota of flavors it defines and it can only
borrow quota for one flavor.
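
One possible reading of these semantics, for a single flavor of a single resource, is sketched below in Python. This is an assumption-laden illustration rather than Kueue's code: the queue names, the `used` field and the `fits` helper are hypothetical, and the cohort check is modeled as "total usage must not exceed the sum of `min` quotas".

```python
def fits(request: int, queue: dict, cohort: list) -> bool:
    """Check one flavor: respect the queue's `max`, then the cohort's total `min`."""
    new_usage = queue["used"] + request
    if queue.get("max") is not None and new_usage > queue["max"]:
        return False  # borrowing may never push usage past `max`
    total_min = sum(q["min"] for q in cohort)    # quota available to the cohort
    total_used = sum(q["used"] for q in cohort)  # quota already consumed
    return total_used + request <= total_min


cq_a = {"name": "cq-a", "min": 10, "max": 15, "used": 8}
cq_b = {"name": "cq-b", "min": 10, "max": None, "used": 3}
cohort = [cq_a, cq_b]

print(fits(5, cq_a, cohort))  # needs 3 units of cq-b's unused `min` quota
print(fits(8, cq_a, cohort))  # would exceed cq-a's `max` of 15
```

In this sketch the first request is admitted by borrowing, while the second is rejected even though the cohort has spare quota, because it would exceed the `max` of `cq-a`.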

### Example
### Borrowing example

Assume you created the following two ClusterQueues:

17 changes: 17 additions & 0 deletions docs/concepts/local_queue.md
@@ -0,0 +1,17 @@
# Local Queue

A `LocalQueue` is a namespaced object that groups closely related workloads
belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md)
from which resources are allocated to run its workloads.

Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
Tenants can discover which queues they can submit jobs to by listing the
local queues in their namespace. The command looks similar to the following:

```sh
kubectl get -n my-namespace localqueues
# Alternatively, use the alias `queue` or `queues`
kubectl get -n my-namespace queues
```

`queue` and `queues` are aliases for `localqueue`.
9 changes: 0 additions & 9 deletions docs/concepts/queue.md

This file was deleted.

13 changes: 11 additions & 2 deletions docs/concepts/workload.md
@@ -23,6 +23,7 @@ metadata:
  name: sample-job
  namespace: default
spec:
  queueName: user-queue
  podSets:
  - count: 3
    name: main
@@ -36,9 +37,13 @@ spec:
            cpu: "1"
            memory: 200Mi
      restartPolicy: Never
  queueName: user-queue
```

## Queue name

To indicate in which [LocalQueue](local_queue.md) you want your Workload to be
enqueued, set the name of the LocalQueue in the `.spec.queueName` field.

## Pod sets

A Workload might be composed of multiple Pods with different pod specs.
@@ -63,4 +68,8 @@ of the Job's pod template.

As described previously, Kueue has built-in support for workloads created with
the Job API. But any custom workload API can integrate with Kueue by
creating a corresponding Workload object for it.
creating a corresponding Workload object for it.

## What's next

- Learn how to [run jobs](/docs/tasks/run_jobs.md).
14 changes: 14 additions & 0 deletions docs/setup/install.md
@@ -39,6 +39,20 @@ to scrape metrics from kueue components, run the following command:
kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/prometheus.yaml
```

### Uninstall

To uninstall a released version of Kueue from your cluster, run the following command:

```shell
VERSION=v0.1.1
kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
```

### Upgrading from 0.1 to 0.2

Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
To install Kueue `0.2.y`, [uninstall](#uninstall) the older version first.

## Install a custom-configured released version

To install a custom-configured released version of Kueue in your cluster, execute the following steps:
8 changes: 4 additions & 4 deletions docs/tasks/administer_cluster_quotas.md
@@ -93,28 +93,28 @@ kubectl apply -f default-flavor.yaml
The `.metadata.name` matches the `.spec.resources[*].flavors[0].name`
field in the ClusterQueue.

### 3. Create [Queues](/docs/concepts/queue.md)
### 3. Create [LocalQueues](/docs/concepts/local_queue.md)

Users cannot directly send [workloads](/docs/concepts/workload.md) to
ClusterQueues. Instead, users need to send their workloads to a LocalQueue in their
namespace.
Thus, for the queuing system to be complete, you need to create a LocalQueue in
each namespace that needs access to the ClusterQueue.

Write the manifest for the Queue. It should look similar to the following:
Write the manifest for the LocalQueue. It should look similar to the following:

```yaml
# default-user-queue.yaml
apiVersion: kueue.x-k8s.io/v1alpha2
kind: Queue
kind: LocalQueue
metadata:
  namespace: default
  name: user-queue
spec:
  clusterQueue: cluster-total
```
To create the Queue, run the following command:
To create the LocalQueue, run the following command:
```shell
kubectl apply -f default-user-queue.yaml
2 changes: 2 additions & 0 deletions docs/tasks/run_jobs.md
@@ -18,6 +18,8 @@ Make sure the following conditions are met:
Run the following command to list the LocalQueues available in your namespace.

```shell
kubectl -n default get localqueues
# Or use the 'queues' alias.
kubectl -n default get queues
```
