-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance and update documentation #351
Conversation
/assign @ahg-g cc @cortespao |
7c85ad5
to
943db4b
Compare
@@ -2,6 +2,23 @@ | |||
|
|||
Changes since `v0.1.0`: | |||
|
|||
### Features | |||
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue. | |
- Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported. | |
- Rename `Queue` to `LocalQueue`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, except we don't do ticks for API objects in k8s.io
@@ -2,6 +2,23 @@ | |||
|
|||
Changes since `v0.1.0`: | |||
|
|||
### Features | |||
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue. | |||
- Add webhooks to validate and add defaults to all kueue APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add webhooks to validate and add defaults to all kueue APIs. | |
- Add webhooks to validate, and add defaults to all kueue APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the webhooks validate and add defaults.
How's the new wording?
### Bug fixes | ||
|
||
- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from | ||
blocking other Workloads in a StrictFIFO ClusterQueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is a feature or a bug fix.
If it's a bug fix, it would be good to reword this in the form of:
"Fixed bug that prevented workloads..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo --> | ||
|
||
Learn more about the architecture of Kueue in the design docs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learn more about the architecture of Kueue in the design docs: | |
Learn more about the architecture of Kueue with the following design docs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
Kueue APIs allow you to express: | ||
- Quotas and policies for fair sharing among tenants. | ||
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object) | |
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object) is fully utilized, you can using a different flavor. For more information, see the [Workload objects](docs/concepts/workload.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for marking the changes on only one line 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the subject. It's kueue that runs (or rather admit) the job in a different flavor. I think we don't need the link to workload again. It's in the first paragraph.
- When assigning flavors, Kueue goes through the list of flavors in the | ||
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts | ||
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the | ||
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota | ||
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next | ||
flavor in the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- When assigning flavors, Kueue goes through the list of flavors in the | |
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts | |
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the | |
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota | |
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next | |
flavor in the list. | |
- When assigning flavors, Kueue goes through the list of flavors in the | |
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts | |
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the | |
unused `min` quota of other ClusterQueues in the cohort. If the workload doesn't fit, Kueue proceeds evaluating the next flavor in the list. Kueue attempts to fit workloads until the `max` quota | |
of the ClusterQueue is reached. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely reworked this section
- A ClusterQueue can only borrow quota of flavors it defines and it can only | ||
borrow quota for one flavor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- A ClusterQueue can only borrow quota of flavors it defines and it can only | |
borrow quota for one flavor. | |
- A ClusterQueue can only borrow quota of flavors that ClusterQueue defines. ClusterQueue can only | |
borrow quota for one flavor. |
belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md) | ||
from which resources are allocated to run its workloads. | ||
|
||
Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`. | |
Users submit jobs to a `LocalQueue`, instead of to a `ClusterQueue` directly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`. | ||
Tenants can discover which queues they can submit jobs to by listing the | ||
local queues in their namespace. The command looks similar to the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local queues in their namespace. The command looks similar to the following: | |
local queues in their namespace. The command is similar to the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
### Upgrading from 0.1 to 0.2 | ||
|
||
Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes. | |
Upgrading from `0.1.x` to `0.2.y` is not supported because of breaking API changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from | ||
blocking other Workloads in a StrictFIFO ClusterQueue. | ||
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue. | |
- Fixed the number of pending workloads in a BestEffortFIFO ClusterQueue status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from | ||
blocking other Workloads in a StrictFIFO ClusterQueue. | ||
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue. | ||
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be | |
- Fixed a bug in BestEffortFIFO ClusterQueue where a workload might not be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be | ||
retried after a transient error. | ||
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue. | ||
- Fixed requeuing an out-of-date workload when failed to admit it. | ||
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads | |
- Fixed a bug in BestEffortFIFO ClusterQueue where inadmissible workloads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
For each resource, you can define quotas for multiple _flavors_. A | ||
flavor represents different variations of a resource. The variations can be | ||
defined in a [ResourceFlavor object](#resourceflavor-object). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For each resource, you can define quotas for multiple _flavors_. A | |
flavor represents different variations of a resource. The variations can be | |
defined in a [ResourceFlavor object](#resourceflavor-object). | |
For each resource, you can define quotas for multiple _flavors_. Flavors represent different variations of a resource (for example, different GPU models). A flavor can be defined using a [ResourceFlavor object](#resourceflavor-object). |
Kueue is a lean controller that you can install on top of a vanilla Kubernetes | ||
cluster without replacing any components. It is compatible with cloud | ||
environments where: | ||
- Nodes and other compute resources can be scaled up and down. | ||
- Compute resources are heterogeneous (in architecture, availability, price, etc.). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kueue is a lean controller that you can install on top of a vanilla Kubernetes | |
cluster without replacing any components. It is compatible with cloud | |
environments where: | |
- Nodes and other compute resources can be scaled up and down. | |
- Compute resources are heterogeneous (in architecture, availability, price, etc.). | |
Kueue is a lean controller that you can install on top of a vanilla Kubernetes | |
cluster. It does not replace any existing Kubernetes components. It is compatible with cloud | |
environments where: | |
- Compute resources are elastic and can be scaled up and down. | |
- Compute resources are heterogeneous (in architecture, availability, price, etc.). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "heterogeneous" mean in this example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they have different characteristics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean it is not compatible with environments where architecture is homogeneous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is compatible, as homogeneous is just simpler. In kueue terms, if your resources are homegeneous, you need only one ResourceFlavor.
|
||
- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch) | ||
to get access) discusses the API proposal and a high-level description of how it | ||
operates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep the bit.ly/
prefix, sometimes people highlight and want to copy the short link
|
||
It is possible that multiple resources are tied to the same flavors. This is | ||
typical for `cpu` and `memory`, where the flavors are generally tied to a | ||
machine family or availability guarantees. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
machine family or availability guarantees. | |
machine family or provisioning model (e.g., spot vs standard). |
A ResourceFlavor is an object that represents these variations and allows you | ||
to associate them with node labels and taints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ResourceFlavor is an object that represents these variations and allows you | |
to associate them with node labels and taints. | |
A ResourceFlavor is an object that represents these variations and allows you | |
to associate them with the labels and taints of the nodes that offer those resource flavors. |
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This | ||
guarantees that the workload Pods run on the nodes associated to the flavor | ||
that Kueue decided that the workload should use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This | |
guarantees that the workload Pods run on the nodes associated to the flavor | |
that Kueue decided that the workload should use. | |
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This | |
guarantees that the workload Pods can only be scheduled on the nodes targeted by the flavor | |
that Kueue assigned to the workload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, I was having a hard time with this one :)
- Borrowing happens per-flavor. A ClusterQueue can only borrow quota of flavors | ||
it defines. | ||
- A ClusterQueue can only borrow quota of flavors it defines and it can only | ||
borrow quota for one flavor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't mention "and it can only borrow quota for one flavor" because that also applies in the no borrowing case as well.
Change-Id: I25de039eea653e95a712f9d8450f14e77f16452f
705df6d
to
e7e1013
Compare
/label tide/merge-method-squash |
README.md
Outdated
## Why use Kueue | ||
|
||
Kueue is a lean controller that you can install on top of a vanilla Kubernetes | ||
cluster. Kue does not replace any existing Kubernetes components. Kueue is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cluster. Kue does not replace any existing Kubernetes components. Kueue is | |
cluster. Kueue does not replace any existing Kubernetes components. Kueue is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, how did that happen?
Although not a bad name (it's basically pronounced the same).
e7e1013
to
692fbe2
Compare
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just nits.
Docs look awesome. Thanks all!
by assigning the same flavor to codependent resources in a pod set. | ||
- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/) | ||
in Workload pod sets. | ||
- Default requests to limits if requests are not set in a Workload pod set, to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Default requests to limits if requests are not set in a Workload pod set, to | |
- Define default requests to limits if requests are not set in a Workload pod set, to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe Set is a more accurate verb:
Set requests to limits
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to set and added a link to k8s reference doc.
- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/) | ||
in Workload pod sets. | ||
- Default requests to limits if requests are not set in a Workload pod set, to | ||
match internal defaulting for k8s Pods. | ||
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of | |
- Add [prometheus metrics](/docs/reference/metrics.md) to monitor health of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, cortespao The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
692fbe2
to
cb04515
Compare
Updated and squashed |
/hold cancel |
/lgtm |
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Update documentation and changelog highlighting the main features and bugfixes.
Which issue(s) this PR fixes:
Part of #222
Special notes for your reviewer: