Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance and update documentation #351

Merged
merged 1 commit into from
Aug 24, 2022

Conversation

alculquicondor
Copy link
Contributor

@alculquicondor alculquicondor commented Aug 23, 2022

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Update documentation and changelog highlighting the main features and bugfixes.

Which issue(s) this PR fixes:

Part of #222

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 23, 2022
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 23, 2022
@alculquicondor
Copy link
Contributor Author

/assign @ahg-g

cc @cortespao

@@ -2,6 +2,23 @@

Changes since `v0.1.0`:

### Features
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
- Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported.
- Rename `Queue` to `LocalQueue`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, except we don't do ticks for API objects in k8s.io

@@ -2,6 +2,23 @@

Changes since `v0.1.0`:

### Features
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
- Add webhooks to validate and add defaults to all kueue APIs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Add webhooks to validate and add defaults to all kueue APIs.
- Add webhooks to validate, and add defaults to all kueue APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the webhooks validate and add defaults.

How's the new wording?

### Bug fixes

- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
blocking other Workloads in a StrictFIFO ClusterQueue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a feature or a bug fix.
If it's a bug fix, it would be good to reword this in the form of:
"Fixed bug that prevented workloads..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->

Learn more about the architecture of Kueue in the design docs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Learn more about the architecture of Kueue in the design docs:
Learn more about the architecture of Kueue with the following design docs:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


Kueue APIs allow you to express:
- Quotas and policies for fair sharing among tenants.
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object) is fully utilized, you can using a different flavor. For more information, see the [Workload objects](docs/concepts/workload.md).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for marking the changes on only one line 😞

Copy link
Contributor Author

@alculquicondor alculquicondor Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the subject. It's kueue that runs (or rather admit) the job in a different flavor. I think we don't need the link to workload again. It's in the first paragraph.

Comment on lines +243 to 248
- When assigning flavors, Kueue goes through the list of flavors in the
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
flavor in the list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- When assigning flavors, Kueue goes through the list of flavors in the
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
flavor in the list.
- When assigning flavors, Kueue goes through the list of flavors in the
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
unused `min` quota of other ClusterQueues in the cohort. If the workload doesn't fit, Kueue proceeds evaluating the next flavor in the list. Kueue attempts to fit workloads until the `max` quota
of the ClusterQueue is reached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely reworked this section

Comment on lines +249 to +250
- A ClusterQueue can only borrow quota of flavors it defines and it can only
borrow quota for one flavor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- A ClusterQueue can only borrow quota of flavors it defines and it can only
borrow quota for one flavor.
- A ClusterQueue can only borrow quota of flavors that ClusterQueue defines. ClusterQueue can only
borrow quota for one flavor.

belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md)
from which resources are allocated to run its workloads.

Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
Users submit jobs to a `LocalQueue`, instead of to a `ClusterQueue` directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
Tenants can discover which queues they can submit jobs to by listing the
local queues in their namespace. The command looks similar to the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
local queues in their namespace. The command looks similar to the following:
local queues in their namespace. The command is similar to the following:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


### Upgrading from 0.1 to 0.2

Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
Upgrading from `0.1.x` to `0.2.y` is not supported because of breaking API changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
blocking other Workloads in a StrictFIFO ClusterQueue.
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
- Fixed the number of pending workloads in a BestEffortFIFO ClusterQueue status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
blocking other Workloads in a StrictFIFO ClusterQueue.
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
- Fixed a bug in BestEffortFIFO ClusterQueue where a workload might not be

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
retried after a transient error.
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
- Fixed requeuing an out-of-date workload when failed to admit it.
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
- Fixed a bug in BestEffortFIFO ClusterQueue where inadmissible workloads

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +44 to +46
For each resource, you can define quotas for multiple _flavors_. A
flavor represents different variations of a resource. The variations can be
defined in a [ResourceFlavor object](#resourceflavor-object).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For each resource, you can define quotas for multiple _flavors_. A
flavor represents different variations of a resource. The variations can be
defined in a [ResourceFlavor object](#resourceflavor-object).
For each resource, you can define quotas for multiple _flavors_. Flavors represent different variations of a resource (for example, different GPU models). A flavor can be defined using a [ResourceFlavor object](#resourceflavor-object).

Comment on lines +10 to +14
Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster without replacing any components. It is compatible with cloud
environments where:
- Nodes and other compute resources can be scaled up and down.
- Compute resources are heterogeneous (in architecture, availability, price, etc.).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster without replacing any components. It is compatible with cloud
environments where:
- Nodes and other compute resources can be scaled up and down.
- Compute resources are heterogeneous (in architecture, availability, price, etc.).
Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster. It does not replace any existing Kubernetes components. It is compatible with cloud
environments where:
- Compute resources are elastic and can be scaled up and down.
- Compute resources are heterogeneous (in architecture, availability, price, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "heterogeneous" mean in this example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they have different characteristics

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean it is not compatible with environments where architecture is homogeneous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is compatible, as homogeneous is just simpler. In kueue terms, if your resources are homegeneous, you need only one ResourceFlavor.


- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
to get access) discusses the API proposal and a high-level description of how it
operates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep the bit.ly/ prefix, sometimes people highlight and want to copy the short link


It is possible that multiple resources are tied to the same flavors. This is
typical for `cpu` and `memory`, where the flavors are generally tied to a
machine family or availability guarantees.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
machine family or availability guarantees.
machine family or provisioning model (e.g., spot vs standard).

Comment on lines +153 to 154
A ResourceFlavor is an object that represents these variations and allows you
to associate them with node labels and taints.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A ResourceFlavor is an object that represents these variations and allows you
to associate them with node labels and taints.
A ResourceFlavor is an object that represents these variations and allows you
to associate them with the labels and taints of the nodes that offer those resource flavors.

Comment on lines +199 to +201
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods run on the nodes associated to the flavor
that Kueue decided that the workload should use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods run on the nodes associated to the flavor
that Kueue decided that the workload should use.
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods can only be scheduled on the nodes targeted by the flavor
that Kueue assigned to the workload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I was having a hard time with this one :)

- Borrowing happens per-flavor. A ClusterQueue can only borrow quota of flavors
it defines.
- A ClusterQueue can only borrow quota of flavors it defines and it can only
borrow quota for one flavor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mention "and it can only borrow quota for one flavor" because that also applies in the no borrowing case as well.

@alculquicondor alculquicondor mentioned this pull request Aug 24, 2022
20 tasks
Change-Id: I25de039eea653e95a712f9d8450f14e77f16452f
@alculquicondor
Copy link
Contributor Author

/label tide/merge-method-squash
/hold for reviews

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. labels Aug 24, 2022
README.md Outdated
## Why use Kueue

Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster. Kue does not replace any existing Kubernetes components. Kueue is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cluster. Kue does not replace any existing Kubernetes components. Kueue is
cluster. Kueue does not replace any existing Kubernetes components. Kueue is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, how did that happen?
Although not a bad name (it's basically pronounced the same).

@ahg-g
Copy link
Contributor

ahg-g commented Aug 24, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2022
Copy link
Contributor

@cortespao cortespao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nits.
Docs look awesome. Thanks all!

by assigning the same flavor to codependent resources in a pod set.
- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
in Workload pod sets.
- Default requests to limits if requests are not set in a Workload pod set, to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Default requests to limits if requests are not set in a Workload pod set, to
- Define default requests to limits if requests are not set in a Workload pod set, to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Set is a more accurate verb:

Set requests to limits

wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to set and added a link to k8s reference doc.

- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
in Workload pod sets.
- Default requests to limits if requests are not set in a Workload pod set, to
match internal defaulting for k8s Pods.
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of
- Add [prometheus metrics](/docs/reference/metrics.md) to monitor health of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, cortespao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2022
@alculquicondor
Copy link
Contributor Author

Updated and squashed

@alculquicondor
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 24, 2022
@ahg-g
Copy link
Contributor

ahg-g commented Aug 24, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2022
@k8s-ci-robot k8s-ci-robot merged commit 5839bd3 into kubernetes-sigs:main Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants