Skip to content

[KEP-5073] Update KEP with section of validation ratcheting#5292

Merged
k8s-ci-robot merged 3 commits intokubernetes:masterfrom
yongruilin:ratcheting
Jun 9, 2025
Merged

[KEP-5073] Update KEP with section of validation ratcheting#5292
k8s-ci-robot merged 3 commits intokubernetes:masterfrom
yongruilin:ratcheting

Conversation

@yongruilin
Copy link
Contributor

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels May 8, 2025
@k8s-ci-robot k8s-ci-robot requested review from apelisse and sttts May 8, 2025 17:01
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 8, 2025
@yongruilin yongruilin marked this pull request as ready for review May 8, 2025 17:45
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 8, 2025
@k8s-ci-robot k8s-ci-robot requested review from deads2k and jpbetz May 8, 2025 17:46
@yongruilin yongruilin force-pushed the ratcheting branch 2 times, most recently from 912ee4e to 34dae9a Compare May 8, 2025 18:26
@yongruilin yongruilin closed this May 8, 2025
@yongruilin yongruilin reopened this May 8, 2025
@yongruilin
Copy link
Contributor Author

/assign @jpbetz @deads2k

Copy link
Contributor

@deads2k deads2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question and a hopefully non-contentious request. lgtm otherwise

| Struct fields<br/>`structType=atomic` | - | revalidate changed fields | [Issue #131566](https://github.com/kubernetes/kubernetes/issues/131566) (Alignment needed) |
| Slices | all elements are equivalent and the is order unchanged | revalidate the slice if any element changed or order changed | `listType=map`: no validate when re-order<br/>`listType=set`: re-validate when re-order<br/>`listType=atomic`: re-validate when re-order |
| Slice values<br/>`listType=atomic` | - | validate items which are not found (by value) in the old slice | Validate all elements (CRDs ratcheting may be expanded to match in-tree ratcheting) |
| Slice values<br/>`listType=map` | - | (re)validate items which are not found (by key) in the old slice or are changed | same |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if order of a slice that is a map changes, but all the keys and values are the same, do we skip validation? I would expect us to skip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would skip for the validation on the individual element. But should we still validate the listmap as a whole? If it is declared with "immutable", we consider the order as part of the API and validate the order?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would skip for the validation on the individual element. But should we still validate the listmap as a whole? If it is declared with "immutable", we consider the order as part of the API and validate the order?

Do you have an example? I think for our validation tag purposes I would expect immutable on a list of type map to mean the key/value pairs cannot change, but I don't think I'd expect revalidation on ordering. That means that certain kinds of validation cannot be expressed, but it's not so clear to me that we'd want to support validation on lists of type map that were order dependent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Container.Env
The order here matters as "Variables making use of others defined in the same context must come later in the list." (ref doc) This is maybe a mistake for marking it as listtype=map, but it is already in used widely.

Copy link
Contributor

@jpbetz jpbetz May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pod.spec.container[*].env is also the case I wanted to make sure I fully considered as I was when reading this thread.

env is problematic because it represents variable assignments (order matters,repeated "keys" are allowed). It breaks listType=map in quite a few ways.

Despite all of those problems, I'm not aware of any validation rules that would need to be reevaluated if the order changes for env. And I think that's what we're looking for here. If we can't find any actual cases. I'm inclined to not reevaluated validation when only order changes like @deads2k suggests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-followup:

I don't know of any places where the validation logic depends on the order of items in a list-map or list-set, and I am hard-pressed to synthesize a realistic one (I can dream up phony ones, though). env doesn't count here because it's REALLY NOT a map.

Likewise, I can't really dream up a case where a reorder-only update is a thing someone would want to do.

That makes this discussion somewhat academic, but it changes core principles so I think we need to pin it down.

For me it mostly comes down to consistency. If we write to storage and trigger admission and send WATCH events, then I think we should validate. You can argue that it's a map, so any validation that would fail on a reorder is broken, and I not even going to disagree.

The second concern is complexity. To skip validation in the event of a reorder-only update, we need to write more code and/or do slow things like maps and write more tests than if we just let the revalidation happen. I doubt very very much that reorder-only updates are a thing people do, so it's really moot.

Why write more code to skip a corner-case that doesn't really exist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK here's a semi-real case.

Service.Spec.ClusterIPs (it's not declared as a set but it probably should be - it has to be unique elements). There's a rule that says that .clusterIPs[0] must == .clusterIP. If you reorder that list, validation SHOULD fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK here's a semi-real case.
Service.Spec.ClusterIPs (it's not declared as a set but it probably should be - it has to be unique elements). There's a rule that says that .clusterIPs[0] must == .clusterIP. If you reorder that list, validation SHOULD fail.

I think a distinguishing feature of a set or a map is that the order doesn't matter. Re-validating changes in those cases allows people to write not-map and not-set validation. For ClusterIPs, I counter-claim that someone reading the type, seeing set, treating it as a set, and then getting validation errors incompatible with a set is a bug in the type description and a kind of bug we can make impossible to write by not revalidating.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got a variety of semantics in the use cases. Regardless of what we pick for the default ratcheting behavior of listType=map, we're going to have to deal with the use cases that don't align.

I don't love that we've accumulated order dependent use cases on listType=map. But I can see how we got here:

  • We alphanumerically sort JSON maps when we serialize, but we preserve the order of listType=map.
  • When types were annotated as "maps" by SMD and SSA, no special considerations were made for lists where order matters.

If we continue to force uniform treatment of all listType=map entries have the same semantics, I don't see the situation getting a whole lot better.

Would it be reasonable to do something like this?

  • Clarify (presumably with more tags) all use cases of listType=maps where order matters.
  • Once we've divided the use cases up, provide sane default ratcheting for each category.

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assert that these are not "maps" (or sets), they are "kubernetes list-maps", so the semantics are really up to us to define.

We could define different attributes like "// +listType=set" vs "// +unique". Maps are more fiddly since the way we define keys is called listMapKey. I suppose we could do "orderedSet" and "orderedMap", but I think we would use the ordered variants for all existing-field conversions, just out of fear of breaking clients' assumptions.

If we can find better ways to express things like env which doesn't break clients but also doesn't carry untrue implications, great! But that's not what this is about. As it stands, env is a bad candidate for being converted to declarative :)

Let's assume we filter out all the things that are declared as maps/sets but really aren't. For the rest: would you be open to changes which provide "real" map semantics? IOW: don't write to disk (but do return 200), sort the elements by key (e.g. in JSON/YAML), don't trigger WATCH, don't call admission, etc? I suspect that will 100% break someone.

IMO consistency and simplicity are important. Pragmatically, less code has less bugs. I don't see the value in supporting "map" and "orderedMap" as different things.

Comment on lines +1071 to +1479
A challenge arises if a cross-field validation rule (e.g. `X < Y`) is defined on a common ancestor struct, and an unrelated field (e.g. `Z`) within that same ancestor is modified. This change to `Z` makes the common ancestor “changed” overall, triggering re-validation of the `X < Y` rule. If this rule was recently evolved (e.g., made stricter), it might now fail even if `X` and `Y` themselves are not modified by the user’s update. This could violate the principle “Unchanged fields do not cause update rejections.”

For the initial implementation, this behavior will be documented. Furthermore, cross-field validation rules themselves must incorporate internal ratcheting logic. For instance, generated code for dedicated cross-field tags (like `+k8s:unionMember`) will be designed to only act upon changes to the specific fields they govern and were designed to validate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please build us an example as soon as we have enough merged so we can see how painful (or not) it seems to be.

@thockin thockin self-assigned this Jun 5, 2025
@deads2k
Copy link
Contributor

deads2k commented Jun 9, 2025

We had a lengthy discussion Friday evening and that we will reject any new PRs that introduce declarative validation rules for list-type=map that are dependent on the order of elements. We cannot think of an automated test that can enforce this, so we will need to vigilant.

Once we get the KEP updated to indicate this, I'm happy with the changes.

@deads2k
Copy link
Contributor

deads2k commented Jun 9, 2025

Thanks for the updates. lgtm

/approve
/assign @thockin

I'll leave the lgtm with Tim so he also has a chance to see the updated verbiage.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2025
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve
/unhold

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, thockin, yongruilin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants