Skip to content

📖 Add CRD Upgrade Safety documentation #1090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/Tasks/installing-an-extension.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ After you add a catalog to your cluster, you can install an extension by creatin
!!! warning
Currently, the following limitations affect the installation of extensions:

* If mulitple catalogs are added to a cluster, you cannot specify a catalog when you install an extension.
* If multiple catalogs are added to a cluster, you cannot specify a catalog when you install an extension.
* OLM 1.0 requires that all of the extensions have unique bundle and package names for dependency resolution.

As a result, if two catalogs have an extension with the same name, the installation might fail or lead to an unintended outcome.
Expand Down
240 changes: 240 additions & 0 deletions docs/refs/crd-upgrade-safety.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,240 @@
# Custom Resource Definition Upgrade Safety

When you update a Custom Resource Definition (CRD), OLM runs a CRD Upgrade Safety preflight
check to ensure backwards compatibility with previous versions of that CRD. The CRD update
must pass the validation checks before the change is allowed to progress on a cluster.

## Prohibited CRD Upgrade Changes

The following changes to an existing CRD will be caught by the CRD Upgrade
Safety preflight check and prevent the upgrade:

- The scope changes from Cluster to Namespace or from Namespace to Cluster
- An existing stored version of the CRD is removed
- A new required field is added to an existing version of the CRD
- An existing field is removed from an existing version of the CRD
- An existing field type is changed in an existing version of the CRD
- A new default value is added to a field that did not previously have a default value
- The default value of a field is changed
- An existing default value of a field is removed
- New enum restrictions are added to an existing field which did not previously have enum restrictions
- Existing enum values from an existing field are removed
- The minimum value of an existing field is increased in an existing version
- The maximum value of an existing field is decreased in an existing version
- Minimum or maximum field constraints are added to a field that did not previously have constraints
Comment on lines +12 to +24
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if in a future enhancement, it would be worth refactoring this content into a table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be nice. I spent a few minutes trying to think of how exactly that would be structured and couldn't really come up with a good idea. Maybe columns for safe/unsafe changes or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have played around with it too, but I haven't been able to beat your list. I also haven't been able to figure out a good information architecture.

I do feel like like it would be nice to have this be a bit more scan-able as a reference. Maybe we will come up with something when we downstream this doc?


!!! note
The rules for changes to minimum and maximum values apply to `minimum`, `minLength`,
`minProperties`, `minItems`, `maximum`, `maxLength`, `maxProperties`, and `maxItems` constraints.

If the CRD Upgrade Safety preflight check encounters one of the disallowed upgrade changes,
it will log an error for each disallowed change detected in the CRD upgrade.

!!! tip
In cases where a change to the CRD does not fall into one of the disallowed change categories
but is also unable to be properly detected as allowed, the CRD Upgrade Safety preflight check
will prevent the upgrade and log an error for an "unknown change."

If you identify any preflight checks that should be implemented to prevent issues during CRD upgrades, please [create a new issue](https://github.com/operator-framework/operator-controller/issues).


## Allowed CRD Upgrade Changes

The following changes to an existing CRD are safe for backwards compatibility and will
not cause the CRD Upgrade Safety preflight check to halt the upgrade:

- Adding new enum values to the list of allowed enum values in a field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to leave in for now because it is not considered a breaking change in our validations, but something we identified is that Kubernetes API conventions state that adding new enum values is actually considered a breaking change.

- An existing required field is changed to optional in an existing version
- The minimum value of an existing field is decreased in an existing version
- The maximum value of an existing field is increased in an existing version
- A new version of the CRD is added with no modifications to existing versions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change needed in this PR, but something that I think we may have overlooked when originally developing the preflight checks is that new versions of the CRD must be backwards compatible with the served versions OR provide conversion logic to prevent data loss and invalidation of resources stored at an older version.

@joelanford I think we may need to revisit the CRD Upgrade Safety preflight checks to ensure we capture behavior to check for breaking changes between all stored and served versions of the CRD when there is no conversion strategy specified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the long discussion this birthed in Slack, do we want to hold off on these docs until that work is done or would we just update this once that's complete? Not sure on what kind of timeline you were imagining all this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can let the doc through and update it as necessary



## Disabling CRD Upgrade Safety

The CRD Upgrade Safety preflight check can be entirely disabled by adding the
`preflight.crdUpgradeSafety.disabled` field with a value of "true" to the ClusterExtension of the CRD.

```yaml
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
name: clusterextension-sample
spec:
installNamespace: default
packageName: argocd-operator
version: 0.6.0
preflight:
crdUpgradeSafety:
disabled: true
```

You cannot disable individual field validators. If you disable the CRD Upgrade Safety preflight check, all field validators are disabled.

!!! warning
Disabling the CRD Upgrade Safety preflight check could break backwards compatibility with stored
versions of the CRD and cause other unintended consequences on the cluster.


## Examples of Unsafe CRD Changes

Take the following CRD as our starting version:

```yaml
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.13.0
name: example.test.example.com
spec:
group: test.example.com
names:
kind: Sample
listKind: SampleList
plural: samples
singular: sample
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
properties:
apiVersion:
type: string
kind:
type: string
metadata:
type: object
spec:
type: object
status:
type: object
pollInterval:
type: string
type: object
served: true
storage: true
subresources:
status: {}
```

The following examples will demonstrate specific changes to sections of the example CRD
that would be caught by the CRD Upgrade Safety preflight check.

### Changing Scope

In this example, `scope` has been changed from `Namespaced` to `Cluster`.

??? example
```yaml
spec:
group: test.example.com
names:
kind: Sample
listKind: SampleList
plural: samples
singular: sample
scope: Cluster
versions:
- name: v1alpha1
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include what the error message looks like when a change like this is caught?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??? failure "Error output"
```
validating upgrade for CRD "test.example.com" failed: CustomResourceDefinition test.example.com failed upgrade safety validation. "NoScopeChange" validation failed: scope changed from "Namespaced" to "Cluster"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the long line is a little hard to read, but I have always been told to keep example outputs as they are printed in the terminal.

```

### Removing a stored version

In this example, the existing stored version, `v1alpha1`, has been removed:

??? example
```yaml
versions:
- name: v1alpha2
schema:
openAPIV3Schema:
properties:
apiVersion:
type: string
kind:
type: string
metadata:
type: object
spec:
type: object
status:
type: object
pollInterval:
type: string
type: object
```

??? failure "Error output"
```
validating upgrade for CRD "test.example.com" failed: CustomResourceDefinition test.example.com failed upgrade safety validation. "NoStoredVersionRemoved" validation failed: stored version "v1alpha1" removed
```

### Removing an existing field

In this example, the `pollInterval` field has been removed from `v1alpha1`:

??? example
```yaml
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
properties:
apiVersion:
type: string
kind:
type: string
metadata:
type: object
spec:
type: object
status:
type: object
type: object
```

??? failure "Error output"
```
validating upgrade for CRD "test.example.com" failed: CustomResourceDefinition test.example.com failed upgrade safety validation. "NoExistingFieldRemoved" validation failed: crd/test.example.com version/v1alpha1 field/^.spec.pollInterval may not be removed
```

### Adding a required field

In this example, `pollInterval` has been changed to a required field:

??? example
```yaml
versions:
- name: v1alpha2
schema:
openAPIV3Schema:
properties:
apiVersion:
type: string
kind:
type: string
metadata:
type: object
spec:
type: object
status:
type: object
pollInterval:
type: string
type: object
required:
- pollInterval
```

??? failure "Error output"
```
validating upgrade for CRD "test.example.com" failed: CustomResourceDefinition test.example.com failed upgrade safety validation. "ChangeValidator" validation failed: version "v1alpha1", field "^": new required fields added: [pollInterval]
```
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ nav:
- Deleting an extension: 'Tasks/uninstall-an-extension.md'
- References:
- Catalog queries: 'refs/catalog-queries.md'
- CRD Upgrade Safety: 'refs/crd-upgrade-safety.md'

markdown_extensions:
- pymdownx.highlight:
Expand Down