
K8SPSMDB-1219: Fix upgrade to v1.20.0 if multiple storages are defined #1910


Merged

egegunes merged 2 commits into main from K8SPSMDB-1219 on May 12, 2025

Conversation

@egegunes (Contributor) commented May 6, 2025

Jira: K8SPSMDB-1219

CHANGE DESCRIPTION

Problem:
Upgrading a cluster to v1.20.0 broke point-in-time recovery configuration when more than one storage was defined in spec.backup.storages.

Cause:
The PITR reconcile logic assumed a main storage could always be resolved. CRs created before 1.20.0 do not designate a main storage, so MainStorage() returns an error for them, and with several storages defined there is no unambiguous storage to pick.

Solution:
Gate the logic on the CR version: CRs older than 1.20.0 keep the legacy behavior of using the single defined storage (PITR is disabled with a log message if more than one storage is configured), while CRs at 1.20.0 or newer resolve the storage via MainStorage().

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

@pull-request-size pull-request-size bot added the size/M 30-99 lines label May 6, 2025
@egegunes egegunes force-pushed the K8SPSMDB-1219 branch 2 times, most recently from 5e48aa0 to 8141fad on May 7, 2025 15:34
@pull-request-size pull-request-size bot added size/L 100-499 lines and removed size/M 30-99 lines labels May 7, 2025
@egegunes egegunes force-pushed the K8SPSMDB-1219 branch 2 times, most recently from bb572ee to 35bffc6 on May 8, 2025 12:09
if cr.CompareVersion("1.20.0") < 0 && cr.Spec.Backup.PITR.Enabled {
	if len(cr.Spec.Backup.Storages) != 1 {
		cr.Spec.Backup.PITR.Enabled = false
		log.Info("Point-in-time recovery can be enabled only if one bucket is used in spec.backup.storages")
	}
}
Contributor:

Maybe here we want to say storage instead of bucket. Maybe we can use the following version of the log with a little different meaning:

log.Info("disabling point-in-time recovery: requires exactly one storage in spec.backup.storages")

egegunes (author) replied:

this is a very old log and it'll be removed soon, i am not sure if we should change it
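For readers outside the codebase, the gating under discussion follows the operator's usual pattern of branching on the version the CR declares. A self-contained sketch of the idea — compareVersion and the sample data below are hypothetical stand-ins, not the operator's actual code:

package main

import (
	"fmt"

	"golang.org/x/mod/semver"
)

// compareVersion is a hypothetical stand-in for cr.CompareVersion:
// a negative result means the CR declares a version older than target.
func compareVersion(crVersion, target string) int {
	return semver.Compare("v"+crVersion, "v"+target)
}

func main() {
	crVersion := "1.19.1" // what the CR declares
	pitrEnabled := true
	storages := map[string]struct{}{"s3-eu": {}, "s3-us": {}}

	// Pre-1.20.0 semantics: PITR works only with exactly one storage,
	// so it is switched off instead of failing the upgrade.
	if compareVersion(crVersion, "1.20.0") < 0 && pitrEnabled && len(storages) != 1 {
		pitrEnabled = false
		fmt.Println("disabling point-in-time recovery: requires exactly one storage in spec.backup.storages")
	}
	fmt.Println("PITR enabled:", pitrEnabled)
}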

@@ -236,6 +292,15 @@ func (r *ReconcilePerconaServerMongoDB) reconcilePiTRConfig(ctx context.Context,
defer pbm.Close(ctx)

if err := enablePiTRIfNeeded(ctx, pbm, cr); err != nil {
	if backup.IsErrNoDocuments(err) {
		if cr.CompareVersion("1.20.0") < 0 && len(cr.Spec.Backup.Storages) == 1 {
gkech (Contributor) commented May 8, 2025:

Instead of checking if the storages are equal to 1 here, I would rather push this responsibility inside the reconcile function and do something like this

	if len(cr.Spec.Backup.Storages) != 1 {
		log.Info("Expected exactly one storage for PiTR in legacy version", "configured", len(cr.Spec.Backup.Storages))
		return nil
	}

This makes the logic inside the reconcile function that selects the first storage it finds much easier to follow, and makes it clear why that selection happens.
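A self-contained sketch of that suggestion, with the guard at the top of the legacy path — the types and function name below are stand-ins, not the operator's actual code:

package main

import "fmt"

// Minimal stand-ins so the sketch compiles; the real types live in the
// operator's api package.
type backupSpec struct {
	Storages map[string]struct{}
}

type customResource struct {
	Spec struct{ Backup backupSpec }
}

// reconcileLegacyPiTR is hypothetical: the len(Storages) != 1 check moves
// inside the reconcile path, so the storage selection below never has to
// explain why it picks "the first" entry.
func reconcileLegacyPiTR(cr *customResource) error {
	if len(cr.Spec.Backup.Storages) != 1 {
		fmt.Printf("Expected exactly one storage for PiTR in legacy version, configured: %d\n",
			len(cr.Spec.Backup.Storages))
		return nil
	}

	// Exactly one storage, so this pick is unambiguous.
	var stgName string
	for name := range cr.Spec.Backup.Storages {
		stgName = name
	}
	fmt.Println("applying PBM config for storage:", stgName)
	// ... apply the PBM config for stgName ...
	return nil
}

func main() {
	cr := &customResource{}
	cr.Spec.Backup.Storages = map[string]struct{}{"s3-eu": {}}
	_ = reconcileLegacyPiTR(cr)
}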

	secretName = storage.Azure.CredentialsSecret
}

if secretName != "" {
Contributor:

Maybe we can push this validation inside the secretExists function, both for less nesting + for having consistency in case that function is reused.

egegunes (author) replied:

but in this case the secretExists function would need to return true (exists) if the secret name is empty

Contributor:

In general, secretExists should not be called at all if the storage type is S3 or Azure. Passing an empty secret name to secretExists is, in theory, a very good reason for the function to validate its input and return an error. I understand that this may require some more changes, so we can skip it for now and revisit this when we touch it again.
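A minimal sketch of such a validating helper, written against controller-runtime types; the operator's actual secretExists may have a different signature:

package validate

import (
	"context"
	"errors"

	corev1 "k8s.io/api/core/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// secretExists reports whether the named Secret exists. An empty name is
// rejected up front, so callers no longer need the `if secretName != ""`
// guard discussed above.
func secretExists(ctx context.Context, c client.Client, nn types.NamespacedName) (bool, error) {
	if nn.Name == "" {
		return false, errors.New("secret name must not be empty")
	}

	err := c.Get(ctx, nn, &corev1.Secret{})
	if k8serrors.IsNotFound(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}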

}
}

err := pbm.GetNSetConfigLegacy(ctx, r.client, cr, storage)
Contributor:

Still wondering why this function is called GetNSet... since it is essentially setting. Whether it internally gets, sets, appends, etc. is not something we should leak.

egegunes (author) replied:

yeah the reason is it generates the PBM config according to values in cr.yaml and sets it

}

} else {
	if len(cr.Spec.Backup.Storages) == 1 {
Contributor:

Is this check really needed? Looping over the storages will return immediately if len == 0.

egegunes (author) replied:

not needed

Comment on lines 212 to 274

-	stgName, _, err := cr.Spec.Backup.MainStorage()
-	if err != nil {
-		// no storage found
-		return nil
+	var stgName string
+	if cr.CompareVersion("1.20.0") >= 0 {
+		stgName, _, err = cr.Spec.Backup.MainStorage()
+		if err != nil {
+			// no storage found
+			return nil
+		}
+
+	} else {
+		if len(cr.Spec.Backup.Storages) == 1 {
+			for name := range cr.Spec.Backup.Storages {
+				stgName = name
+				break
+			}
+		}
+	}
Contributor:

All this can be written to something simpler:

	var stgName string
	for name := range cr.Spec.Backup.Storages {
		stgName = name
		break
	}
	if cr.CompareVersion("1.20.0") >= 0 {
		stgName, _, err = cr.Spec.Backup.MainStorage()
		if err != nil {
			// no storage found
			return nil
		}
	}

I think it is the same behaviour, right?
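One nuance worth keeping in mind with the loop-first variant: Go randomizes map iteration order, so "take the first key" is only deterministic when the map has exactly one entry. The two versions therefore differ only for a pre-1.20.0 CR with several storages (empty stgName vs. an arbitrary one), a case the guard shown earlier already disables PITR for. A small demonstration:

package main

import "fmt"

func main() {
	storages := map[string]struct{}{"s3-eu": {}, "s3-us": {}, "azure": {}}

	// Map iteration order is randomized between runs, so this may print
	// a different storage name each time the program is executed.
	var stgName string
	for name := range storages {
		stgName = name
		break
	}
	fmt.Println("picked:", stgName)
}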

gkech previously approved these changes May 8, 2025
nmarukovich previously approved these changes May 8, 2025
@egegunes egegunes dismissed stale reviews from nmarukovich and gkech via 31f14a3 May 9, 2025 09:56
@JNKPercona (Collaborator) commented:
Test name Status
arbiter passed
balancer passed
cross-site-sharded passed
custom-replset-name passed
custom-tls passed
custom-users-roles passed
custom-users-roles-sharded passed
data-at-rest-encryption passed
data-sharded passed
demand-backup passed
demand-backup-eks-credentials-irsa passed
demand-backup-fs passed
demand-backup-incremental passed
demand-backup-incremental-sharded passed
demand-backup-physical passed
demand-backup-physical-sharded passed
demand-backup-sharded passed
expose-sharded passed
finalizer passed
ignore-labels-annotations passed
init-deploy passed
ldap passed
ldap-tls passed
limits passed
liveness passed
mongod-major-upgrade passed
mongod-major-upgrade-sharded passed
monitoring-2-0 passed
multi-cluster-service passed
multi-storage passed
non-voting passed
one-pod passed
operator-self-healing-chaos passed
pitr passed
pitr-physical passed
pitr-sharded passed
pitr-physical-backup-source passed
preinit-updates passed
pvc-resize passed
recover-no-primary passed
replset-overrides passed
rs-shard-migration passed
scaling passed
scheduled-backup passed
security-context passed
self-healing-chaos passed
service-per-pod passed
serviceless-external-nodes passed
smart-update passed
split-horizon passed
stable-resource-version passed
storage passed
tls-issue-cert-manager passed
upgrade passed
upgrade-consistency passed
upgrade-consistency-sharded-tls passed
upgrade-sharded passed
users passed
version-service passed
We ran 59 out of 59.

commit: 31f14a3
image: perconalab/percona-server-mongodb-operator:PR-1910-31f14a37

@egegunes egegunes requested review from gkech and nmarukovich May 12, 2025 08:07
@egegunes egegunes merged commit 7cf59a6 into main May 12, 2025
19 checks passed
@egegunes egegunes deleted the K8SPSMDB-1219 branch May 12, 2025 08:40
eleo007 pushed a commit that referenced this pull request May 12, 2025
K8SPSMDB-1219: Fix upgrade to v1.20.0 if multiple storages are defined (#1910)

* K8SPSMDB-1219: Fix upgrade to v1.20.0 if multiple storages are defined

* fix backups
Labels: size/L 100-499 lines