MLE-28304 Volume Resizing Implementation#162
Merged
Pull request overview
Implements end-to-end PersistentVolumeClaim (PVC) expansion support for MarkLogic groups, including validation, multi-phase workflow/status tracking, StatefulSet template synchronization via delete/recreate, controlled pod restarts when filesystem expansion is offline, and accompanying RBAC/CRD/API updates.
Changes:
- Adds a new multi-phase `ReconcileVolumeResizeValidation()` workflow with status/events, retry handling, sequential/parallel strategies, and StatefulSet/pod orchestration.
- Extends the API/CRDs with `spec.persistence.resizeStrategy` and `status.volumeResizeStatus` (plus deepcopy updates) and adds unit/controller tests.
- Updates controller RBAC/manifests/Helm chart to allow required PVC/StorageClass/PV/Event access, and adds functional spec documentation.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/k8sutil/volume_resize_validation.go | Core resize reconciliation state machine (validation → PVC patching → waiting → STS sync → pod restarts → verification). |
| pkg/k8sutil/volume_resize_validation_test.go | Unit tests covering validation, strategies, retry behavior, sync markers, restarts, and verification transitions. |
| pkg/k8sutil/handler.go | Inserts resize reconciliation into the MarklogicGroup handler flow (before StatefulSet reconcile). |
| internal/controller/marklogicgroup_controller.go | Adds RBAC annotations for PVC/PV/StorageClass/events needed by resizing. |
| internal/controller/marklogicgroup_controller_test.go | Adds envtest-style controller tests for resize validation behaviors. |
| docs/spec/volume resize.md | Functional spec for the resize feature (workflow, status contract, recovery, RBAC). |
| docs/operator-scope-configuration.md | Documents extra StorageClass ClusterRole needed in namespace-scoped mode. |
| config/rbac/role.yaml | Adds PVC/PV/StorageClass/events permissions to cluster-scoped role. |
| config/rbac/role_namespaced.yaml | Adds PVC/events Role permissions and a StorageClass reader ClusterRole/Binding for namespace-scoped mode. |
| config/crd/bases/marklogic.progress.com_marklogicgroups.yaml | Adds persistence.resizeStrategy and status.volumeResizeStatus schema. |
| config/crd/bases/marklogic.progress.com_marklogicclusters.yaml | Adds persistence.resizeStrategy schema at cluster and group override levels. |
| charts/marklogic-operator-kubernetes/templates/manager-rbac.yaml | Helm RBAC updates for PVC/PV/StorageClass/events + StorageClass reader ClusterRole/Binding in namespace scope. |
| api/v1/common_types.go | Introduces VolumeResizeStrategy and adds Persistence.ResizeStrategy with default/enum validation. |
| api/v1/marklogicgroup_types.go | Adds resize phase/reason/state enums and VolumeResizeStatus/PVC status types onto MarklogicGroupStatus. |
| api/v1/zz_generated.deepcopy.go | Deepcopy support for new status/types. |
| api/v1/marklogicgroup_types_test.go | Deepcopy regression test for the new status fields. |
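
The phase progression summarized for `pkg/k8sutil/volume_resize_validation.go` can be pictured as an ordered list of phases with a lookup for the next step. The following is a minimal sketch, not the operator's actual code: the names `Validating` and `WaitingForPVCResize`, the placement of `WaitingForPodsReady` between restarts and verification, and the helper `nextPhase` are assumptions made for illustration. The other phase names appear in the commit messages of this pull request.

```go
package main

import "fmt"

// Phase is a resize workflow phase as it might be recorded in status.
type Phase string

const (
	PhaseValidating               Phase = "Validating" // hypothetical name
	PhaseResizingPVCs             Phase = "ResizingPVCs"
	PhaseWaitingForPVCResize      Phase = "WaitingForPVCResize" // hypothetical name
	PhaseSynchronizingStatefulSet Phase = "SynchronizingStatefulSet"
	PhaseRestartingPods           Phase = "RestartingPods"
	PhaseWaitingForPodsReady      Phase = "WaitingForPodsReady"
	PhaseVerifyingResizeOutcome   Phase = "VerifyingResizeOutcome"
	PhaseCompleted                Phase = "Completed"
)

// happyPath lists the phases in the order assumed for a resize that
// succeeds without retries (the exact ordering is an assumption).
var happyPath = []Phase{
	PhaseValidating,
	PhaseResizingPVCs,
	PhaseWaitingForPVCResize,
	PhaseSynchronizingStatefulSet,
	PhaseRestartingPods,
	PhaseWaitingForPodsReady,
	PhaseVerifyingResizeOutcome,
	PhaseCompleted,
}

// nextPhase returns the phase that follows current on the happy path.
// It returns false for the terminal phase and for unknown phases.
func nextPhase(current Phase) (Phase, bool) {
	for i, p := range happyPath {
		if p == current && i+1 < len(happyPath) {
			return happyPath[i+1], true
		}
	}
	return "", false
}

func main() {
	// Walk the happy path from the first phase to the terminal one.
	p := happyPath[0]
	for {
		fmt.Println(p)
		next, ok := nextPhase(p)
		if !ok {
			break
		}
		p = next
	}
}
```

A real reconciler would also need failure and retry edges, plus the loop back to `ResizingPVCs` that the sequential strategy uses between patches; the sketch covers only the linear path.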
2b8a349 to 9125c5b
rwinieski reviewed May 18, 2026
Co-authored-by: Copilot <copilot@github.com>
… add RBAC/test coverage
- add controller RBAC markers for PVC status, PV, and events
- add matching cluster and namespaced RBAC manifest rules
- align Helm manager RBAC template with generated RBAC
…wait loop, and retry/backoff status handling Co-authored-by: Copilot <copilot@github.com>
New phases: SynchronizingStatefulSet, RestartingPods, WaitingForPodsReady
- Recovery markers and resume behavior
- OfflinePending-only restart candidate logic
- Reverse ordinal restart ordering

Tests passed:
- go test ./pkg/k8sutil -run TestResize -count=1
- go test ./api/v1 -count=1
- go test ./internal/controller -run TestDoesNotExist -count=1

Co-authored-by: Copilot <copilot@github.com>
…safety
- add VerifyingResizeOutcome execution path and active-phase routing
- implement final verification checks for PVCs, StatefulSet template, restart state, filesystem pending, and pod readiness
- transition successful verification to Completed with coherent terminal status fields
- add verification retry and failure handling with stalled resume back to verification and max-retry failure
- add coarse verification lifecycle markers and events
- fix CAS claim behavior to reliably start deferred target after terminal persistence
- add PR5 tests for completion, retry resume, terminal failure, deferred handoff, and final field consistency

Co-authored-by: Copilot <copilot@github.com>
…cess blockers Co-authored-by: Copilot <copilot@github.com>
…dation
- add MarklogicGroup env component tests for growth initialization, shrink rejection, and non-OnDelete strategy rejection
- add reusable helpers to create persistent MarklogicGroup and PVC fixtures
- align PVC test fixture with current Kubernetes API using VolumeResourceRequirements
- switch resize retries to bounded exponential backoff (10s initial, 5m cap, 15 max)
- requeue missing/unbound PVC validation stalls so resize can self-recover automatically
- fix sequential strategy handoff by transitioning back to ResizingPVCs for next patch
- classify template-below-target verification as StatefulSetSyncFailed (not MarkLogicHealthCheckFailed)
- route stalled retries to the correct phase based on failure domain
- move internal crash-recovery markers out of warnings into a dedicated markers field
- add legacy marker normalization, CRD/deepcopy updates, and expanded unit test coverage
- update sample config to enable persistence in quick-start
* add-test-suites
* add csi-hostpath-driver addon
* fix Copilot comments
* improve E2E summary output
* fix SC config
* rearrange test sequence
* remove duplicate test
9125c5b to 2cbd1b4
…ehavior
- preserve current resize phase while paused and mark status with reason=Paused
- clear pause reason/message on resume so normal phase progression can continue
- prevent new resize operation creation after terminal status when spec generation/target is unchanged
- add bounded jitter to exponential retry backoff (still capped at max delay)
- expand tests for pause/resume, terminal restart fencing, and retry jitter bounds
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Agent-Logs-Url: https://github.com/marklogic/marklogic-operator-kubernetes/sessions/e4735662-4677-428a-adcd-e92b434f62dc Co-authored-by: pengzhouml <27710236+pengzhouml@users.noreply.github.com>
@pengzhouml please make sure to use the Jira ID in commit messages as standard practice.
Agent-Logs-Url: https://github.com/marklogic/marklogic-operator-kubernetes/sessions/20ad2d85-dd1c-443d-a0aa-d308bfd64a69 Co-authored-by: pengzhouml <27710236+pengzhouml@users.noreply.github.com>
Agent-Logs-Url: https://github.com/marklogic/marklogic-operator-kubernetes/sessions/ef002b51-2f6d-4a90-bf30-e9102d83e895 Co-authored-by: pengzhouml <27710236+pengzhouml@users.noreply.github.com>
rwinieski approved these changes May 26, 2026