Mle-28303 Dynamic Host Scaling Feature Implementation #168
Merged
Pull request overview
Adds dynamic MarkLogic host pool support to the operator, including API fields, CRD/status schema, controller reconciliation, management API client logic, dynamic pod startup behavior, and a functional spec.
Changes:
- Adds dynamic group API fields/status and generated CRD/deepcopy updates.
- Adds dynamic host lifecycle reconciliation for group configuration, token join, scale-down cleanup, restart recovery, and finalizers.
- Updates StatefulSet/Service generation, startup scripts, controller watches, and tests/spec docs for dynamic groups.
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| pkg/mlmanage/client.go | Adds MarkLogic Management API client for dynamic host operations. |
| pkg/k8sutil/statefulset.go | Adds dynamic labels, env var, readiness probe, and scale-down delay behavior. |
| pkg/k8sutil/service.go | Selects dynamic pods via component labels. |
| pkg/k8sutil/secret.go | Creates shared manage-admin credentials for dynamic groups. |
| pkg/k8sutil/scripts/cluster-init-wrapper.sh | Skips static init/join in dynamic mode. |
| pkg/k8sutil/scripts/cluster-config.sh | Adds dynamic-mode guard. |
| pkg/k8sutil/marklogicServer.go | Propagates dynamic config/defaults to child MarklogicGroup resources. |
| pkg/k8sutil/handler.go | Invokes dynamic reconciliation for dynamic groups. |
| pkg/k8sutil/dynamic_reconcile.go | Implements dynamic host lifecycle reconciliation. |
| pkg/k8sutil/common.go | Adds dynamic/static component label helpers. |
| internal/controller/marklogicgroup_controller.go | Adds pod update handling and pod ownership watch. |
| internal/controller/marklogiccluster_controller_test.go | Adds dynamic group propagation tests. |
| docs/spec/Dynamic Host.md | Adds functional specification for dynamic host support. |
| config/crd/bases/marklogic.progress.com_marklogicgroups.yaml | Adds dynamic spec/status schema for MarklogicGroup. |
| config/crd/bases/marklogic.progress.com_marklogicclusters.yaml | Adds dynamic fields/validation for MarklogicCluster groups. |
| api/v1/zz_generated.deepcopy.go | Adds deepcopy support for dynamic structs/fields. |
| api/v1/marklogicgroup_types.go | Adds dynamic fields and status structs to MarklogicGroup API. |
| api/v1/marklogiccluster_types.go | Adds dynamic fields to cluster group entries. |
| api/v1/common_types.go | Adds dynamic group configuration type. |
Files not reviewed (1)
- api/v1/zz_generated.deepcopy.go: Language not supported
@pengzhouml please respond to Copilot's comments and mark conversations as Resolved.
Pull request overview
Copilot reviewed 20 out of 22 changed files in this pull request and generated 5 comments.
Files not reviewed (1)
- api/v1/zz_generated.deepcopy.go: Language not supported
Comments suppressed due to low confidence (1)
pkg/k8sutil/statefulset.go:115
`patchDiff.String()` is logged before checking `err` from `patch.DefaultPatchMaker.Calculate`. If `Calculate` fails, `patchDiff` may be unusable, and calling methods on it can panic or log misleading output. Please check/return on `err` before using `patchDiff` (including after the second `Calculate` call in the dynamic scale-down delay branch).
    patchDiff, err := patch.DefaultPatchMaker.Calculate(currentSts, statefulSetDef,
        patch.IgnoreStatusFields(),
        patch.IgnoreVolumeClaimTemplateTypeMetaAndStatus(),
        patch.IgnoreField("kind"))
    if shouldDelayDynamicEmptyDirScaleDown(cr, currentSts) {
        statefulSetDef.Spec.Replicas = currentSts.Spec.Replicas
        patchDiff, err = patch.DefaultPatchMaker.Calculate(currentSts, statefulSetDef,
            patch.IgnoreStatusFields(),
            patch.IgnoreVolumeClaimTemplateTypeMetaAndStatus(),
            patch.IgnoreField("kind"))
    }
    logger.Info("Patch Diff:", "Diff", patchDiff.String())
    logger.Info("statefulSetDef Spec:", "Spec", statefulSetDef.Spec.Replicas)
    if err != nil {
        logger.Error(err, "Error calculating patch")
        return result.Error(err).Output()
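
The ordering the comment asks for can be illustrated with a minimal, self-contained sketch. The `patchResult` type and the `calculate` function below are hypothetical stand-ins for the result and call of `patch.DefaultPatchMaker.Calculate`; they are not the library's API, and the operator's actual fix may look different. The point is only that `err` is checked before any method is called on the result.

```go
package main

import (
	"errors"
	"fmt"
)

// patchResult is a hypothetical stand-in for the value returned by the
// real Calculate call; only its String method matters for this sketch.
type patchResult struct{ diff string }

func (p *patchResult) String() string { return p.diff }

// calculate is a hypothetical stand-in for Calculate. When fail is true
// it returns a nil result together with an error.
func calculate(fail bool) (*patchResult, error) {
	if fail {
		return nil, errors.New("calculate failed")
	}
	return &patchResult{diff: "{}"}, nil
}

// describePatch checks err before touching the result, so a nil
// *patchResult is never dereferenced and no misleading diff is logged.
func describePatch(fail bool) (string, error) {
	patchDiff, err := calculate(fail)
	if err != nil {
		return "", fmt.Errorf("calculating patch: %w", err)
	}
	return patchDiff.String(), nil
}

func main() {
	diff, err := describePatch(false)
	fmt.Println(diff, err)

	diff, err = describePatch(true)
	fmt.Println(diff, err)
}
```

In the reviewed snippet the same check would need to follow each of the two `Calculate` calls, since the second call overwrites both `patchDiff` and `err`.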
Introduce the initial Dynamic Host scaffolding without implementing Management API join/remove workflows.
- add isDynamic and dynamic config to cluster/group APIs
- add DynamicGroup status types and deepcopy support
- propagate dynamic fields from MarklogicCluster to MarklogicGroup
- default dynamic groups to RollingUpdate and non-persistent datadir
- use dynamic-host selector/labels for dynamic StatefulSets and Services
- inject MARKLOGIC_DYNAMIC_HOST into dynamic pods
- switch dynamic readiness to TCP on port 8001
- guard shell startup so dynamic pods skip static bootstrap/join logic
- add focused controller tests for milestone 1 behavior
- remove invalid defaulting of dynamic config on static groups
- keep omitted dynamic persistence unset instead of creating invalid spec
- skip bootstrap network gating for dynamic pod startup
- regenerate CRDs after validation tag updates
- tighten controller tests for readiness defaults and unique group names
- verify controller suite passes with envtest
Introduce controller-side dynamic host bootstrap and configuration without implementing token join/remove flows.
- add Management API client plumbing for dynamic host operations
- create operator-managed manage-admin credentials for dynamic reconcile
- bootstrap and reconcile the MarkLogic manage-admin user
- add dynamic reconcile branch for bootstrap readiness and version checks
- ensure dynamic group creation and one-time dynamic host configuration
- record dynamic configuration state in MarklogicGroup status
- extend envtest coverage with fake-client based controller tests
Implement controller-driven dynamic host scale-up joins without introducing remove or restart-recovery behavior yet.
- add token request and join flow to dynamic reconcile
- join locally ready dynamic pods sequentially
- verify MarkLogic membership before marking hosts joined
- record per-host join state and retry progress in dynamic status
- preserve retry-budget accounting across transient join failures
- tighten fake management client behavior for transient token retries
- extend envtest coverage for successful, degraded, and exhausted-retry joins
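
As a rough illustration of the join behavior this commit describes (sequential joins, membership verification before marking a host joined, and a retry budget that survives transient failures), here is a minimal sketch. Every name in it (`hostState`, `joiner`, `reconcileJoins`, `fakeJoiner`) is hypothetical and chosen for the example; the operator's real types and control flow are not shown in this PR page.

```go
package main

import (
	"errors"
	"fmt"
)

// hostState is a hypothetical per-host status record.
type hostState struct {
	Name     string
	Joined   bool
	Attempts int // persists across passes, so the budget is not reset
}

// joiner is a hypothetical abstraction over the management client.
type joiner interface {
	Join(host string) error
	IsMember(host string) bool
}

// reconcileJoins makes one pass over the hosts in order. It stops at the
// first host that cannot be joined, so later hosts wait their turn.
func reconcileJoins(hosts []*hostState, j joiner, budget int) error {
	for _, h := range hosts {
		if h.Joined {
			continue
		}
		if h.Attempts >= budget {
			return fmt.Errorf("host %s: retry budget exhausted", h.Name)
		}
		h.Attempts++
		if err := j.Join(h.Name); err != nil {
			return fmt.Errorf("host %s: join attempt %d: %w", h.Name, h.Attempts, err)
		}
		// Only mark the host joined once membership is confirmed.
		if !j.IsMember(h.Name) {
			return fmt.Errorf("host %s: membership not confirmed", h.Name)
		}
		h.Joined = true
	}
	return nil
}

// fakeJoiner fails a fixed number of Join calls before succeeding.
type fakeJoiner struct {
	failuresLeft int
	members      map[string]bool
}

func (f *fakeJoiner) Join(host string) error {
	if f.failuresLeft > 0 {
		f.failuresLeft--
		return errors.New("transient token error")
	}
	f.members[host] = true
	return nil
}

func (f *fakeJoiner) IsMember(host string) bool { return f.members[host] }

func main() {
	hosts := []*hostState{{Name: "dyn-0"}, {Name: "dyn-1"}}
	j := &fakeJoiner{failuresLeft: 1, members: map[string]bool{}}
	for pass := 1; pass <= 2; pass++ {
		err := reconcileJoins(hosts, j, 3)
		fmt.Printf("pass %d: err=%v\n", pass, err)
	}
}
```

Keeping the attempt counter on the host record rather than in a local variable is what lets the budget carry over from one reconcile pass to the next in this sketch.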
Implement storage-aware dynamic host cleanup for scale-down, scale-to-zero, and group deletion.
- add dynamic-host remove support to the management client
- add pod and group finalizers for dynamic cleanup
- remove EmptyDir-backed hosts before allowing pod deletion
- retain PVC-backed hosts during ordinary scale-down
- clean up dynamic groups on deletion and scale-to-zero
- preserve cleanup and failure state across reconciles
- harden fake client host tracking for multi-group controller tests
- extend envtest coverage for scale-down and cleanup behavior
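
The storage-aware rule in this commit can be read as a small decision function. The sketch below is an interpretation, not the operator's code: the type and constant names are hypothetical, and it assumes that PVC-backed hosts are removed when the group is deleted or scaled to zero, which the commit message implies but does not state outright.

```go
package main

import "fmt"

// storageKind and trigger are hypothetical types for this sketch.
type storageKind string
type trigger string

const (
	storageEmptyDir storageKind = "EmptyDir"
	storagePVC      storageKind = "PVC"

	triggerScaleDown     trigger = "scale-down"
	triggerScaleToZero   trigger = "scale-to-zero"
	triggerGroupDeletion trigger = "group-deletion"
)

// mustRemoveHost reports whether the host should be removed from
// MarkLogic before its pod is allowed to go away.
func mustRemoveHost(s storageKind, t trigger) bool {
	// EmptyDir-backed hosts lose their data with the pod, so they are
	// always removed first.
	if s == storageEmptyDir {
		return true
	}
	// PVC-backed hosts are retained on ordinary scale-down. Removing them
	// on scale-to-zero and group deletion is an assumption of this sketch.
	return t != triggerScaleDown
}

func main() {
	for _, s := range []storageKind{storageEmptyDir, storagePVC} {
		for _, t := range []trigger{triggerScaleDown, triggerScaleToZero, triggerGroupDeletion} {
			fmt.Printf("%-8s %-14s remove=%v\n", s, t, mustRemoveHost(s, t))
		}
	}
}
```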
…c hosts
- detect restart membership loss when pods are locally ready but absent from MarkLogic group membership
- add restart-recovery flow with explicit host states (rejoin-pending, rejoining, rejoined) and ClusterRestartDetected reason
- support PVC-backed restart recovery by cleaning retained state before rejoin
- add retry-budget handling for restart cleanup and restart rejoin failures
- preserve rejoined host state in dynamic status reconstruction
- add controller envtests for EmptyDir rejoin, restart status visibility, PVC cleanup-before-rejoin, partial recovery, and bootstrap-unavailable recovery paths
- stabilize restart-focused tests with deterministic reconcile triggering and robust token-call assertions
- make restart-recovery specs deterministic by explicitly triggering reconciles after fake backend mutations
- replace brittle transient status checks with durable recovery assertions
- tighten token-call verification to host-specific counts to avoid cross-spec noise
- relax final state expectations to accept stable joined or rejoined outcomes where timing can vary
…observability
- replace steady-state phase configured with idle across reconcile flow/tests
- add dynamic status timestamps: lastTransitionTime and host lastUpdated
- add PodStartupTimeout detection for pods that never become locally ready
- set degraded reason to PodStartupTimeout when startup timeout is hit
- emit dynamic lifecycle transition events (normal/warning)
- add/update DynamicHostsReady condition lifecycle
- update CRD schema and deepcopy generation for new status fields
- expand envtests for timeout, condition state, and event assertions
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
fix(dynamic): align status.dynamic.phase enum casing with API contract
- Use Pending, Reconciling, Deleting, Degraded, Failed, Idle for dynamic phases
- Update controller tests to assert the new phase values
- Verified with focused k8sutil and controller TestAPIs test runs
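
For readers unfamiliar with how such a phase enum is usually expressed in Go API types, here is a small sketch. The phase values come from the commit message above; the type name `DynamicPhase`, the constant names, and the `isValidPhase` helper are hypothetical and may not match the operator's `api/v1` package.

```go
package main

import "fmt"

// DynamicPhase is a hypothetical string type for status.dynamic.phase.
type DynamicPhase string

// The values use the capitalized casing listed in the commit message.
const (
	PhasePending     DynamicPhase = "Pending"
	PhaseReconciling DynamicPhase = "Reconciling"
	PhaseDeleting    DynamicPhase = "Deleting"
	PhaseDegraded    DynamicPhase = "Degraded"
	PhaseFailed      DynamicPhase = "Failed"
	PhaseIdle        DynamicPhase = "Idle"
)

// isValidPhase reports whether s is one of the known phases. The
// comparison is case-sensitive, so "idle" is rejected.
func isValidPhase(s string) bool {
	switch DynamicPhase(s) {
	case PhasePending, PhaseReconciling, PhaseDeleting,
		PhaseDegraded, PhaseFailed, PhaseIdle:
		return true
	}
	return false
}

func main() {
	fmt.Println(isValidPhase("Idle"), isValidPhase("idle"))
}
```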
added 2 commits on May 28, 2026 23:53
rwinieski reviewed on May 29, 2026
rwinieski previously approved these changes on Jun 1, 2026
rwinieski approved these changes on Jun 3, 2026
This pull request introduces significant improvements to the E2E testing pipeline and Makefile, with a focus on supporting new E2E test scopes, better handling of local images for Minikube, and stricter validation for dynamic group configurations. It also updates the default version to 1.3.0 and enhances developer experience with more flexible and robust Makefile targets.

E2E Pipeline and Testing Enhancements:
- Added support for selecting E2E test scopes (`cluster`, `dynamic-host`, `volume-resize`) via the `E2E_SCOPE` parameter in the Jenkins pipeline, including validation and error handling for unsupported combinations (e.g., restricting non-`cluster` scopes when running on EKS). Istio and Helm namespace-scoped tests are now only run for the `cluster` scope. (Jenkinsfile)
- Introduced new Makefile targets for focused E2E tests: `e2e-test-dynamic-host` and `e2e-test-dynamic-host-local` for dynamic-host lifecycle tests, with logic to build/load local images for Minikube contexts, and `e2e-test-volume-resize-local` for volume-resize tests with local image and Minikube storage class setup. (Makefile)
- Increased E2E test timeouts to 60 minutes for most targets and made timeouts configurable via `E2E_TEST_TIMEOUT`. (Makefile)

Image and Build Improvements:
- Updated the default version to `1.3.0` and added a `LOCAL_E2E_IMG` variable for local E2E testing. (Makefile)
- Image builds now use `--load` for compatibility with local image loading. (Makefile)

Kustomize and Tooling:
- Updated the `kustomize` Makefile target to respect externally set paths and provide better error messaging if the binary is not executable. (Makefile)

API Validation and CRD Changes:
- Added a `DynamicGroupConfig` struct with strict ISO-8601 duration validation for the `tokenDuration` field. (api/v1/common_types.go)
- Added validation restricting dynamic group images to `latest` or MarkLogic major version 12+. Also added a max length validation for the `image` field. (api/v1/marklogiccluster_types.go)

Developer Experience:
- Added a `kill-envtest` Makefile target to clean up stale test processes. (Makefile)

These changes collectively improve the reliability, flexibility, and developer usability of the E2E testing and build workflow, while enforcing important validation rules for dynamic group configurations and image selection.
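
To make the two validation rules in this description concrete, the sketch below checks an ISO-8601 duration string and an image reference in plain Go. It is an approximation under stated assumptions: the actual CRD validation is not shown on this page and may use a different pattern; the function names are hypothetical; the duration pattern is a simplified form of ISO-8601; and the image check assumes the MarkLogic major version appears as the leading digits of the tag.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// iso8601Duration is a simplified pattern for values such as PT15M or
// P1DT2H. It is an assumption, not the pattern used in the CRD.
var iso8601Duration = regexp.MustCompile(
	`^P(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?$`)

// validTokenDuration accepts durations with at least one component.
func validTokenDuration(s string) bool {
	if !iso8601Duration.MatchString(s) {
		return false
	}
	// Reject a bare "P" and a dangling "T" with no time components.
	return s != "P" && !strings.HasSuffix(s, "T")
}

// leadingDigits matches the digits at the start of an image tag.
var leadingDigits = regexp.MustCompile(`^\d+`)

// validDynamicImage accepts the tag "latest" or a tag whose leading
// number is 12 or higher. Untagged and digest references are rejected
// in this sketch.
func validDynamicImage(image string) bool {
	// Drop the registry and repository path so a registry port is not
	// mistaken for a tag separator.
	name := image[strings.LastIndex(image, "/")+1:]
	if strings.Contains(name, "@") {
		return false
	}
	i := strings.LastIndex(name, ":")
	if i < 0 {
		return false
	}
	tag := name[i+1:]
	if tag == "latest" {
		return true
	}
	major, err := strconv.Atoi(leadingDigits.FindString(tag))
	return err == nil && major >= 12
}

func main() {
	for _, d := range []string{"PT15M", "P1DT2H", "P", "15m"} {
		fmt.Printf("%-8s %v\n", d, validTokenDuration(d))
	}
	for _, img := range []string{"example.com:5000/marklogic:12.0.0", "marklogic:11.3.1", "marklogic:latest"} {
		fmt.Printf("%-36s %v\n", img, validDynamicImage(img))
	}
}
```

Go's `regexp` package has no lookahead, which is why the empty `P` and trailing `T` cases are handled with explicit string checks rather than inside the pattern.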