Minimal MachinePool support #1506
base: main
Conversation
✅ Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: justinsb
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
This PR is WIP while I whittle down the unneeded code from cluster-api-provider-aws and generally make this reviewable. But I am uploading as this is a checkpoint that works (in a limited way!)
428790f to 5906e99 Compare
Removing the WIP. I will still try to whittle down the code by extracting helpers etc., but it's already approaching the reviewable ballpark!
So the linter is blowing up on the TODO comments. How do we want to track next steps in code? If we don't want to do
6449078 to 2b81034 Compare
resources:
- gcpmachinepools
verbs:
- delete
This is not folded into the same block as e.g. gcpmachines because we don't need create. I'm not sure that we need create on e.g. gcpclusters either, but ... that's a separate issue.
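For illustration, a hedged sketch of that split, assuming the ClusterRole is generated from kubebuilder RBAC markers (the verb sets shown are illustrative, not taken from this PR):

```go
// Illustrative sketch only: the gcpmachinepools rule simply omits "create",
// while the gcpmachines rule keeps it.
package sketch

// +kubebuilder:rbac:groups=infrastructure.cluster.x-k8s.io,resources=gcpmachines,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=infrastructure.cluster.x-k8s.io,resources=gcpmachinepools,verbs=get;list;watch;update;patch;delete
```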
// FUTURE: do we need to verify that the instances are actually running?
machinePoolScope.GCPMachinePool.Spec.ProviderIDList = providerIDList
I really don't like this very much (it shouldn't be in spec, and it requires us to poll the cloud API), but it seems to be the MachinePool contract.
So I think I finally have this working with an e2e test in #1539 and (hopefully) in a mergeable state. The big thing for the e2e test was populating spec.providerIDList. I don't love that contract, but it is the MachinePool contract.
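For context, a minimal sketch of what fulfilling that contract looks like (type and helper names below are assumptions, not this PR's code): list the VMs backing the pool and mirror their provider IDs, in the gce://&lt;project&gt;/&lt;zone&gt;/&lt;name&gt; form the GCP cloud provider puts on Node.spec.providerID, into spec.providerIDList so the MachinePool controller can match Nodes to the pool.

```go
// Illustrative sketch only; names and shapes are assumptions, not the PR's actual code.
package sketch

import "fmt"

// pooledInstance is a stand-in for the per-VM data returned by the cloud API.
type pooledInstance struct {
	Project string
	Zone    string
	Name    string
}

// providerIDsFor builds provider IDs in the gce://<project>/<zone>/<name> form, matching
// what the GCP cloud provider sets on Node.spec.providerID, so the MachinePool controller
// can associate Nodes with this pool.
func providerIDsFor(instances []pooledInstance) []string {
	ids := make([]string, 0, len(instances))
	for _, inst := range instances {
		ids = append(ids, fmt.Sprintf("gce://%s/%s/%s", inst.Project, inst.Zone, inst.Name))
	}
	return ids
}
```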
pkg/logger/logger.go
Outdated
// Logger is a concrete logger using logr underneath.
type Logger struct {
Just curious why we are adding a new logger here? And is it going to replace the loggers in other controllers?
Yeah, I added this because CAPA does it, but I also think it's a good call @bochengchu - it feels orthogonal. I think I will remove it.
LGTM
I removed the logging that I agree is orthogonal - thanks @bochengchu

/retest
So I think this is looking reasonable (IMO) - the failing test is apidiff, and we are adding to the API, so that is expected!
I think once we get #1542 and kubernetes/test-infra#35686 in we should see sensible results from apidiff, but ... it will still fail because we are changing the API (at least AIUI)
In order for Nodes to be associated with the MachinePool, we need to populate the spec.providerIDList field, which the MachinePool controller reads.
62f145b to 35204fe Compare
@damdo: GitHub didn't allow me to assign the following users: barbacbd. Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Thanks to @salasberryfin for merging the other two PRs, apidiff is now completing. It's actually passing, so it seems to be testing our internal API rather than our public API, which is ... not what I assumed. But in any case, please take a look - would be great to get this in!
@justinsb are you expecting this to go in first and the E2Es for @bochengchu to go in after, or? What's the best strategy here :) LMK
Hi! So there are two workstreams: MachinePool and HA Internal Load Balancer. I'm doing MachinePool and @bochengchu is doing HA Internal Load Balancer. (Currently) MachinePool is split into two PRs:
It seemed like a good idea at the time to split out the tests to keep the code smaller, though actually as a reviewer I'm not sure it would have made my life easier, so ... sorry :-). LMK if you want me to combine them, but #1539 is passing on top of this PR, so if we could approve this one, I will rebase the tests - it would be great to get this in!

I can look now at @bochengchu's PRs. I still have approval rights in this repo as a cluster-lifecycle lead, so I can approve them if nobody objects. I was waiting for an e2e test before doing so. I think the implementation is in #1533 and the tests are in #1550. In that case I think we are right to split them, because I will ok-to-test #1550 now, we expect tests to fail until #1533 is in, and then I guess we can ask for the tests to be rebased on top of the implementation. Then we have two test runs, one with the fix, one without, and hopefully the one with the fix passes and the other fails :-)

TLDR: For MachinePool, an lgtm/approve would be great here, and I can then rebase the MachinePool tests. If you and others don't object, I can review and approve the Load Balancer fixes.
Hey :-) did a first round of review on the PR.

One question: do we need some further validation (via webhook or something) for the spec fields? (Is there maybe prior art for GCPMachine or so?)

I did run KAL to check the API and it produced some findings you may want to fix:

../cluster-api/hack/tools/bin/golangci-lint-kube-api-linter run --new-from-rev 20859ca1 --config ../cluster-api/.golangci-kal.yml

Some of the findings are definitely there to ignore, like the nobools on .status.ready (because it's the contract requiring it like that). Output:

exp/api/v1beta1/gcpmachinepool_types.go:35:2: commentstart: godoc for field ProviderIDList should start with 'providerIDList ...' (kubeapilinter)
// ProviderIDList are the identification IDs of machine instances provided by the provider.
^
exp/api/v1beta1/gcpmachinepool_types.go:38:2: maxlength: field ProviderIDList array element must have a maximum length, add kubebuilder:validation:items:MaxLength marker (kubeapilinter)
ProviderIDList []string `json:"providerIDList,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:40:2: commentstart: godoc for field InstanceType should start with 'instanceType ...' (kubeapilinter)
// InstanceType is the type of instance to create. Example: n1.standard-2
^
exp/api/v1beta1/gcpmachinepool_types.go:41:2: maxlength: field InstanceType must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
InstanceType string `json:"instanceType"`
^
exp/api/v1beta1/gcpmachinepool_types.go:43:2: commentstart: godoc for field Subnet should start with 'subnet ...' (kubeapilinter)
// Subnet is a reference to the subnetwork to use for this instance. If not specified,
^
exp/api/v1beta1/gcpmachinepool_types.go:46:2: maxlength: field Subnet must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
Subnet *string `json:"subnet,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:58:2: commentstart: godoc for field ImageFamily should start with 'imageFamily ...' (kubeapilinter)
// ImageFamily is the full reference to a valid image family to be used for this machine.
^
exp/api/v1beta1/gcpmachinepool_types.go:60:2: maxlength: field ImageFamily must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
ImageFamily *string `json:"imageFamily,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:62:2: commentstart: godoc for field Image should start with 'image ...' (kubeapilinter)
// Image is the full reference to a valid image to be used for this machine.
^
exp/api/v1beta1/gcpmachinepool_types.go:65:2: maxlength: field Image must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
Image *string `json:"image,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:67:2: commentstart: godoc for field AdditionalLabels should start with 'additionalLabels ...' (kubeapilinter)
// AdditionalLabels is an optional set of tags to add to an instance, in addition to the ones added by default by the
^
exp/api/v1beta1/gcpmachinepool_types.go:73:2: commentstart: godoc for field AdditionalMetadata should start with 'additionalMetadata ...' (kubeapilinter)
// AdditionalMetadata is an optional set of metadata to add to an instance, in addition to the ones added by default by the
^
exp/api/v1beta1/gcpmachinepool_types.go:78:2: maxlength: field AdditionalMetadata must have a maximum items, add kubebuilder:validation:MaxItems marker (kubeapilinter)
AdditionalMetadata []capg.MetadataItem `json:"additionalMetadata,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:84:2: commentstart: godoc for field PublicIP should start with 'publicIP ...' (kubeapilinter)
// PublicIP specifies whether the instance should get a public IP.
^
exp/api/v1beta1/gcpmachinepool_types.go:87:2: nobools: field PublicIP pointer should not use a bool. Use a string type with meaningful constant values as an enum. (kubeapilinter)
PublicIP *bool `json:"publicIP,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:89:2: commentstart: godoc for field AdditionalNetworkTags should start with 'additionalNetworkTags ...' (kubeapilinter)
// AdditionalNetworkTags is a list of network tags that should be applied to the
^
exp/api/v1beta1/gcpmachinepool_types.go:93:2: maxlength: field AdditionalNetworkTags array element must have a maximum length, add kubebuilder:validation:items:MaxLength marker (kubeapilinter)
AdditionalNetworkTags []string `json:"additionalNetworkTags,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:95:2: commentstart: godoc for field ResourceManagerTags should start with 'resourceManagerTags ...' (kubeapilinter)
// ResourceManagerTags is an optional set of tags to apply to GCP resources managed
^
exp/api/v1beta1/gcpmachinepool_types.go:101:2: commentstart: godoc for field RootDeviceSize should start with 'rootDeviceSize ...' (kubeapilinter)
// RootDeviceSize is the size of the root volume in GB.
^
exp/api/v1beta1/gcpmachinepool_types.go:104:2: optionalfields: field RootDeviceSize has a valid zero value (0), but the validation is not complete (e.g. minimum/maximum). The field should be a pointer to allow the zero value to be set. If the zero value is not a valid use case, complete the validation and remove the pointer. (kubeapilinter)
RootDeviceSize int64 `json:"rootDeviceSize,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:106:2: commentstart: godoc for field RootDeviceType should start with 'rootDeviceType ...' (kubeapilinter)
// RootDeviceType is the type of the root volume.
^
exp/api/v1beta1/gcpmachinepool_types.go:116:2: commentstart: godoc for field AdditionalDisks should start with 'additionalDisks ...' (kubeapilinter)
// AdditionalDisks are optional non-boot attached disks.
^
exp/api/v1beta1/gcpmachinepool_types.go:118:2: maxlength: field AdditionalDisks must have a maximum items, add kubebuilder:validation:MaxItems marker (kubeapilinter)
AdditionalDisks []capg.AttachedDiskSpec `json:"additionalDisks,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:120:2: commentstart: godoc for field ServiceAccount should start with 'serviceAccounts ...' (kubeapilinter)
// ServiceAccount specifies the service account email and which scopes to assign to the machine.
^
exp/api/v1beta1/gcpmachinepool_types.go:125:2: commentstart: godoc for field Preemptible should start with 'preemptible ...' (kubeapilinter)
// Preemptible defines if instance is preemptible
^
exp/api/v1beta1/gcpmachinepool_types.go:127:2: nobools: field Preemptible should not use a bool. Use a string type with meaningful constant values as an enum. (kubeapilinter)
Preemptible bool `json:"preemptible,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:129:2: commentstart: godoc for field ProvisioningModel should start with 'provisioningModel ...' (kubeapilinter)
// ProvisioningModel defines if instance is spot.
^
exp/api/v1beta1/gcpmachinepool_types.go:136:2: commentstart: godoc for field IPForwarding should start with 'ipForwarding ...' (kubeapilinter)
// IPForwarding Allows this instance to send and receive packets with non-matching destination or source IPs.
^
exp/api/v1beta1/gcpmachinepool_types.go:141:2: forbiddenmarkers: field IPForwarding has forbidden marker "kubebuilder:default=Enabled" (kubeapilinter)
IPForwarding *capg.IPForwarding `json:"ipForwarding,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:143:2: commentstart: godoc for field ShieldedInstanceConfig should start with 'shieldedInstanceConfig ...' (kubeapilinter)
// ShieldedInstanceConfig is the Shielded VM configuration for this machine
^
exp/api/v1beta1/gcpmachinepool_types.go:147:2: commentstart: godoc for field OnHostMaintenance should start with 'onHostMaintenance ...' (kubeapilinter)
// OnHostMaintenance determines the behavior when a maintenance event occurs that might cause the instance to reboot.
^
exp/api/v1beta1/gcpmachinepool_types.go:153:2: commentstart: godoc for field ConfidentialCompute should start with 'confidentialCompute ...' (kubeapilinter)
// ConfidentialCompute Defines whether the instance should have confidential compute enabled or not, and the confidential computing technology of choice.
^
exp/api/v1beta1/gcpmachinepool_types.go:165:2: commentstart: godoc for field RootDiskEncryptionKey should start with 'rootDiskEncryptionKey ...' (kubeapilinter)
// RootDiskEncryptionKey defines the KMS key to be used to encrypt the root disk.
^
exp/api/v1beta1/gcpmachinepool_types.go:169:2: commentstart: godoc for field GuestAccelerators should start with 'guestAccelerators ...' (kubeapilinter)
// GuestAccelerators is a list of the type and count of accelerator cards
^
exp/api/v1beta1/gcpmachinepool_types.go:172:2: maxlength: field GuestAccelerators must have a maximum items, add kubebuilder:validation:MaxItems marker (kubeapilinter)
GuestAccelerators []capg.Accelerator `json:"guestAccelerators,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:177:2: commentstart: godoc for field Ready should start with 'ready ...' (kubeapilinter)
// Ready is true when the provider resource is ready.
^
exp/api/v1beta1/gcpmachinepool_types.go:179:2: nobools: field Ready should not use a bool. Use a string type with meaningful constant values as an enum. (kubeapilinter)
Ready bool `json:"ready"`
^
exp/api/v1beta1/gcpmachinepool_types.go:181:2: commentstart: godoc for field Replicas should start with 'replicas ...' (kubeapilinter)
// Replicas is the most recently observed number of replicas
^
exp/api/v1beta1/gcpmachinepool_types.go:183:2: optionalfields: field Replicas has a valid zero value (0), but the validation is not complete (e.g. minimum/maximum). The field should be a pointer to allow the zero value to be set. If the zero value is not a valid use case, complete the validation and remove the pointer. (kubeapilinter)
Replicas int32 `json:"replicas"`
^
exp/api/v1beta1/gcpmachinepool_types.go:185:2: commentstart: godoc for field Conditions should start with 'conditions ...' (kubeapilinter)
// Conditions defines current service state of the GCPMachinePool.
^
exp/api/v1beta1/gcpmachinepool_types.go:187:2: conditions: Conditions field in GCPMachinePoolStatus must be a slice of metav1.Condition (kubeapilinter)
Conditions clusterv1.Conditions `json:"conditions,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:189:2: commentstart: godoc for field FailureReason should start with 'failureReason ...' (kubeapilinter)
// FailureReason will be set in the event that there is a terminal problem
^
exp/api/v1beta1/gcpmachinepool_types.go:206:2: maxlength: field FailureReason must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
FailureReason *string `json:"failureReason,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:208:2: commentstart: godoc for field FailureMessage should start with 'failureMessage ...' (kubeapilinter)
// FailureMessage will be set in the event that there is a terminal problem
^
exp/api/v1beta1/gcpmachinepool_types.go:225:2: maxlength: field FailureMessage must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
FailureMessage *string `json:"failureMessage,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:238:2: commentstart: field metav1.ObjectMeta is missing godoc comment (kubeapilinter)
metav1.ObjectMeta `json:"metadata,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:240:2: commentstart: field Spec is missing godoc comment (kubeapilinter)
Spec GCPMachinePoolSpec `json:"spec,omitempty"`
^
exp/api/v1beta1/gcpmachinepool_types.go:241:2: commentstart: field Status is missing godoc comment (kubeapilinter)
Status GCPMachinePoolStatus `json:"status,omitempty"`
^
48 issues:
* kubeapilinter: 48
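For most of the maxlength/commentstart findings the fix is mechanical: lower-case the godoc prefix and add the kubebuilder validation markers KAL asks for. A hedged sketch for two of the flagged fields (the concrete limits are placeholders, not decisions):

```go
// Illustrative sketch only; the MaxItems/MaxLength values are placeholders.
package sketch

// GCPMachinePoolSpecFragment shows the marker shape kubeapilinter is asking for.
type GCPMachinePoolSpecFragment struct {
	// providerIDList are the identification IDs of machine instances provided by the provider.
	// +optional
	// +kubebuilder:validation:MaxItems=1000
	// +kubebuilder:validation:items:MaxLength=512
	ProviderIDList []string `json:"providerIDList,omitempty"`

	// instanceType is the type of instance to create. Example: n1.standard-2
	// +kubebuilder:validation:MaxLength=256
	InstanceType string `json:"instanceType"`
}
```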
if feature.Gates.Enabled(capifeature.MachinePool) {
	setupLog.Info("Enabling MachinePool reconcilers")
	gcpMachinePoolConcurrency := gcpMachineConcurrency // FUTURE: Use our own flag while feature-gated?
+1 to use a separate flag :-)
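A hedged sketch of what a dedicated flag could look like (flag name, default, and wiring are assumptions):

```go
// Illustrative sketch: a dedicated --gcpmachinepool-concurrency flag instead of reusing
// gcpMachineConcurrency; the flag name and default are assumptions.
package sketch

import (
	"github.com/spf13/pflag"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

var gcpMachinePoolConcurrency int

// initMachinePoolFlags registers the hypothetical flag.
func initMachinePoolFlags(fs *pflag.FlagSet) {
	fs.IntVar(&gcpMachinePoolConcurrency, "gcpmachinepool-concurrency", 10,
		"Number of GCPMachinePools to process simultaneously")
}

// machinePoolControllerOptions is what would be passed to SetupWithManager when the
// MachinePool feature gate is enabled.
func machinePoolControllerOptions() controller.Options {
	return controller.Options{MaxConcurrentReconciles: gcpMachinePoolConcurrency}
}
```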
// Not meaningful for MachinePool
// // ProviderID is the unique identifier as specified by the cloud provider.
// // +optional
// ProviderID *string `json:"providerID,omitempty"`
ProviderIDList is the replacement here :-)
// Not meaningful for MachinePool
// // ProviderID is the unique identifier as specified by the cloud provider.
// // +optional
// ProviderID *string `json:"providerID,omitempty"`
// FailureReason will be set in the event that there is a terminal problem
// reconciling the MachinePool and will contain a succinct value suitable
// for machine interpretation.
//
// This field should not be set for transitive errors that a controller
// faces that are expected to be fixed automatically over
// time (like service outages), but instead indicate that something is
// fundamentally wrong with the MachinePool's spec or the configuration of
// the controller, and that manual intervention is required. Examples
// of terminal errors would be invalid combinations of settings in the
// spec, values that are unsupported by the controller, or the
// responsible controller itself being critically misconfigured.
//
// Any transient errors that occur during the reconciliation of MachinePools
// can be added as events to the MachinePool object and/or logged in the
// controller's output.
// +optional
FailureReason *string `json:"failureReason,omitempty"`

// FailureMessage will be set in the event that there is a terminal problem
// reconciling the MachinePool and will contain a more verbose string suitable
// for logging and human consumption.
//
// This field should not be set for transitive errors that a controller
// faces that are expected to be fixed automatically over
// time (like service outages), but instead indicate that something is
// fundamentally wrong with the MachinePool's spec or the configuration of
// the controller, and that manual intervention is required. Examples
// of terminal errors would be invalid combinations of settings in the
// spec, values that are unsupported by the controller, or the
// responsible controller itself being critically misconfigured.
//
// Any transient errors that occur during the reconciliation of MachinePools
// can be added as events to the MachinePool object and/or logged in the
// controller's output.
// +optional
FailureMessage *string `json:"failureMessage,omitempty"`
The corresponding fields in CAPI where this bubbles up to are deprecated; should we remove them here too?
Citing from the WIP contract.
The use of failureReason and failureMessage should not be used for new InfraMachinePool implementations. In other areas of the Cluster API, starting from the v1beta2 contract version, there is no more special treatment for provider’s terminal failures within Cluster API.
// Conditions defines current service state of the GCPMachinePool.
// +optional
Conditions clusterv1.Conditions `json:"conditions,omitempty"`
Should we think about using metav1.Conditions? (Considering clusterv1.Conditions is going away, we could start using metav1.Conditions now and wouldn't have to migrate later.)
It's a question of consistency too; we may want to use clusterv1.Conditions for that, but I don't see a real reason why we can't use metav1.Conditions already now.
// GetConditions returns the observations of the operational state of the GCPMachinePool resource.
func (r *GCPMachinePool) GetConditions() clusterv1.Conditions {
	return r.Status.Conditions
}

// SetConditions sets the underlying service state of the GCPMachinePool to the predescribed clusterv1.Conditions.
func (r *GCPMachinePool) SetConditions(conditions clusterv1.Conditions) {
	r.Status.Conditions = conditions
}
If we do metav1, we would have to rewrite these and fulfill the new signature / names / interface
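A hedged sketch of what that rewrite could look like if the field moved to []metav1.Condition (method names mirror the existing ones; whether the new conditions interface wants different names is exactly the open question here):

```go
// Illustrative sketch only; the types are trimmed down to the conditions field.
package sketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type GCPMachinePoolStatus struct {
	// conditions defines current service state of the GCPMachinePool.
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

type GCPMachinePool struct {
	Status GCPMachinePoolStatus `json:"status,omitempty"`
}

// GetConditions returns the observations of the operational state of the GCPMachinePool resource.
func (r *GCPMachinePool) GetConditions() []metav1.Condition {
	return r.Status.Conditions
}

// SetConditions sets the conditions on the GCPMachinePool status.
func (r *GCPMachinePool) SetConditions(conditions []metav1.Condition) {
	r.Status.Conditions = conditions
}
```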
actual.TargetSize = desired.TargetSize
actual.TargetSize = desired.TargetSize
Duplicate of the above?
return instanceTemplateKey, nil
}

// Delete delete machine instance.
Does it do that?
Suggested change:
// Delete delete machine instance.
// Delete deletes all machine templates for an instance group.
| "sigs.k8s.io/controller-runtime/pkg/log" | ||
| ) | ||
|
|
||
| // Reconcile reconcile machine instance. |
also the comments here are wrong :-)
| "sigs.k8s.io/controller-runtime/pkg/log" | ||
| ) | ||
|
|
||
| // Reconcile reconcile machine instance. |
Also needs proper comments
namePrefix := baseKey.Name
suffix := hashHex[:16]
name := namePrefix + suffix
Theoretically, what if we hit a collision? It would use the old template, right?
Should we have the full hash on a tag or something so we could compare? Or other ways to detect and react?
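One hedged way to make a collision detectable, assuming the hash is a SHA-256 of the rendered template spec and that the full hash can be recorded on the created template (e.g. in its description) - both assumptions:

```go
// Illustrative sketch: keep the 16-char prefix in the name, but carry the full hash so a
// prefix collision can be detected by comparing full hashes rather than names.
package sketch

import (
	"crypto/sha256"
	"encoding/hex"
)

// templateIdentity derives the instance template name (prefix + truncated hash) plus the
// full hash of the desired spec, which the caller would record on the template for later
// comparison.
func templateIdentity(namePrefix string, renderedSpec []byte) (name, fullHash string) {
	sum := sha256.Sum256(renderedSpec)
	fullHash = hex.EncodeToString(sum[:])
	return namePrefix + fullHash[:16], fullHash
}

// sameTemplate reports whether an existing template whose name matched was really built
// from the same spec, by comparing the recorded full hash.
func sameTemplate(recordedFullHash, desiredFullHash string) bool {
	return recordedFullHash == desiredFullHash
}
```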
Initial spike: GCPMachinePool
GCPMachinePool: generated code/manifests
This continues the work started by @BrennenMM7 in #901. I also combined in the support from cluster-api-provider-aws to see what we want to borrow from that, and will whittle away the code we don't need from cluster-api-provider-aws.