-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if PV/PVC Access Modes are Supported by the Volume Provider #139
Conversation
[sc-61144] |
pkg/migrate/migrate_test.go
Outdated
resources: []runtime.Object{ | ||
&storagev1.StorageClass{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Name: "srcSc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we try to add names here that clarifies WHEN.. THEN .. EXPECTED?
more info: https://dev.to/smyrman/test-with-expect-a-bdd-style-go-naming-pattern-5eh5
Example: when has x should not fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. I'll give the TWE naming convention a try.
pkg/migrate/migrate.go
Outdated
|
||
// checkVolumeAccessModeValid checks if the access modes of a pv are supported by the | ||
// destination storage class. | ||
func (p *PVMigrator) checkVolumeAccessModes(pvc corev1.PersistentVolumeClaim) (PVCError, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we not here return error only?
That is little strange return PVCError and error. is not PVCError an error as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could return (error, error)
however, I would have to deal with the type conversation later on. I rather use the concrete type directly, PVCErrror
, which represents the error from the PVC event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just return error
, and have the cause of the PVCError be the parent error if one exists?
pkg/migrate/migrate.go
Outdated
case <-waitInterval.C: | ||
continue | ||
case <-migrationTimeout.C: | ||
return fmt.Errorf("migration pod %s failed to go into Running phase: timedout", createdPod.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we a CR that represents the service? if so, could we use use conditional status?
So that, we can properly set the status of this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a CR. This project is both a CLI and library for doing PVC migrations.
What do you mean by
if so, could we use use conditional status?
So that, we can properly set the status of this one?
pkg/migrate/migrate.go
Outdated
@@ -61,6 +85,9 @@ func Cli() { | |||
flag.BoolVar(&options.SetDefaults, "set-defaults", false, "change default storage class from source to dest") | |||
flag.BoolVar(&options.VerboseCopy, "verbose-copy", false, "show output from the rsync command used to copy data between PVCs") | |||
flag.BoolVar(&options.SkipSourceValidation, "skip-source-validation", false, "migrate from PVCs using a particular StorageClass name, even if that StorageClass does not exist") | |||
flag.BoolVar(&options.SkipPVAccessModeValidation, "skip-pv-access-mode-validation", false, "skip the volume access modes validation on the destination storage provider") | |||
flag.IntVar(&options.PvcCopyTimeout, "timeout", 300, "length of time to wait (in seconds) when transferring data from the source to the destination storage volume") | |||
flag.IntVar(&options.PodReadyTimeout, "pod-ready-timeout", 60, "length of time to wait (in seconds) for volume validation pod(s) to go into Ready phase") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: how we have been using this project?
Do we distribute it like a bin and call in the kURL?
Where and how it is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used as a library in kURL's version of pvmigrate
CLI: https://github.com/replicatedhq/kURL/blob/main/kurl_util/cmd/pvmigrate/main.go#L14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also distributed as a binary publicly, and is installable with krew
go.mod
Outdated
@@ -3,16 +3,20 @@ module github.com/replicatedhq/pvmigrate | |||
go 1.19 | |||
|
|||
require ( | |||
github.com/google/uuid v1.3.0 | |||
github.com/replicatedhq/kurl v0.0.0-20221116172021-8ac37b7b9867 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like it begs to be a circular dep. can we move the method into this project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm specifically importing github.com/replicatedhq/kurl/pkg/k8sutil
. However, nothing in k8sutil
package imports this (migrate
) package, so no import cycle there. I'd rather no duplicate code here if not absolutely required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then lets move the code to pvmigrate. this project should absolutely not depend on kurl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emosbaugh To sever the kURL dependency I can copy those functions into this project or refactor so as to not depend on those functions per your original suggestion. Agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strongly agree
We should NOT be importing kurl here
go.mod
Outdated
k8s.io/api v0.25.4 | ||
k8s.io/apimachinery v0.25.4 | ||
k8s.io/client-go v0.25.4 | ||
k8s.io/kubernetes v1.25.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v1.25.4
pkg/migrate/migrate.go
Outdated
scaleAnnotation = baseAnnotation + "-scale" | ||
kindAnnotation = baseAnnotation + "-kind" | ||
sourceNsAnnotation = baseAnnotation + "-sourcens" | ||
sourcePvcAnnotation = baseAnnotation + "-sourcepvc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistency
sourcePvcAnnotation = baseAnnotation + "-sourcepvc" | |
sourcePVCAnnotation = baseAnnotation + "-sourcepvc" |
pkg/migrate/migrate.go
Outdated
VerboseCopy bool | ||
SkipSourceValidation bool | ||
SkipPVAccessModeValidation bool | ||
PvcCopyTimeout int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PvcCopyTimeout int | |
PVCCopyTimeout int |
pkg/migrate/migrate.go
Outdated
} | ||
} | ||
|
||
// start the migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you see any value in adding a --dry-run flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! Namely for standalone CLI users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's been requested before, actually: #104
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dry run of course could not create pods
pkg/migrate/migrate.go
Outdated
var waitFor []string | ||
propagation := metav1.DeletePropagationForeground | ||
delopts := metav1.DeleteOptions{PropagationPolicy: &propagation} | ||
if err := p.k8scli.CoreV1().PersistentVolumeClaims("default").Delete( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be in the same namespace as the src pvc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temp pod is created in default
ns. See #139 (comment)
pkg/migrate/migrate.go
Outdated
Containers: []corev1.Container{ | ||
{ | ||
Name: "busybox", | ||
Image: "busybox", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work for airgap customers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! This image would need to be loaded into Docker or containerd unless it's been loaded already by another image which builds atop of busybox
.
Hmm, @ricardomaraschini node disk space validator would also run into the same issue if it was integrated in the kURL installer. The disk space validator relies on the rsync
image to be available. It's probably a good idea for both to use the same container image. I need to think about how to best handle this. I'm currently thinking we should pick an image to use, archive it and then load it during install.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already require an image to be passed via the CLI to do the migration with
Use that image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pvmigrate/pkg/migrate/migrate.go
Line 83 in 075f71c
flag.StringVar(&options.RsyncImage, "rsync-image", "eeacms/rsync:2.3", "the image to use to copy PVCs - must have 'rsync' on the path") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved: We can use the kurl util image and specify it as part of the CLI as show here: https://github.com/replicatedhq/kURL/blob/aea79861716d66787f0b31670f1fc74a7ee16d1f/scripts/common/rook.sh#L153
pkg/migrate/migrate.go
Outdated
propagation := metav1.DeletePropagationForeground | ||
delopts := metav1.DeleteOptions{PropagationPolicy: &propagation} | ||
if pod != nil { | ||
if err := p.k8scli.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, delopts); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err := p.k8scli.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, delopts); err != nil { | |
// Cleanup should use background context so as not to fail if context has already been canceled | |
if err := p.k8scli.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, delopts); err != nil { |
pkg/migrate/migrate.go
Outdated
} | ||
|
||
// getPvcError returns the error event for why a PVC is in Pending status | ||
func (p *PVMigrator) getPvcError(pvc *corev1.PersistentVolumeClaim) (PVCError, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (p *PVMigrator) getPvcError(pvc *corev1.PersistentVolumeClaim) (PVCError, error) { | |
func (p *PVMigrator) getPVCError(pvc *corev1.PersistentVolumeClaim) (PVCError, error) { |
pkg/migrate/migrate.go
Outdated
} | ||
waitFor = append(waitFor, pvc.Name) | ||
|
||
timeout := time.NewTicker(5 * time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeout := time.NewTicker(5 * time.Minute) | |
timeout := time.After(5 * time.Minute) |
pkg/migrate/migrate.go
Outdated
@@ -79,6 +106,37 @@ func Cli() { | |||
|
|||
output := log.New(os.Stdout, "", 0) // this has no time prefix etc | |||
|
|||
// escape hatch - for DEV/TEST ONLY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is for dev/test only, that should be mentioned in the flag description - or the flag should be hidden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the comment. I added this in the case where there is a rare scenario where someone may a justifiable reason to ignore a PVC access volume validation.
pkg/migrate/migrate.go
Outdated
if !options.SkipPVAccessModeValidation { | ||
srcPVs, err := kurlutils.PVSByStorageClass(context.TODO(), clientset, options.SourceSCName) | ||
if err != nil { | ||
fmt.Printf("failed to get volumes using storage class %s: %s", options.SourceSCName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmt.Printf("failed to get volumes using storage class %s: %s", options.SourceSCName, err) | |
output.Printf("failed to get volumes using storage class %s: %s", options.SourceSCName, err) |
ethan commented the same elsewhere, but let's avoid fmt.printf when we have output
available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree.
pkg/migrate/migrate.go
Outdated
@@ -134,6 +192,82 @@ func Migrate(ctx context.Context, w *log.Logger, clientset k8sclient.Interface, | |||
return nil | |||
} | |||
|
|||
// NewPVMigrator returns a PV migration context | |||
func NewPVMigrator(cfg *rest.Config, log *log.Logger, srcSC, dstSC string, podReadyTimeout int) (*PVMigrator, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this used by?
What is the use case?
Aren't you initializing the struct directly elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this whole function is unused and should not exist
I think there should be a single function to run the whole 'can I migrate from sc A to B without having broken access modes' check, but it should take all the args it needs directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NewPVMigrator()
is used by kURL's pvmigrator
CLI: https://github.com/replicatedhq/kURL/pull/3767/files#diff-fc4b9380946b46363bac9e7ecee925d5142b5fa0f607de99b9f5d3383d952e01R97
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Truthfully what I'm getting from that is 'oh dear lord why did we start reimplementing this logic in kURL, that was the entire reason that the CLI function exists'
I still think the solution is to move the entire 'can I make proper dest PVCs' logic inside of 'Migrate'
pkg/migrate/migrate.go
Outdated
@@ -79,6 +106,37 @@ func Cli() { | |||
|
|||
output := log.New(os.Stdout, "", 0) // this has no time prefix etc | |||
|
|||
// escape hatch - for DEV/TEST ONLY | |||
if !options.SkipPVAccessModeValidation { | |||
srcPVs, err := kurlutils.PVSByStorageClass(context.TODO(), clientset, options.SourceSCName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm relatively certain we have this logic in pvmigrate already - refactoring it into a function would be better than using kurlutils (even besides the 'let's not do circular deps' issue, having the same logic use two different implementations will cause issues)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we do. I can make use of that instead of relying the kurl k8s package.
pkg/migrate/migrate.go
Outdated
PvcCopyTimeout int | ||
PodReadyTimeout int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PvcCopyTimeout int | |
PodReadyTimeout int | |
PvcCopyTimeout time.Duration | |
PodReadyTimeout time.Duration |
Why are we passing around unitless integers when there is literally a type for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought on this was to make easy on clients...just give me a time in seconds (int) and the implementation will convert it to the type(time.Duration
) it needs.
pkg/migrate/migrate.go
Outdated
@@ -103,7 +161,7 @@ func Migrate(ctx context.Context, w *log.Logger, clientset k8sclient.Interface, | |||
return fmt.Errorf("failed to scale down pods: %w", err) | |||
} | |||
|
|||
err = copyAllPVCs(ctx, w, clientset, options.SourceSCName, options.DestSCName, options.RsyncImage, updatedMatchingPVCs, options.VerboseCopy, time.Second) | |||
err = copyAllPVCs(ctx, w, clientset, options.SourceSCName, options.DestSCName, options.RsyncImage, updatedMatchingPVCs, options.VerboseCopy, time.Duration(options.PvcCopyTimeout)*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just.... no
I've requested a change to make the option actually a duration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will accept that change.
pkg/migrate/migrate.go
Outdated
// ValidateVolumeAccessModes checks whether the provided persistent volumes support the access modes | ||
// of the destination storage class. | ||
// returns a map of pvc errors indexed by namespace | ||
func (p *PVMigrator) ValidateVolumeAccessModes(pvs map[string]corev1.PersistentVolume) (map[string]map[string]PVCError, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this function public?
If we're going to keep it public let's move it (and all supporting infa) to a new package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one benefit I see for keeping this public and separating the validation from the migration behaviour is that in the near future we could rollup all the different validation checks into one package or method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but even if we do that we probably don't want to encourage calling one individual validator.
Better to keep this private until we 100% have a need for it externally
pkg/migrate/migrate.go
Outdated
timeoutCtx, cancelCtx := context.WithTimeout(ctx, timeout) | ||
defer cancelCtx() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC this doesn't work great in a for loop.
Also, just more generally, this isn't an OK change IMO - the current logic is 'keep copying as long as it's not stuck', and since some copies may legitimately take hours I would rather not have to wait hours to find out that something was really stuck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you're doubling up on timeouts here - you've added timeout logic to copyOnePVC but also to the context it's being passed. Choose one (but preferably choose none in this PR), if we need to do that it needs more discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be fleshed out bit a more. Namely, figure out what constitutes as "stuck" and how to detect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created internal tracking ticket for handling timeouts in this context.
pkg/migrate/migrate.go
Outdated
@@ -229,7 +376,7 @@ func copyAllPVCs(ctx context.Context, w *log.Logger, clientset k8sclient.Interfa | |||
return nil | |||
} | |||
|
|||
func copyOnePVC(ctx context.Context, w *log.Logger, clientset k8sclient.Interface, ns string, sourcePvcName string, destPvcName string, rsyncImage string, verboseCopy bool, waitTime time.Duration, nodeName string) error { | |||
func copyOnePVC(ctx context.Context, w *log.Logger, clientset k8sclient.Interface, ns string, sourcePvcName string, destPvcName string, rsyncImage string, verboseCopy bool, nodeName string, timeout time.Duration) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please back out all of the changes for this function. They do not appear related to the main thrust of the PR and require more discussion (as they prevent legitimately slow/large transfers from working, or prevent stalled transfers from being detected early, depending on configuration)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll back this out.
pkg/migrate/migrate.go
Outdated
@@ -1146,3 +1296,249 @@ func resetReclaimPolicy(ctx context.Context, w *log.Logger, clientset k8sclient. | |||
|
|||
return nil | |||
} | |||
|
|||
// buildPVConsumerPod creates a pod spec for consuming a pvc | |||
func (p *PVMigrator) buildPVConsumerPod(pvcName string) *corev1.Pod { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we pass in options instead of using an object?
It's not the patten the project has used thus far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laverya you want buildPVConsumerPod()
to accept the name override as an argument instead of pulling it from the receiver?
pkg/migrate/migrate.go
Outdated
if len(podName) > 63 { | ||
podName = podName[0:31] + podName[len(podName)-32:] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I swear this logic is present elsewhere in the code
could you factor them into a common function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup.
pkg/migrate/migrate.go
Outdated
if len(pvcName) > 63 { | ||
pvcName = pvcName[0:31] + pvcName[len(pvcName)-32:] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another instance of this logic to factor out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address this.
pkg/migrate/migrate.go
Outdated
defer func() { | ||
if err = p.deleteTmpPVC(tmpPVC); err != nil { | ||
p.log.Printf("failed to delete tmp claim: %s", err) | ||
} | ||
}() | ||
defer func() { | ||
if err = p.deletePVConsumerPod(pvConsumerPod); err != nil { | ||
p.log.Printf("failed to delete pv consumer pod %s: %s", pvConsumerPod.Name, err) | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO these should be the same defer func, not two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can consolidate these.
pkg/migrate/migrate.go
Outdated
if !options.SkipPVAccessModeValidation { | ||
srcPVs, err := kurlutils.PVSByStorageClass(context.TODO(), clientset, options.SourceSCName) | ||
if err != nil { | ||
fmt.Printf("failed to get volumes using storage class %s: %s", options.SourceSCName, err) | ||
os.Exit(1) | ||
} | ||
|
||
pvMigrator := PVMigrator{ | ||
ctx: context.TODO(), | ||
log: output, | ||
k8scli: clientset, | ||
srcSc: options.SourceSCName, | ||
dstSc: options.DestSCName, | ||
deletePVTimeout: 5 * time.Minute, | ||
podReadyTimeout: time.Duration(options.PodReadyTimeout) * time.Second, | ||
} | ||
unsupportedPVCs, err := pvMigrator.ValidateVolumeAccessModes(srcPVs) | ||
if err != nil { | ||
fmt.Printf("failed to validate volume access modes for destination storage class %s", options.DestSCName) | ||
os.Exit(1) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would greatly prefer for this logic to be contained within 'Migrate' (after the 'do these storageclasses exist' check, for instance), not independent
If it must be independent, it should be callable as a single function that returns an error and should handle getting the list of PVCs itself.
pkg/migrate/migrate.go
Outdated
|
||
// buildPVC creates a temporary PVC requesting for 1Mi of storage for a provided storage class name. | ||
func (p *PVMigrator) buildTmpPVC(pvc corev1.PersistentVolumeClaim, sc string) *corev1.PersistentVolumeClaim { | ||
pvcName := p.pvcNameOverride |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having pvcNameOverride, I'd generally recommend using the source PVC's name with a suffix and then trimmed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And name the pod based on the PVC being tested too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes things much clearer to any cluster operator that encounters this in progress
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name override is meant for testing since we need a deterministic name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name should ALWAYS be deterministic, never random
The way to do that is to take something like my-app-pvc-0
and turn it into my-app-pvc-0-pvcmigrate-accessmodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what is done elsewhere in the code (and in fact that's why we need the 'trim from middle' code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will make this change.
pkg/migrate/migrate.go
Outdated
// for testing/mocking | ||
// This fields gets set and updated by the kube api server | ||
Status: corev1.PersistentVolumeClaimStatus{Phase: pvc.Status.Phase}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused - why are you setting status here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're doing this so that you can set the claim status for something else to read in a test function, I'd recommend looking at the backgroundfunc
stuff in some of the other test functions - that way you can wait for 1/10th of a second or whatever and ensure that the logic does not require status to be immediately perfect
cmd/main.go
Outdated
|
||
if len(failures) != 0 { | ||
preflight.PrintValidationFailures(output.Writer(), failures) | ||
os.Exit(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any possibility for the purposes of kurl we run validate and then migrate separately? then we dont have to have this alternate exit code. if we cant, can we try to map this exit code to standard linux exit codes? also should we use a constant here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to only run validate()
as part of the openebs migration prompt here and abort the install if the validate failed. Also, AFAIK there are no standard/conventional exit codes. We could model this after a linux command or come up with our own scheme. I lean towards the latter.
cmd/main.go
Outdated
err = migrate.Migrate(context.TODO(), output, clientset, options) | ||
if err != nil { | ||
output.Printf("migration failed: %s", err) | ||
os.Exit(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be 1
cmd/main.go
Outdated
flag.BoolVar(&options.SetDefaults, "set-defaults", false, "change default storage class from source to dest") | ||
flag.BoolVar(&options.VerboseCopy, "verbose-copy", false, "show output from the rsync command used to copy data between PVCs") | ||
flag.BoolVar(&options.SkipSourceValidation, "skip-source-validation", false, "migrate from PVCs using a particular StorageClass name, even if that StorageClass does not exist") | ||
flag.IntVar(&options.PvcCopyTimeout, "pvc-copy-timeout", 300, "length of time to wait (in seconds) when transferring data from the source to the destination storage volume") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flag.IntVar(&options.PvcCopyTimeout, "pvc-copy-timeout", 300, "length of time to wait (in seconds) when transferring data from the source to the destination storage volume") | |
flag.IntVar(&options.PVCCopyTimeout, "pvc-copy-timeout", 300, "length of time to wait (in seconds) when transferring data from the source to the destination storage volume") |
pkg/k8sutil/storage.go
Outdated
|
||
allpvs, err := cli.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{}) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to get persistent volumes: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("failed to get persistent volumes: %w", err) | |
return nil, fmt.Errorf("failed to list persistent volumes: %w", err) |
pkg/preflight/validate.go
Outdated
// validate access modes for all PVCs using the d source storage class | ||
pvcs, err := pvcsForStorageClass(ctx, w, clientset, options.SourceSCName, options.Namespace) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to get PVCs for storage %s: %s", options.SourceSCName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("failed to get PVCs for storage %s: %s", options.SourceSCName, err) | |
return nil, fmt.Errorf("failed to get PVCs for storage %s: %w", options.SourceSCName, err) |
pkg/preflight/validate.go
Outdated
} | ||
pvcAccesModeFailures, err := validateVolumeAccessModes(ctx, w, clientset, options.DestSCName, time.Duration(options.PodReadyTimeout), pvcs) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed validate PVC access modes: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("failed validate PVC access modes: %s", err) | |
return nil, fmt.Errorf("failed validate PVC access modes: %w", err) |
pkg/preflight/validate.go
Outdated
|
||
pvcEvents, err := client.CoreV1().Events(pvc.Namespace).Search(scheme.Scheme, pvc) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to list events for PVC %s", pvc.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("failed to list events for PVC %s", pvc.Name) | |
return nil, fmt.Errorf("failed to list events for PVC %s: %w", pvc.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. I've addressed this.
pkg/preflight/validate.go
Outdated
return srcPVCs, nil | ||
} | ||
|
||
func newK8sName(prefix, original string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to k8sutil package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. thanks.
Modified the output of the preflight validation to be generic. When there is a validation failure when migrating from Rook to OpenEBS 👇
|
7ae9c3d
to
7c8d9e5
Compare
} | ||
|
||
if len(failures) != 0 { | ||
preflight.PrintValidationFailures(os.Stdout, failures) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be stderr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to separate the output of the validation failures to go to stdout
and program errors to stderr
.
cmd/main.go
Outdated
// Cli uses CLI options to run Migrate | ||
var options migrate.Options | ||
var skipPreflightValidation bool | ||
var preflightValidationOnly bool | ||
flag.StringVar(&options.SourceSCName, "source-sc", "", "storage provider name to migrate from") | ||
flag.StringVar(&options.DestSCName, "dest-sc", "", "storage provider name to migrate to") | ||
flag.StringVar(&options.RsyncImage, "rsync-image", "eeacms/rsync:2.3", "the image to use to copy PVCs - must have 'rsync' on the path") | ||
flag.StringVar(&options.Namespace, "namespace", "", "only migrate PVCs within this namespace") | ||
flag.BoolVar(&options.SetDefaults, "set-defaults", false, "change default storage class from source to dest") | ||
flag.BoolVar(&options.VerboseCopy, "verbose-copy", false, "show output from the rsync command used to copy data between PVCs") | ||
flag.BoolVar(&options.SkipSourceValidation, "skip-source-validation", false, "migrate from PVCs using a particular StorageClass name, even if that StorageClass does not exist") | ||
flag.IntVar(&options.PodReadyTimeout, "pod-ready-timeout", 60, "length of time to wait (in seconds) for volume validation pod(s) to go into Ready phase") | ||
flag.BoolVar(&skipPreflightValidation, "skip-preflight-validation", false, "skip the volume access modes validation on the destination storage provider") | ||
flag.BoolVar(&preflightValidationOnly, "preflight-validation-only", false, "skip the migration and run preflight validation only") | ||
|
||
flag.Parse() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have we removed the CLI function?
I suppose that kurl isn't using it anymore, so it might not be useful...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored this to avoid import cycles from pvmigrate
package. Moreover, handling CLI flags in the pvmigrate
package is not the right place to do this.
// Validate runs preflight check on storage volumes returning a list of failures | ||
func Validate(ctx context.Context, w *log.Logger, clientset k8sclient.Interface, options migrate.Options) ([]ValidationFailure, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should validateStorageClasses
- which validates that storageclasses are present in the cluster - be part of Validate instead of Migrate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation of storage classes will happen in both the validation and migration paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This the output when the validation can't find the storage class(es):
$ ./kurl/bin/pvmigrate --source-sc "non-default" --dest-sc "longhorn" --rsync-image "replicated/ku
rl-util:alpha" --skip-free-space-check --preflight-validation-only
The following resources cannot be migrated:
NAMESPACE RESOURCE MESSAGE
--------- -------- -------
sc/non-default Resource not found
sc/longhorn Resource not found
pkg/migrate/migrate.go
Outdated
fmt.Printf("%s\n", err.Error()) | ||
os.Exit(1) | ||
} | ||
PodReadyTimeout int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be a time.Duration, not an int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
// NewPrefixedName returns a name prefixed by prefix and with length that is no longer than 63 | ||
// chars | ||
func NewPrefixedName(prefix, original string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind using this in Migrate? (specifically in newPvcName
)
(This might require changing this to be NewSuffixedName
for consistency)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will take this suggestion as a future TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, but there's one other thing I'd like to see -
PLEASE update the readme with the additional CLI flags and process steps.
pkg/preflight/validate.go
Outdated
@@ -98,6 +109,34 @@ func validateVolumeAccessModes(ctx context.Context, l *log.Logger, client k8scli | |||
return volAccessModeFailures, nil | |||
} | |||
|
|||
// validateStorageClasses returns any failures encountered when discovering the source and destination | |||
// storage classes | |||
func validateStorageClasses(ctx context.Context, l *log.Logger, clientset k8sclient.Interface, sourceSCName, destSCName string) ([]ValidationFailure, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you copy the logic from the migrate version more precisely? This drops support for the 'skip-source-validation' flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if _, err := client.CoreV1().PersistentVolumes().Get( | ||
ctx, pv.Name, metav1.GetOptions{}, | ||
); err != nil && !k8serrors.IsNotFound(err) { | ||
l.Printf("failed to get pv for temp pvc %s: %s", pvc, err) | ||
} else if err != nil && k8serrors.IsNotFound(err) { | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you return nil here?
if _, err := client.CoreV1().PersistentVolumes().Get( | |
ctx, pv.Name, metav1.GetOptions{}, | |
); err != nil && !k8serrors.IsNotFound(err) { | |
l.Printf("failed to get pv for temp pvc %s: %s", pvc, err) | |
} else if err != nil && k8serrors.IsNotFound(err) { | |
break | |
} | |
if _, err := client.CoreV1().PersistentVolumes().Get( | |
ctx, pv.Name, metav1.GetOptions{}, | |
); err != nil { | |
if k8serrors.IsNotFound(err) { | |
return nil | |
} | |
l.Printf("failed to get pv for temp pvc %s: %s", pvc, err) | |
} |
case <-interval.C: | ||
continue | ||
case <-timeout: | ||
return fmt.Errorf("failed to delete pvs: timeout") | ||
} | ||
} | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remove this return statement, Go requires a return
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this felt like unreachable code in my head based on the other change requested in my review. but ill take your word for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as i dont feel that strongly about my suggestion ill approve
Currently, when migrating data, we do not consider whether the source PV's access modes are supported by the destination's volume provider. Not having this check makes the migrator prone to errors during the migration which is not the optimal experience. E.g. if migration from a volume that supports access modes RWX(Minio) to a volume which only supports RWO (local volume). For cases such as this, we want to warn the users about such errors prior to the migration so they can fix them ahead of time and have a smoother migration experience.
Solution
pvmigrate
client and library will support running volume access modes validation checks on the destination PV. The checks will verify if the access modes on any of the source PVCs are supported by the destination volume provider. This is accomplished by creating temporary headless pods which create and consume the destination storage class volume with the access modes of the source PVCs. Any failures will be shown to the user via the terminal.How it Works
At a high level, the validation does the following:
Ref: Internal ticket tracker [sc-61144]