Use VMCache instead of VMICache to judge if the NAD is in use #131
base: master
Conversation
@mingshuoqiu can you also add steps in the Test plan for verifying deletion of the VLAN config with a stopped VM as well.
Force-pushed from bec79e5 to 1477278
```go
for _, vm := range vms {
	if vm.Namespace != nad.Namespace {
		continue
	}
```
Is this check needed?
```go
if err := v.checkVmi(vc, nodes); err != nil {
	return fmt.Errorf(deleteErr, vc.Name, err)
}
```
```go
nads, err := v.nadCache.List(util.HarvesterSystemNamespaceName, labels.Set(map[string]string{
	utils.KeyClusterNetworkLabel: vc.Spec.ClusterNetwork,
}).AsSelector())
```
The namespace must be "" or metav1.NamespaceAll, as we need to fetch NADs from all namespaces. HarvesterSystemNamespaceName will fetch NADs only from the harvester-system namespace.
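To illustrate the suggested fix, a minimal sketch of the same call with the namespace widened to all namespaces (same cache and selector as above; `metav1` is assumed to be imported from `k8s.io/apimachinery/pkg/apis/meta/v1`):

```go
// List NADs from every namespace: metav1.NamespaceAll is the empty string "",
// which the cache lister treats as "match all namespaces".
nads, err := v.nadCache.List(metav1.NamespaceAll, labels.Set(map[string]string{
	utils.KeyClusterNetworkLabel: vc.Spec.ClusterNetwork,
}).AsSelector())
if err != nil {
	return fmt.Errorf(deleteErr, vc.Name, err)
}
```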
Thanks @mingshuoqiu for taking care of the comments.
pkg/utils/vmi.go (Outdated)
```go
for _, vm := range vmsTmp {
	if vm.Namespace != nad.Namespace {
		continue
```
I am not sure about this existing check and whether it is needed; we can still attach a NAD from any namespace to a VM in any namespace, and this operation is currently supported. With this check, we might miss VMs that have attached a NAD from another namespace, and the NAD could be deleted while the VM is still present.
I have a general question regarding the fix: do we still need the VmiGetter? Under what circumstances do we rely on VMIs to check whether an action is allowed or not? Maybe the logic in `WhoUseNad` for VMI and VM is redundant, and we can completely replace VMI with VM. Please advise. Thank you.
Need to re-implement the …
I tried to replace all VMI handlers with VM, but I get an error log from the webhook pod. How should I add the resource to the access list? CRD? @starbops
@mingshuoqiu Also please add a step to the test case for the NAD update scenario; it should not be allowed on non-running VMs as well. Thanks.
@mingshuoqiu What's the current status and plan for this PR? Will it be merged in v1.5.0? Thanks. #149 is better built on top of this PR, so it can have a unified check on NAD, VM, VMI, and storagenetwork.
@mingshuoqiu Need to add vm to the clusterrole.
I'm verifying with the updated RBAC. Will update when verification is done.
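For reference, this is roughly the shape of the extra RBAC rule, expressed here with `k8s.io/api/rbac/v1` types; this is only a sketch, and the real change lands as chart YAML in harvester/charts#339:

```go
import rbacv1 "k8s.io/api/rbac/v1"

// Sketch only: the webhook's ClusterRole needs read access to KubeVirt
// VirtualMachines in addition to VirtualMachineInstances.
var vmRule = rbacv1.PolicyRule{
	APIGroups: []string{"kubevirt.io"},
	Resources: []string{"virtualmachines"},
	Verbs:     []string{"get", "list", "watch"},
}
```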
Force-pushed from 1477278 to 1f70a1d
pkg/webhook/nad/validator.go (Outdated)
```diff
@@ -126,18 +126,18 @@ func (v *Validator) checkRoute(config string) error {
 }

 func (v *Validator) checkVmi(nad *cniv1.NetworkAttachmentDefinition) error {
```
This fix will break the update-nad feature, as `checkVmi` is also used in `func (v *Validator) Update(_ *admission.Request, oldObj, newObj runtime.Object)`.
When a user changes a nad, all the VMIs referring to this nad need to be stopped. If it is a storagenetwork nad, the user must clear the storagenetwork or set it to another storagenetwork nad first.
When a user deletes a nad, all the VMs referring to this nad need to be changed to at least remove the network references, or the related VMs must be deleted. If it is a storagenetwork nad, the user must clear the storagenetwork or set it to another storagenetwork nad first.
And `checkVmi` is still needed: some VMs may have been changed while their VMIs are not rebooted (this is allowed from the Harvester UI), so you can't skip the VMI check.
An additional new function `checkVM` is needed for the nad deletion webhook; see the sketch below.
Other changes need to be reviewed similarly.
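A minimal sketch of what such a `checkVM` could look like, assuming a `vmCache` lister for KubeVirt VirtualMachines on the Validator; the field names follow the KubeVirt API, but the helper itself is illustrative, not the PR's actual code:

```go
func (v *Validator) checkVM(nad *cniv1.NetworkAttachmentDefinition) error {
	// A NAD can be referenced by VMs in any namespace, so list them all.
	vms, err := v.vmCache.List(metav1.NamespaceAll, labels.Everything())
	if err != nil {
		return err
	}
	for _, vm := range vms {
		if vm.Spec.Template == nil {
			continue
		}
		for _, network := range vm.Spec.Template.Spec.Networks {
			if network.Multus == nil {
				continue
			}
			// Multus network names are "namespace/name"; a bare "name"
			// refers to a NAD in the VM's own namespace.
			ref := network.Multus.NetworkName
			if ref == nad.Namespace+"/"+nad.Name ||
				(vm.Namespace == nad.Namespace && ref == nad.Name) {
				return fmt.Errorf("the NAD %s/%s is still used by VM %s/%s",
					nad.Namespace, nad.Name, vm.Namespace, vm.Name)
			}
		}
	}
	return nil
}
```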
@w13915984028 For NAD updates, the idea is to prevent users from making any such updates to a NAD that is already referenced by VMs, so I guess the error message should be changed accordingly.
As you said, deleting/re-adding the VMs on an already planned infrastructure is not a good option, but the idea is to stop users from updating a NAD that VMs are already using.
IMO, though, we should allow such updates on NADs that are not referenced by any of the VMs.
@rrajendran17 Changing Vlan, HairpinMode, PromiscMode and some other NAD params is reasonable, and stop & start of the VM is okay; it is not a good idea to force the user to delete the VM or the VM's related interface.
```go
type NetConf struct {
	cniv1.NetConf
	BrName       string `json:"bridge"`
	IsGW         bool   `json:"isGateway"`
	IsDefaultGW  bool   `json:"isDefaultGateway"`
	ForceAddress bool   `json:"forceAddress"`
	IPMasq       bool   `json:"ipMasq"`
	MTU          int    `json:"mtu"`
	HairpinMode  bool   `json:"hairpinMode"`
	PromiscMode  bool   `json:"promiscMode"`
	Vlan         int    `json:"vlan"`
}
```
Changing `BrName` (the clusternetwork) is rare or should be denied.
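For example, an update webhook could decode the old and new NAD configs into this NetConf and deny a bridge change; a minimal sketch (the helper name and error text are illustrative):

```go
// denyBridgeChange rejects NAD updates that move the bridge (clusternetwork).
func denyBridgeChange(oldConfig, newConfig string) error {
	var oldConf, newConf NetConf
	if err := json.Unmarshal([]byte(oldConfig), &oldConf); err != nil {
		return err
	}
	if err := json.Unmarshal([]byte(newConfig), &newConf); err != nil {
		return err
	}
	if oldConf.BrName != newConf.BrName {
		return fmt.Errorf("changing the bridge from %s to %s is not allowed",
			oldConf.BrName, newConf.BrName)
	}
	return nil
}
```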
BTW: I added two additional checks on the nad webhook, in commit 9f06c54#diff-900397cb9002ed992e8565ae266dc4ec231a187b1d3cf016f5fa0cd6ebf82d04 and in commit 7dfe2d6#diff-900397cb9002ed992e8565ae266dc4ec231a187b1d3cf016f5fa0cd6ebf82d04.
More about the scenarios: at the current stage, the most important param of a NAD is the VID.
(1) Currently, a NAD is allowed to be changed as long as all attached VMIs are stopped. If we force the VMs to also remove their references to the NAD before the NAD change, it is a breaking change. The broken scenarios are like: (a) a user migrates a group of VMs from VID 100 to VID 200 due to a network re-plan, or changes the L3 IP subnets for this VID; (b) a user creates a dedicated NAD for network testing and creates a couple of VMs attached to this NAD; the user plans to open a new VID 300, stop this group of VMs, change the NAD, start the VMs, then test the new VID; the user can use this group of VMs to test any existing VID at any time; (c) ...
(2) Changing the clusternetwork: this is a bug on Harvester now. It can be done if (1) is met, but all the related VMs are not reflected. Or I missed something and Harvester has code to take care of it; please remind me if you know. Luckily, we have not received related bug reports so far, if it is a real bug.
We need to: (i) enhance (1) via this PR #131 by adding an additional check on VMs when a NAD is deleted; be very careful to NOT introduce a breaking change, as the VM is the core feature of Harvester; (ii) enhance (2) via issue harvester/harvester#7701 and its related PR, denying the change of clusternetwork.
@rrajendran17 There are race cases: creating a VM and changing a NAD can happen in parallel, which may still cause some issues. When a NAD is attached to the wrong clusternetwork (very rare), delete & recreate is not so unacceptable. We may assume this is a corner case and add this limitation to the documentation. If some user really needs this, we can try to support it later.
FYI: when deleting a NAD, along with the VMI, the VM is also checked. This is done via 97eadd5 to cooperate with the other functions. The test code is also added. harvester/charts#339 is also filed to update the RBAC.
The VMI status will be none if the VM is turned off from the GUI, so the VMI information can't be relied on to determine whether a particular NAD can be deleted or not. Use the VMCache instead to prevent deleting the NAD while a VM is still attached.
Signed-off-by: Chris Chiu <[email protected]>
The VMI status will be none if the VM is turned off from the GUI, so the VMI information can't be relied on to determine whether a particular NAD is attached to any VM or not. Use the VMCache instead to prevent deleting the NAD when the VM is turned off.
Problem:
Deleting an off VM fails if its network was already deleted. The VM network can currently be deleted while the attached VM is off, which becomes a problem when the user tries to turn the VM back on.
Solution:
Do not allow removing a VM network while a VM still references it, even if that VM is off.
Related Issue:
harvester/harvester#6961
Test plan: