Handle bridge vlan deletion during NAD update #155
base: master
Conversation
This fix uncovers another issue. Example scenario: the mgmt interface is part of a custom vlan (for example 2011, so mgmt-br and mgmt-bo will be part of vlan 2011). We need to find a solution to prevent the above scenario.
@rrajendran17 What do you think about denying the switching of the clusternetwork? The NAD's clusternetwork is converted to the VM's affinity; even when VMs are off, a change to the NAD's clusternetwork is hard to backfill to the VM's affinity. At the moment, this looks to be an implicit rule.
@w13915984028 I think it is ok for users to change the cluster network on a NAD before VM creation.
Currently we do not act on the mgmt cluster network during NAD updates, so this PR's changes are good to go. I am working on harvester/harvester#7650, which will also consider the mgmt cluster network for NAD updates; I will handle the issue mentioned above with that PR.
I added some enhancements in PR #149. In this commit 9f06c54#diff-900397cb9002ed992e8565ae266dc4ec231a187b1d3cf016f5fa0cd6ebf82d04, I add two additional checks to the nad webhook.
In this commit 7dfe2d6#diff-900397cb9002ed992e8565ae266dc4ec231a187b1d3cf016f5fa0cd6ebf82d04
On the other hand, for a change like the nad's referred clusternetwork, Harvester does not have a
True, but the use case I am thinking of is more the scenario where a NAD is created and not attached to any VMs, and the user by mistake attached it to a cluster network and wants to update it to another cluster network. I think this should at least work, so the user does not have to delete and recreate the NAD for any update. I think it would not be right to deny it all the time.
I agree. Users should be able to change NAD's associated ClusterNetwork only when no VMs are attached to it. We should not block such an action entirely for no reason. |
@@ -119,7 +119,7 @@ func (h Handler) existDuplicateNad(vlanIDStr, cn string) (bool, error) {
 		return false, err
 	}

-	return len(nads) > 1, nil
+	return len(nads) > 0, nil
I don't understand why we need to change this. Maybe I lost the context. Can you add some comment to it? Thanks.
The check should be: if at least 1 NAD still exists with the same cluster network and the vlan id that is about to be deleted, then we should skip deleting the local area network, since we update the NAD with the new cluster network (ensureLabels) even before this check.
Example scenario:
1. NAD1 with vlan 2013 is part of cluster-1, and NAD2 with vlan 2013 is part of cluster-2.
2. Update NAD2 to cluster-1.
3. No changes to NAD1, but NAD2 will be part of cluster-1 now, and the bridge vlans are also changed accordingly.
4. Now update NAD2 back to cluster-2.
NAD2 will be part of cluster-2 again, but vlan 2013 on cluster-1 should not be removed from the bridge vlan, as NAD1 is still referencing it.
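The scenario above boils down to reference counting: a vlan should only leave a cluster network's bridge when no NAD on that cluster network still references it. A toy model (illustrative only, not the controller's actual code; the `bridge` type and method names here are hypothetical):

```go
package main

import "fmt"

// bridge models one cluster network's bridge: per vlan id, the set of NADs
// that reference it. The vlan is dropped only when its last reference goes.
type bridge struct {
	refs map[int]map[string]bool // vlan id -> set of NAD names
}

func newBridge() *bridge { return &bridge{refs: map[int]map[string]bool{}} }

func (b *bridge) addNad(name string, vlan int) {
	if b.refs[vlan] == nil {
		b.refs[vlan] = map[string]bool{}
	}
	b.refs[vlan][name] = true
}

func (b *bridge) removeNad(name string, vlan int) {
	delete(b.refs[vlan], name)
	if len(b.refs[vlan]) == 0 {
		delete(b.refs, vlan) // vlan leaves the bridge only when unreferenced
	}
}

func (b *bridge) hasVlan(vlan int) bool { return len(b.refs[vlan]) > 0 }

func main() {
	cluster1, cluster2 := newBridge(), newBridge()
	cluster1.addNad("NAD1", 2013) // step 1: both NADs use vlan 2013
	cluster2.addNad("NAD2", 2013)
	cluster2.removeNad("NAD2", 2013) // step 2: move NAD2 to cluster-1
	cluster1.addNad("NAD2", 2013)
	cluster1.removeNad("NAD2", 2013) // step 4: move NAD2 back to cluster-2
	cluster2.addNad("NAD2", 2013)
	// 2013 must stay on cluster-1 (NAD1 still references it) and on cluster-2
	fmt.Println(cluster1.hasVlan(2013), cluster2.hasVlan(2013))
}
```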
Please let me know if I am missing any other scenarios. Thanks.
Moving a NAD (or a group of NADs) back and forth between clusternetworks frequently may leave the agent in an abnormal status.
e.g., one NAD is changed from cn1 to cn2 to cn3 to cn4 continuously; the client's OnChange may not receive all the changes in sequence. It may get cn1->cn3->cn4, or cn1->cn2->cn4, etc., but I have not tested to get the full log yet.
I added this example as a scenario for harvester/harvester#7701.
Thanks @w13915984028. I expect this sequence is possible with vlan id and network type changes as well.
Some questions, thanks.
@@ -119,7 +119,7 @@ func (h Handler) existDuplicateNad(vlanIDStr, cn string) (bool, error) {
 		return false, err
 	}

-	return len(nads) > 1, nil
+	return len(nads) > 0, nil
What's the consideration behind changing from 1 to 0 to confirm the duplicated NAD?
Yes, as mentioned in this previous response: #155 (comment)
@starbops @w13915984028 I have removed this change, as existDuplicateNad() is called from removeLocalArea, which is called during both OnChange() and OnRemove() of the nad.
In the case of a cluster network update:
OnChange() needs existDuplicateNad() to check len(nads) > 0: if at least 1 nad exists with the same vlan-id and cluster network name, avoid deletion.
OnRemove() needs existDuplicateNad() to check len(nads) > 1: avoid deletion while other references to the same vlan-id/cluster network are present.
If we are going to prevent cluster network updates in the webhook, as Jian is taking care of in another PR, then the change I introduced in existDuplicateNad() is not required.
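The two thresholds described above can be sketched as a small decision helper. This is an illustrative sketch, not the actual existDuplicateNad() implementation; the function name and parameters are hypothetical:

```go
package main

import "fmt"

// skipVlanRemoval mirrors the reasoning above. During OnChange the updated
// NAD already carries its new labels (ensureLabels runs before this check),
// so any match (> 0) means some NAD still needs the vlan on the old cluster
// network. During OnRemove the NAD being deleted is still in the cache, so
// one match is the NAD itself; only > 1 means another NAD still needs it.
func skipVlanRemoval(matchingNads int, onRemove bool) bool {
	if onRemove {
		return matchingNads > 1
	}
	return matchingNads > 0
}

func main() {
	fmt.Println(skipVlanRemoval(1, false)) // OnChange: another NAD matches -> skip removal
	fmt.Println(skipVlanRemoval(1, true))  // OnRemove: only the deleted NAD matches -> remove
}
```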
This change takes care of deleting the vlan from the corresponding bridge/cluster-network when the network type or vlan-id is changed on the NAD.
Problem:
When updating the VM VLAN network fields such as cluster network or vlan id, the corresponding vlan is added to the new cluster network but not removed from the old cluster network.
a. The network controller skips deletion of the vlan from the bridge (removeLocalArea) when LastKeyNetworkType is empty.
b. A bug in checking the length of the existing nad count in the nad cache.
Solution:
LastKeyNetworkType can be empty even when there is a cluster network/vlan id update, so removal of the old vlan must still be handled in that case.
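The fixed decision can be sketched as follows. The struct and field names are assumptions for illustration, not the controller's actual types: before the fix, cleanup was skipped whenever LastKeyNetworkType was empty; the fix also triggers removal of the old vlan when the cluster network or vlan id changed.

```go
package main

import "fmt"

// nadUpdate is a hypothetical summary of what changed on a NAD update.
type nadUpdate struct {
	lastNetworkType       string // may be empty even on a real update
	clusterNetworkChanged bool
	vlanIDChanged         bool
}

// shouldRemoveOldVlan sketches the fixed condition: the old behavior only
// cleaned up when lastNetworkType was non-empty; the fix also cleans up on
// cluster network or vlan id changes.
func shouldRemoveOldVlan(u nadUpdate) bool {
	return u.lastNetworkType != "" || u.clusterNetworkChanged || u.vlanIDChanged
}

func main() {
	// cluster network changed but LastKeyNetworkType is empty:
	// the old vlan must still be cleaned up
	fmt.Println(shouldRemoveOldVlan(nadUpdate{clusterNetworkChanged: true}))
}
```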
Related Issue:
harvester/harvester#7676
Test plan:
case 1:
a. Create cluster1 -> vlanconfig1 on node1.
b. Create cluster2 -> vlanconfig2 on node2.
c. Create NAD1 with vlan 2013 in cluster1.
d. Create NAD2 with vlan 2013 in cluster2.
Node 1: output of "bridge vlan show" should show vlan 2013 as part of cluster-1.
Node 2: output of "bridge vlan show" should show vlan 2013 as part of cluster-2.
e. Update NAD2 from cluster2 to cluster1.
Expected behavior:
Node 1: output of "bridge vlan show" should show vlan 2013 as part of cluster-1.
Node 2: output of "bridge vlan show" should not show vlan 2013 as part of cluster-2, as vlan 2013 is not part of cluster-2 anymore.
case 2:
a. Update NAD2 to cluster2 again.
Node 1: output of "bridge vlan show" should show vlan 2013 as part of cluster-1.
Node 2: output of "bridge vlan show" should show vlan 2013 as part of cluster-2.
case 3:
a. Update the vlan id of NAD1 to 2012.
Node 1: output of "bridge vlan show" should show vlan 2012 as part of cluster-1 and should not show vlan 2013.
Node 2: output of "bridge vlan show" should show vlan 2013 as part of cluster-2.
case 4:
a. Update the vlan id of NAD2 to 2012.
Node 1: output of "bridge vlan show" should show vlan 2012 as part of cluster-1 and should not show vlan 2013.
Node 2: output of "bridge vlan show" should show vlan 2012 as part of cluster-2 and should not show vlan 2013.
case 5:
a. Update the vlan id of NAD1 back to 2013.
b. Update the vlan id of NAD2 back to 2013.
Node 1: output of "bridge vlan show" should show vlan 2013 as part of cluster-1.
Node 2: output of "bridge vlan show" should show vlan 2013 as part of cluster-2.
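The per-node expectations above can be checked by filtering "bridge vlan show" output for the vlan id. A minimal sketch; the output below is fabricated sample data for illustration, and the real port names depend on the cluster network:

```shell
# Simulated "bridge vlan show" output for node1 (sample data only)
cat <<'EOF' > /tmp/bridge_vlan_node1.txt
port              vlan-id
cluster1-br       1 PVID Egress Untagged
                  2013
EOF

# Verify vlan 2013 is programmed; -w matches it only as a whole word,
# so e.g. vlan 12013 would not produce a false positive
if grep -qw 2013 /tmp/bridge_vlan_node1.txt; then
  echo "vlan 2013 present"
else
  echo "vlan 2013 absent"
fi
```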
The following PR should take care of preventing NAD update/deletion while VMs (even those not in a running state) are using those NADs:
#131