Inconsistency in Deployments and Statefulsets "wait" sequence between Spray v3 and v4 + bugs #60

Open
pamiel opened this issue Jul 16, 2020 · 0 comments
Assignees
Labels
v4 Issues only affecting v4

Spray v3 checks for completion of the upgrade process differently when "waiting" on a Deployment than when waiting on a StatefulSet, while Spray v4 uses an entirely different algorithm that is the same for both Deployments and StatefulSets. Which one is right, and which one is wrong?

This issue analyzes the counters available for Deployments and for StatefulSets, in order to find the right way to detect the end of the upgrade process.
It considers the RollingUpdate update strategy only. For StatefulSets deployed with the OnDelete strategy, no "wait" shall be done (refer to issue #58).
Tests were performed on Kubernetes 1.14; I do not know whether the results change on more recent versions, sorry for that...

Spray v3

For Deployments

Here is a sequence of counter updates during a rolling update of a Deployment in the following situation:

  • Number of replicas is unchanged at 2
  • All Pods are restarted due to a change in their annotations
.spec.replicas: 2 .status.replicas:2 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:2
.spec.replicas: 2 .status.replicas:3 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:1 .status.unavailableReplicas:1
.spec.replicas: 2 .status.replicas:3 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:2 .status.unavailableReplicas:1
.spec.replicas: 2 .status.replicas:2 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:2

Here is another sequence of counter updates during a rolling update of a Deployment in the following situation:

  • Number of replicas is changed from 2 to 3
  • Nothing changed in the Pod definition => existing Pods are not restarted
.spec.replicas: 2 .status.replicas:2 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:2
.spec.replicas: 3 .status.replicas:3 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:3 .status.unavailableReplicas:1
.spec.replicas: 3 .status.replicas:3 .status.availableReplicas:3 .status.readyReplicas:3 .status.updatedReplicas:3

Here is a final sequence of counter updates during a rolling update of a Deployment in the following situation:

  • Number of replicas is changed from 3 to 2
  • Change in the annotations => all existing Pods are also restarted
.spec.replicas: 3 .status.replicas:3 .status.availableReplicas:3 .status.readyReplicas:3 .status.updatedReplicas:3
.spec.replicas: 2 .status.replicas:3 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:1 .status.unavailableReplicas:1
.spec.replicas: 2 .status.replicas:3 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:2 .status.unavailableReplicas:1
.spec.replicas: 2 .status.replicas:2 .status.availableReplicas:2 .status.readyReplicas:2 .status.updatedReplicas:2

The algorithm currently implemented in Spray v3 to check the end of the "wait" sequence for Deployments is:

  • If .spec.replicas != .status.readyReplicas then continue to wait
  • else
    • if (.spec.replicas == .status.updatedReplicas) and (.spec.replicas == .status.replicas) then STOP waiting
    • else continue to wait

It uses the .spec.replicas, .status.replicas, .status.updatedReplicas and .status.readyReplicas counters but not the .status.availableReplicas and .status.unavailableReplicas counters.
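
The Deployment check described above can be sketched as a small function and replayed against the first observed sequence. This is only an illustration in Python, not Spray's actual implementation: the function name `deployment_done_v3` and the dictionary layout are assumptions, and a missing counter is treated as 0.

```python
def deployment_done_v3(spec_replicas, status):
    """Return True when the v3 Deployment check would stop waiting.

    `status` holds the .status counters; a missing counter is treated
    as 0 (an assumption made for this sketch).
    """
    if spec_replicas != status.get("readyReplicas", 0):
        return False  # continue to wait
    return (spec_replicas == status.get("updatedReplicas", 0)
            and spec_replicas == status.get("replicas", 0))


# First Deployment sequence (2 replicas, annotation change), starting from
# the first line where the rollout is in progress.
sequence_1 = [
    (2, {"replicas": 3, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 1, "unavailableReplicas": 1}),
    (2, {"replicas": 3, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 2, "unavailableReplicas": 1}),
    (2, {"replicas": 2, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 2}),
]

results = [deployment_done_v3(spec, status) for spec, status in sequence_1]
print(results)  # [False, False, True]
```

On this trace the check keeps waiting on the two intermediate lines and stops only on the last one.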

This algorithm seems to work fine, but a simpler one might be to just check the .status.unavailableReplicas counter and stop waiting when it is no longer present...
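
A minimal sketch of that simpler idea, replayed against the three Deployment sequences observed above. The function name `deployment_done_simple` is hypothetical, and the sketch only shows that the idea matches these particular traces from Kubernetes 1.14; it does not prove that it holds in general.

```python
def deployment_done_simple(status):
    """Stop waiting once .status.unavailableReplicas is no longer present."""
    return "unavailableReplicas" not in status


# .status values of the three Deployment sequences above, starting from the
# first line where the rollout is in progress.
sequences = {
    "replicas 2 -> 2, Pods restarted": [
        {"replicas": 3, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 1, "unavailableReplicas": 1},
        {"replicas": 3, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 2, "unavailableReplicas": 1},
        {"replicas": 2, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 2},
    ],
    "replicas 2 -> 3, Pods unchanged": [
        {"replicas": 3, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 3, "unavailableReplicas": 1},
        {"replicas": 3, "availableReplicas": 3, "readyReplicas": 3,
         "updatedReplicas": 3},
    ],
    "replicas 3 -> 2, Pods restarted": [
        {"replicas": 3, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 1, "unavailableReplicas": 1},
        {"replicas": 3, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 2, "unavailableReplicas": 1},
        {"replicas": 2, "availableReplicas": 2, "readyReplicas": 2,
         "updatedReplicas": 2},
    ],
}

outcome = {name: [deployment_done_simple(s) for s in states]
           for name, states in sequences.items()}
for name, flags in outcome.items():
    print(name, flags)
```

Note that the very first line of each sequence (before the rollout starts) also lacks the counter, so this sketch assumes the check only begins once the rollout is in progress.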

For StatefulSets

Here is a sequence of counter updates during a rolling update of a StatefulSet in the following situation:

  • Number of replicas is unchanged at 2
  • All Pods are restarted due to a change in their annotations
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:2 .status.currentReplicas:2 .status.updatedReplicas:2
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:2 .status.currentReplicas:1                          
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:1 .status.currentReplicas:1                          
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:1 .status.currentReplicas:1 .status.updatedReplicas:1
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:2                           .status.updatedReplicas:1
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:1                           .status.updatedReplicas:1
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:1                           .status.updatedReplicas:2
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:2 .status.currentReplicas:2 .status.updatedReplicas:2

Here is another sequence of counter updates during a rolling update of a StatefulSet in the following situation:

  • Number of replicas is changed from 2 to 3
  • Nothing changed in the Pod definition => existing Pods are not restarted
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:2 .status.currentReplicas:2 .status.updatedReplicas:2
.spec.replicas: 3 .status.replicas:3 .status.readyReplicas:2 .status.currentReplicas:3 .status.updatedReplicas:3
.spec.replicas: 3 .status.replicas:3 .status.readyReplicas:3 .status.currentReplicas:3 .status.updatedReplicas:3

Here is a final sequence of counter updates during a rolling update of a StatefulSet in the following situation:

  • Number of replicas is changed from 3 to 2
  • Change in the annotations => all existing Pods are also restarted
.spec.replicas: 3 .status.replicas:3 .status.readyReplicas:3 .status.currentReplicas:3 .status.updatedReplicas:3
.spec.replicas: 2 .status.replicas:3 .status.readyReplicas:3 .status.currentReplicas:2 
.spec.replicas: 2 .status.replicas:3 .status.readyReplicas:2 .status.currentReplicas:2 
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:2 .status.currentReplicas:1 
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:1 .status.currentReplicas:1 
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:1 .status.currentReplicas:1 .status.updatedReplicas:1
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:2                           .status.updatedReplicas:1
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:1                           .status.updatedReplicas:1
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:1                           .status.updatedReplicas:2
.spec.replicas: 2 .status.replicas:2 .status.readyReplicas:2 .status.currentReplicas:2 .status.updatedReplicas:2

The algorithm currently implemented in Spray v3 to check the end of the "wait" sequence for StatefulSets is:

  • If .spec.replicas != .status.readyReplicas then continue to wait
  • else
    • if .spec.replicas == .status.currentReplicas then STOP waiting
    • else continue to wait

It uses the .spec.replicas, .status.currentReplicas and .status.readyReplicas counters but not the .status.replicas and .status.updatedReplicas counters.
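
As an illustration, the StatefulSet check can be written as a function and replayed against the first StatefulSet sequence (2 replicas, annotation change). This is a sketch in Python, not Spray's code: the name `statefulset_done_v3` is an assumption, and a counter that is absent from a line is treated as 0.

```python
def statefulset_done_v3(spec_replicas, status):
    """Return True when the v3 StatefulSet check would stop waiting."""
    if spec_replicas != status.get("readyReplicas", 0):
        return False  # continue to wait
    return spec_replicas == status.get("currentReplicas", 0)


# First StatefulSet sequence, starting from the first line where the
# rollout is in progress. Counters missing from a line are simply omitted.
sequence_1 = [
    {"replicas": 2, "readyReplicas": 2, "currentReplicas": 1},
    {"replicas": 2, "readyReplicas": 1, "currentReplicas": 1},
    {"replicas": 2, "readyReplicas": 1, "currentReplicas": 1,
     "updatedReplicas": 1},
    {"replicas": 2, "readyReplicas": 2, "updatedReplicas": 1},
    {"replicas": 2, "readyReplicas": 1, "updatedReplicas": 1},
    {"replicas": 2, "readyReplicas": 1, "updatedReplicas": 2},
    {"replicas": 2, "readyReplicas": 2, "currentReplicas": 2,
     "updatedReplicas": 2},
]

results = [statefulset_done_v3(2, status) for status in sequence_1]
print(results)  # only the last entry is True
```

On this trace the check waits through every intermediate line and stops on the last one, where .status.currentReplicas is back to 2.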

Remark: this algorithm does not handle the 3rd sequence above correctly: the wait ends too early, on the third line, where .status.readyReplicas and .status.currentReplicas both equal .spec.replicas while .status.replicas is still 3. This happens because .spec.replicas is not compared with .status.replicas (as is done for Deployments). A possible fix is to change the second if statement to:

  • if (.spec.replicas == .status.currentReplicas) and (.spec.replicas == .status.replicas) then STOP waiting
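
To compare the two variants, the sketch below replays the third StatefulSet sequence (replicas 3 to 2, annotation change) through both of them and reports the index of the first line on which each would stop. The function names are hypothetical and a missing counter is treated as 0; this is an illustration, not Spray's implementation.

```python
def original_check(spec, status):
    """v3 StatefulSet check as described in this issue."""
    if spec != status.get("readyReplicas", 0):
        return False
    return spec == status.get("currentReplicas", 0)


def fixed_check(spec, status):
    """Same check, with the extra .status.replicas comparison."""
    if spec != status.get("readyReplicas", 0):
        return False
    return (spec == status.get("currentReplicas", 0)
            and spec == status.get("replicas", 0))


SPEC = 2
# Third StatefulSet sequence, from the line where .spec.replicas becomes 2.
sequence_3 = [
    {"replicas": 3, "readyReplicas": 3, "currentReplicas": 2},
    {"replicas": 3, "readyReplicas": 2, "currentReplicas": 2},
    {"replicas": 2, "readyReplicas": 2, "currentReplicas": 1},
    {"replicas": 2, "readyReplicas": 1, "currentReplicas": 1},
    {"replicas": 2, "readyReplicas": 1, "currentReplicas": 1,
     "updatedReplicas": 1},
    {"replicas": 2, "readyReplicas": 2, "updatedReplicas": 1},
    {"replicas": 2, "readyReplicas": 1, "updatedReplicas": 1},
    {"replicas": 2, "readyReplicas": 1, "updatedReplicas": 2},
    {"replicas": 2, "readyReplicas": 2, "currentReplicas": 2,
     "updatedReplicas": 2},
]


def first_stop(check):
    """0-based index of the first state on which `check` stops waiting."""
    return next(i for i, st in enumerate(sequence_3) if check(SPEC, st))


print(first_stop(original_check))  # 1: stops while .status.replicas is still 3
print(first_stop(fixed_check))     # 8: stops on the last line
```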

Can we homogenize the 2 algorithms?

The second if statement cannot be exactly the same for both Deployments and StatefulSets because:

  • Deployments do not have a .status.currentReplicas counter, so it cannot be used
  • StatefulSets do have .status.updatedReplicas, but its value becomes equal to .spec.replicas before all steps are completed (there is always a last line where .status.currentReplicas is set to the right value)

Spray v4

The sequences should be the same, as they depend only on Kubernetes and not on Spray itself. Note that I was unfortunately not able to test Spray v4 to confirm this.

In any case, the algorithm to check the end of the "wait" sequence differs from Spray v3, and Deployments and StatefulSets share the same algorithm:

  • if .status.readyReplicas is defined (and not equal to 0 ?)
    • if .status.readyReplicas < .spec.replicas then continue to wait
    • else STOP waiting
  • else continue to wait

(I am not sure about my reading of the go-template, so this is to be confirmed...)

Issue?

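According to my analysis, this algorithm unfortunately does NOT work for sequences 1 and 3, for both Deployments and StatefulSets: in those sequences .status.readyReplicas is already greater than or equal to .spec.replicas as soon as the rollout starts, so the wait stops right away. This would need to be verified in practice...
If so, the v4 checks would have to be updated accordingly, perhaps following the algorithms implemented in Spray v3?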
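
The v4 check, as read from the go-template in this issue, can be sketched in Python and applied to the first in-progress line of each of the six sequences. The function name `done_v4` is hypothetical, and the sketch only tests whether .status.readyReplicas is defined (the "not equal to 0" question is left aside, since the counter is never 0 in these traces).

```python
def done_v4(spec_replicas, status):
    """Return True when the v4 check (as understood here) stops waiting."""
    ready = status.get("readyReplicas")
    if ready is None:
        return False  # counter not defined: continue to wait
    return ready >= spec_replicas


# (.spec.replicas, .status.readyReplicas) on the first line after the
# rollout starts, for each sequence listed in this issue.
first_states = {
    "Deployment seq 1": (2, {"readyReplicas": 2}),
    "Deployment seq 2": (3, {"readyReplicas": 2}),
    "Deployment seq 3": (2, {"readyReplicas": 2}),
    "StatefulSet seq 1": (2, {"readyReplicas": 2}),
    "StatefulSet seq 2": (3, {"readyReplicas": 2}),
    "StatefulSet seq 3": (2, {"readyReplicas": 3}),
}

stops_at_once = {name: done_v4(spec, status)
                 for name, (spec, status) in first_states.items()}
for name, stopped in stops_at_once.items():
    print(name, "stops immediately" if stopped else "keeps waiting")
```

Under these assumptions the check stops immediately for sequences 1 and 3 and keeps waiting only for sequence 2, where the replica count increases.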

@cvila84 cvila84 added the v4 Issues only affecting v4 label Jul 17, 2020
@cvila84 cvila84 self-assigned this Jul 17, 2020