
Use Required Pod anti affinity if Active MDS is not more than 1 #3035

Merged

Conversation

malayparida2000
Contributor

When the active MDS count is 1, there are 2 MDS pods in total and we always want them scheduled on different nodes, so we use required pod anti-affinity. When the active MDS count is more than 1, required anti-affinity would cause scheduling issues on a 3-node cluster, so we switch to preferred anti-affinity.


Signed-off-by: Malay Kumar Parida <[email protected]>
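
For illustration, the selection described above boils down to something like the sketch below, written with plain Kubernetes core/v1 types rather than the actual ocs-operator/Rook placement helpers; the label key and topology key are taken from the test output further down, everything else is illustrative.

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// mdsPodAntiAffinity sketches the decision this PR makes: with one active
// MDS (two MDS pods in total) return required pod anti-affinity so the two
// pods never land together; with more than one active MDS fall back to
// preferred (weight 100) anti-affinity so a 3-node cluster can still
// schedule every pod.
func mdsPodAntiAffinity(activeMDS int, fsName string) *corev1.PodAntiAffinity {
	term := corev1.PodAffinityTerm{
		LabelSelector: &metav1.LabelSelector{
			MatchExpressions: []metav1.LabelSelectorRequirement{{
				Key:      "rook_file_system",
				Operator: metav1.LabelSelectorOpIn,
				Values:   []string{fsName},
			}},
		},
		TopologyKey: "topology.kubernetes.io/zone",
	}
	if activeMDS > 1 {
		return &corev1.PodAntiAffinity{
			PreferredDuringSchedulingIgnoredDuringExecution: []corev1.WeightedPodAffinityTerm{
				{Weight: 100, PodAffinityTerm: term},
			},
		}
	}
	return &corev1.PodAntiAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{term},
	}
}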
@malayparida2000
Contributor Author

Testing

When activeMetadataServers count is not more than 1

~ $ oc get cephfilesystem ocs-storagecluster-cephfilesystem -o yaml | yq '.spec.metadataServer.placement'                                                                                          
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: rook_file_system
            operator: In
            values:
              - ocs-storagecluster-cephfilesystem
      topologyKey: topology.kubernetes.io/zone
tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"

~ $ oc get pods -o wide | grep mds
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6b65d596jrxz2   2/2     Running     0          3s    10.129.2.30   ip-10-0-67-201.ec2.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7cdf54cd787dx   2/2     Running     0          19m   10.131.0.31   ip-10-0-1-83.ec2.internal     <none>           <none>

When activeMetadataServers count is more than 1

~ $ oc get cephfilesystem ocs-storagecluster-cephfilesystem -o yaml | yq '.spec.metadataServer.placement'                                                                                          
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
            - key: rook_file_system
              operator: In
              values:
                - ocs-storagecluster-cephfilesystem
        topologyKey: topology.kubernetes.io/zone
      weight: 100
tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal

 ~ $ oc get pods -o wide | grep mds
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6b65d596jrxz2   2/2     Running     0          40s   10.129.2.30   ip-10-0-67-201.ec2.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5c7b9b46nxtcn   2/2     Running     0          25s   10.131.0.35   ip-10-0-1-83.ec2.internal     <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-c-669997d4k4jgr   2/2     Running     0          18s   10.128.2.55   ip-10-0-56-236.ec2.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-d-849db8c4zncpd   2/2     Running     0          17s   10.131.0.36   ip-10-0-1-83.ec2.internal     <none>           <none>
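
For reference, switching between the two states above is just a matter of changing spec.managedResources.cephFilesystems.activeMetadataServers on the StorageCluster CR; the CR name and namespace used below are the usual defaults and are assumed here.

~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge \
      -p '{"spec":{"managedResources":{"cephFilesystems":{"activeMetadataServers":2}}}}'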

@malayparida2000
Contributor Author

/cc @parth-gr

Contributor

openshift-ci bot commented Feb 12, 2025

@malayparida2000: GitHub didn't allow me to request PR reviews from the following users: parth-gr.

Note that only red-hat-storage members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @parth-gr


Member

@parth-gr parth-gr left a comment

lgtm

cc @agarwal-mudit @iamniting @nb-ohad

Contributor

openshift-ci bot commented Feb 12, 2025

@parth-gr: changing LGTM is restricted to collaborators

In response to this:

lgtm

cc @agarwal-mudit @iamniting @nb-ohad


Member

@parth-gr parth-gr left a comment


},
// if active MDS number is more than 1 then Preferred and if it is 1 then Required pod anti-affinity is set
mdsWeightedPodAffinity := defaults.GetMdsWeightedPodAffinityTerm(100, generateNameForCephFilesystem(sc))
if sc.Spec.ManagedResources.CephFilesystems.ActiveMetadataServers > 1 {
Member


@malayparida2000 can we also list the total nodes and see if the storage nodes are more than a number of pods?

Contributor Author


@parth-gr I didn't do that because this fix needs to be backported till 4.15 & I don't want to add client list operations in backports. We can implement the node get & the decision based on it, maybe in 4.19.

Member

@parth-gr parth-gr Feb 12, 2025


@malayparida2000 the fix for using the filesystem name as the anti-affinity key is not backported yet either, so there are already a lot of changes.

So can we have this in 4.18?

And for backports we can have a separate PR for 4.17

Contributor Author


Went through the code a bit. To list & get the ODF-labeled nodes we would need a change in the getPlacement() func's signature, as it would now need the client. This would require a lot of changes in the codebase, especially the unit tests. Considering tomorrow is the RC, I don't think it's achievable by then.
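
For the record, the deferred node-count check discussed in this thread could look roughly like the sketch below, assuming a controller-runtime client and the storage node label visible in the placement output above; the helper name and the one-standby-per-active assumption are illustrative, not ocs-operator code.

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// useRequiredMDSAntiAffinity (hypothetical) would only keep the required
// pod anti-affinity when there are at least as many labeled storage nodes
// as MDS pods (one standby per active MDS, so activeMDS*2 pods in total).
func useRequiredMDSAntiAffinity(ctx context.Context, c client.Client, activeMDS int) (bool, error) {
	nodes := &corev1.NodeList{}
	if err := c.List(ctx, nodes,
		client.HasLabels{"cluster.ocs.openshift.io/openshift-storage"}); err != nil {
		return false, err
	}
	return len(nodes.Items) >= activeMDS*2, nil
}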

Contributor

openshift-ci bot commented Feb 12, 2025

@parth-gr: changing LGTM is restricted to collaborators

In response to this:

cc @BlaineEXE


@BlaineEXE
Contributor

I think this is a good low-effort way to soften the issue. It might not solve the full issue holistically, but as Malay said, we can consider that for 4.19+.

Member

@parth-gr parth-gr left a comment


lgtm, @malayparida2000 please open a follow-up Jira bug to track the decision we made on what we would do in the future.

Contributor

openshift-ci bot commented Feb 12, 2025

@parth-gr: changing LGTM is restricted to collaborators

In response to this:

lgtm, @malayparida2000 please open a follow-up Jira bug to track the decision we made on what we would do in the future.


@malayparida2000
Contributor Author

@BlaineEXE Can you please approve this? Nitin is out sick.

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 12, 2025
@BlaineEXE
Contributor

/approve
/lgtm

Contributor

openshift-ci bot commented Feb 12, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BlaineEXE, malayparida2000, parth-gr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 9edaa53 into red-hat-storage:main Feb 12, 2025
11 checks passed
@malayparida2000
Contributor Author

/cherry-pick release-4.18

@openshift-cherrypick-robot

@malayparida2000: new pull request created: #3036

In response to this:

/cherry-pick release-4.18

