
Use Required Pod anti affinity if Active MDS is not more than 1 #3035

Merged

Conversation

malayparida2000
Contributor

When the active MDS count is 1, there are 2 MDS pods in total and we always want them scheduled on different nodes, so we use required pod anti-affinity. When the active MDS count is more than 1, required anti-affinity would cause scheduling issues on a 3-node cluster, so we switch to preferred anti-affinity.


Signed-off-by: Malay Kumar Parida <[email protected]>
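
For illustration, the selection described above boils down to something like the sketch below, written with plain Kubernetes core/v1 types rather than the actual ocs-operator/Rook placement helpers; the label key and topology key are taken from the test output further down, everything else is illustrative.

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// mdsPodAntiAffinity sketches the decision this PR makes: with one active
// MDS (two MDS pods in total) return required pod anti-affinity so the two
// pods never land together; with more than one active MDS fall back to
// preferred (weight 100) anti-affinity so a 3-node cluster can still
// schedule every pod.
func mdsPodAntiAffinity(activeMDS int, fsName string) *corev1.PodAntiAffinity {
	term := corev1.PodAffinityTerm{
		LabelSelector: &metav1.LabelSelector{
			MatchExpressions: []metav1.LabelSelectorRequirement{{
				Key:      "rook_file_system",
				Operator: metav1.LabelSelectorOpIn,
				Values:   []string{fsName},
			}},
		},
		TopologyKey: "topology.kubernetes.io/zone",
	}
	if activeMDS > 1 {
		return &corev1.PodAntiAffinity{
			PreferredDuringSchedulingIgnoredDuringExecution: []corev1.WeightedPodAffinityTerm{
				{Weight: 100, PodAffinityTerm: term},
			},
		}
	}
	return &corev1.PodAntiAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{term},
	}
}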
@malayparida2000
Contributor Author

Testing

When activeMetadataServers count is not more than 1

~ $ oc get cephfilesystem ocs-storagecluster-cephfilesystem -o yaml | yq '.spec.metadataServer.placement'                                                                                          
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: rook_file_system
            operator: In
            values:
              - ocs-storagecluster-cephfilesystem
      topologyKey: topology.kubernetes.io/zone
tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"

~ $ oc get pods -o wide | grep mds
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6b65d596jrxz2   2/2     Running     0          3s    10.129.2.30   ip-10-0-67-201.ec2.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7cdf54cd787dx   2/2     Running     0          19m   10.131.0.31   ip-10-0-1-83.ec2.internal     <none>           <none>

When activeMetadataServers count is more than 1

~ $ oc get cephfilesystem ocs-storagecluster-cephfilesystem -o yaml | yq '.spec.metadataServer.placement'                                                                                          
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
            - key: rook_file_system
              operator: In
              values:
                - ocs-storagecluster-cephfilesystem
        topologyKey: topology.kubernetes.io/zone
      weight: 100
tolerations:
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal

 ~ $ oc get pods -o wide | grep mds
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6b65d596jrxz2   2/2     Running     0          40s   10.129.2.30   ip-10-0-67-201.ec2.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5c7b9b46nxtcn   2/2     Running     0          25s   10.131.0.35   ip-10-0-1-83.ec2.internal     <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-c-669997d4k4jgr   2/2     Running     0          18s   10.128.2.55   ip-10-0-56-236.ec2.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-d-849db8c4zncpd   2/2     Running     0          17s   10.131.0.36   ip-10-0-1-83.ec2.internal     <none>           <none>
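
For reference, switching between the two states above is just a matter of changing spec.managedResources.cephFilesystems.activeMetadataServers on the StorageCluster CR; the CR name and namespace used below are the usual defaults and are assumed here.

~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge \
      -p '{"spec":{"managedResources":{"cephFilesystems":{"activeMetadataServers":2}}}}'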

@malayparida2000
Contributor Author

/cc @parth-gr

Contributor

openshift-ci bot commented Feb 12, 2025

@malayparida2000: GitHub didn't allow me to request PR reviews from the following users: parth-gr.

Note that only red-hat-storage members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @parth-gr


Member

@parth-gr parth-gr left a comment

lgtm

cc @agarwal-mudit @iamniting @nb-ohad

Contributor

openshift-ci bot commented Feb 12, 2025

@parth-gr: changing LGTM is restricted to collaborators

In response to this:

lgtm

cc @agarwal-mudit @iamniting @nb-ohad


Member

@parth-gr parth-gr left a comment


},
// if active MDS number is more than 1 then Preferred and if it is 1 then Required pod anti-affinity is set
mdsWeightedPodAffinity := defaults.GetMdsWeightedPodAffinityTerm(100, generateNameForCephFilesystem(sc))
if sc.Spec.ManagedResources.CephFilesystems.ActiveMetadataServers > 1 {
Member


@malayparida2000 can we also list the total nodes and see if the storage nodes are more than a number of pods?

Contributor Author


@parth-gr I didn't do that because this fix needs to be backported till 4.15 & I don't want to add client list operations in backports. We can implement the node get & the decision based on it, maybe in 4.19.

Member

@parth-gr parth-gr Feb 12, 2025


@malayparida2000 the fix for using the filesystem name as the anti-affinity key is not backported yet either, so there are already a lot of changes.

So can we have this in 4.18?

And for backports we can have a separate PR for 4.17

Contributor Author


Went through the code a bit. To list & get the ODF-labeled nodes we would need a change in the getPlacement() func's signature, as it would now need the client. This would require a lot of changes in the codebase, especially the unit tests. Considering tomorrow is the RC, I don't think it's achievable by then.
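
For the record, the deferred node-count check discussed in this thread could look roughly like the sketch below, assuming a controller-runtime client and the storage node label visible in the placement output above; the helper name and the one-standby-per-active assumption are illustrative, not ocs-operator code.

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// useRequiredMDSAntiAffinity (hypothetical) would only keep the required
// pod anti-affinity when there are at least as many labeled storage nodes
// as MDS pods (one standby per active MDS, so activeMDS*2 pods in total).
func useRequiredMDSAntiAffinity(ctx context.Context, c client.Client, activeMDS int) (bool, error) {
	nodes := &corev1.NodeList{}
	if err := c.List(ctx, nodes,
		client.HasLabels{"cluster.ocs.openshift.io/openshift-storage"}); err != nil {
		return false, err
	}
	return len(nodes.Items) >= activeMDS*2, nil
}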

Contributor

openshift-ci bot commented Feb 12, 2025

@parth-gr: changing LGTM is restricted to collaborators

In response to this:

cc @BlaineEXE


@BlaineEXE
Contributor

I think this is a good low-effort way to soften the issue. It might not solve the full issue holistically, but as Malay said, we can consider that for 4.19+.

Member

@parth-gr parth-gr left a comment


lgtm, @malayparida2000 please open a follow-up Jira bug to track the decision we made on what we would do in the future.

Contributor

openshift-ci bot commented Feb 12, 2025

@parth-gr: changing LGTM is restricted to collaborators

In response to this:

lgtm, @malayparida2000 please open a follow-up Jira bug to track the decision we made on what we would do in the future.


@malayparida2000
Contributor Author

@BlaineEXE Can you please approve this? Nitin is out sick.

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 12, 2025
@BlaineEXE
Contributor

/approve
/lgtm

Contributor

openshift-ci bot commented Feb 12, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BlaineEXE, malayparida2000, parth-gr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 9edaa53 into red-hat-storage:main Feb 12, 2025
11 checks passed
@malayparida2000
Contributor Author

/cherry-pick release-4.18

@openshift-cherrypick-robot

@malayparida2000: new pull request created: #3036

In response to this:

/cherry-pick release-4.18

