
OMap generator fails on PVCs created in a RADOS namespace #5140

Closed
SkalaNetworks opened this issue Feb 12, 2025 · 7 comments

@SkalaNetworks

Describe the bug

When setting up mirroring, one of the requirements is enabling the OMap generator sidecar. This sidecar runs in the provisioner pod and tries to generate the OMap of each PVC it detects. If one of those PVCs was created in a RADOS namespace within a pool, the OMap generator loops on it with error messages.

The capabilities needed to make it work also seem quite broad, which cancels out the benefit of RADOS namespaces (rook/rook#15277).

Environment details

  • Image/version of Ceph CSI driver : quay.io/cephcsi/cephcsi:v3.13.0
  • Helm chart version : Deployed by Rook using version 1.16.3 for both the operator chart and the cluster chart
  • Kernel version : Talos 1.9 (Linux 6.12.5)
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd): krbd, I guess?
  • Kubernetes cluster version : v1.32.0 (Talos 1.9)
  • Ceph cluster version : 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)

Steps to reproduce

Steps to reproduce the behavior:

  1. Deploy a Ceph/Rook cluster using the Helm Chart and the following values

Operator:

        logLevel: DEBUG
        useOperatorHostNetwork: true
        csi:
          # Necessary for volume replication and mirroring
          csiAddons:
            enabled: true
          # Necessary for volume replication and mirroring
          enableOMAPGenerator: true
          serviceMonitor:
            enabled: true
        monitoring:
          enabled: true
        # Discover new disks to show them on the dashboard
        enableDiscoveryDaemon: true
        discoveryDaemonInterval: 5m

Cluster:

        operatorNamespace: rook-system
        clusterName: "somethingsomething"
        toolbox:
          enabled: true
        monitoring:
          enabled: true

        ingress:
          dashboard:
            annotations:
              cert-manager.io/cluster-issuer: letsencrypt
            host:
              name: "something"
              path: "/"
            tls:
              - hosts:
                  - "something"
                secretName: ceph-dashboard-tls
            ingressClassName: something

        cephClusterSpec:
          dashboard:
            enabled: true
            ssl: false
            prometheusEndpoint: "http://prometheus-kube-prometheus-prometheus.prometheus-system:9090"
            prometheusEndpointSSLVerify: false

          mgr:
            modules:
              - name: rook
                enabled: true

          storage:
            useAllDevices: false
            deviceFilter: ""

          crashCollector:
            disable: false
            daysToRetain: 365

          network:
            # Use the underlying host network for performance and to expose the daemons to remote Ceph clients
            provider: host

            # We run Ceph only on IPv6 networks, bind the daemons on IPv6 addresses
            ipFamily: IPv6

            # This needs to be overriden for each cluster with the IPv6 ranges dedicated to handling storage traffic
            addressRanges:
              public:
                - xxxx
              cluster:
                - xxxx

        cephFileSystems: []
        cephObjectStores: []
        cephBlockPools:
        # Block storage to be used by remote SPX clusters
        # Device type: NVMe
        # Replication: 3x
        - name: rbd-nvme3x
          spec:
            failureDomain: host
            enableRBDStats: true
            replicated:
              size: 3
            deviceClass: nvme
            mirroring:
              enabled: true
              mode: image
              peers:
                secretNames:
                - rbd-primary-site-secret
          storageClass:
            enabled: false
  2. Create the StorageClass we'll use to trigger the bug, the peering relation with another cluster, and the RADOS namespace associated with the StorageClass:
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPoolRadosNamespace
metadata:
  name: test-mirroring
  namespace: spx-storage
spec:
  blockPoolName: rbd-nvme3x

---
# This is used to peer to another Ceph cluster, probably irrelevant to this bug
apiVersion: v1
data:
  token: xxxx
kind: Secret
metadata:
  name: rbd-primary-site-secret
  namespace: spx-storage

---
apiVersion: ceph.rook.io/v1
kind: CephRBDMirror
metadata:
  name: rbd-mirror
  namespace: spx-storage
spec:
  count: 1

---
apiVersion: ceph.rook.io/v1
kind: CephClient
metadata:
  name: test-mirroring
  namespace: spx-storage
spec:
  caps: # The capabilities only allow access to the namespace, because the storage in that namespace will be consumed by customer K8s clusters that should not see other customers' storage in the pool
    mon: "profile rbd"
    mgr: "profile rbd pool=rbd-nvme3x namespace=test-mirroring"
    osd: "profile rbd pool=rbd-nvme3x namespace=test-mirroring"

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test-mirroring
provisioner: rook-system.rbd.csi.ceph.com
parameters:
  clusterID: 64048cf2257334960538a62d68cdcd55 # Found by looking at the previously created CephBlockPoolRadosNamespace and extracting its ID (see the sketch after this manifest)
  pool: rbd-nvme3x
  imageFormat: "2"
  imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
  csi.storage.k8s.io/provisioner-secret-name: "rook-ceph-client-test-mirroring"
  csi.storage.k8s.io/provisioner-secret-namespace: spx-storage
  csi.storage.k8s.io/controller-expand-secret-name:  "rook-ceph-client-test-mirroring"
  csi.storage.k8s.io/controller-expand-secret-namespace: spx-storage
  csi.storage.k8s.io/node-stage-secret-name: "rook-ceph-client-test-mirroring"
  csi.storage.k8s.io/node-stage-secret-namespace: spx-storage
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
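
For reference, the clusterID above identifies the RADOS namespace rather than the Ceph cluster itself. A minimal sketch of where it can be read from once Rook has reconciled the CR, assuming the status layout of current Rook releases (status.info.clusterID; field names may differ slightly):

---
apiVersion: ceph.rook.io/v1
kind: CephBlockPoolRadosNamespace
metadata:
  name: test-mirroring
  namespace: spx-storage
spec:
  blockPoolName: rbd-nvme3x
status:
  phase: Ready
  info:
    clusterID: 64048cf2257334960538a62d68cdcd55 # value consumed by the StorageClass above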
  3. We now have a StorageClass to deploy RBD PVCs inside the RADOS namespace "test-mirroring" in pool "rbd-nvme3x".
  4. Create a PVC and mount it:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-mirroring
  namespace: default
spec:
  storageClassName: "test-mirroring"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

---
apiVersion: v1
kind: Pod
metadata:
  name: debug
  namespace: default
spec:
  containers:
  - name: debug-container
    image: busybox:latest
    command: ["sh", "-c", "while true; do sleep 3600; done"]
    volumeMounts:
    - name: storage-volume
      mountPath: /mnt/storage
  volumes:
  - name: storage-volume
    persistentVolumeClaim:
      claimName: test-mirroring
  5. The PVC works, we can write to it, everything seems OK.
  6. Look at the csi-provisioner elected as leader; it will have the following errors:
     [screenshot: error logs from the csi-omap-generator sidecar]
  7. Apparently, the steps to create the OMap are performed using the capabilities of the account linked to the StorageClass. To debug this, let's give that account rights to the whole pool rather than just the namespace (this is bad, because it renders RADOS namespaces useless if customers have access to a Ceph account with capabilities on the pool and not just their namespace); the widened capabilities are sketched after this list.

     [screenshot]
  8. Restart the provisioner to relaunch the OMap generation.
  9. Now the logs are different: we managed to get past the operation where our capabilities weren't sufficient, but it now fails to list the RBD image, as if it were searching for it in the POOL and not in the NAMESPACE within that pool.

     [screenshot: provisioner logs showing the failed RBD image lookup]
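
For clarity, here is a minimal sketch of the widened capabilities referred to in step 7; the namespace restriction is simply dropped, so the client can reach the whole pool (which defeats the isolation RADOS namespaces are meant to provide):

---
# Sketch of the pool-wide capabilities used only while debugging step 7
apiVersion: ceph.rook.io/v1
kind: CephClient
metadata:
  name: test-mirroring
  namespace: spx-storage
spec:
  caps:
    mon: "profile rbd"
    mgr: "profile rbd pool=rbd-nvme3x"
    osd: "profile rbd pool=rbd-nvme3x"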

Actual results

The OMap isn't generated for the PVC, which will probably render mirroring impossible.

Expected behavior

We can create OMaps for volumes inside a RADOS namespace (and hopefully with capabilities for that namespace only)

Additional context

Full GitOps setup on Talos + Rook

@Madhu-1
Collaborator

Madhu-1 commented Feb 12, 2025

This should have been fixed by #5099. @iPraveenParihar, can you please check if it's the same and confirm?

@iPraveenParihar
Contributor

Yes, you are right, @Madhu-1. The reported issue is fixed by #5099 and backported (#5100) to v3.13.
@SkalaNetworks, the fix will be available in v3.13.1.

@SkalaNetworks
Author

Thanks @iPraveenParihar, do you have an ETA on the release? Is there a "latest" tag I can use to deploy the dev version and see if it fixes the problem? I still have an open question about the Ceph capabilities, which block the creation of the OMap in the pool if you only allow the user to access its RADOS namespace, and I'd like to verify that.

@iPraveenParihar
Contributor

iPraveenParihar commented Feb 12, 2025

Thanks @iPraveenParihar, do you have an ETA on the release?

@Rakshith-R, do we know when the v3.13.1 release is?

Is there a "latest" tag I can try to deploy the dev version and see if it fixes the problem?

You can use the canary image tag.
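
A minimal sketch of how that could be wired up through the Rook operator ConfigMap, assuming the ROOK_CSI_CEPH_IMAGE override documented by Rook and the rook-system operator namespace used above:

---
# Sketch: point Rook's CSI deployments at the cephcsi canary image
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: rook-system
data:
  ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:canary"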

I still have an unknown concerning the ceph capabilities which block the creation of the OMap in the pool if you only allow the user to access its RADOS namespace and I'd like to verify that.

The "profile rbd pool=rbd-nvme3x namespace=test-mirroring" OSD capability should work just fine.

@SkalaNetworks
Author

Right, I tested the canary version (only on the csi-omap-generator container) and there are no errors visible anymore.

Switching to debug mode (-v=5):

Without the "namespace=test-mirroring" capability restriction:

I0212 12:24:43.327132       1 omap.go:89] got omap values: (pool="rbd-nvme3x", namespace="test-mirroring", name="csi.volumes.default"): map[csi.volume.pvc-f195e544-10e4-4c36-99bb-c2ed334413e7:37836ab6-5a83-4ffe-986c-01e26a478eb6]

With the restriction:

I0212 12:26:49.754382       1 omap.go:89] got omap values: (pool="rbd-nvme3x", namespace="test-mirroring", name="csi.volumes.default"): map[csi.volume.pvc-f195e544-10e4-4c36-99bb-c2ed334413e7:37836ab6-5a83-4ffe-986c-01e26a478eb6]

I guess that's a win?
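
As an aside, a minimal sketch of how the -v=5 verbosity can be set cluster-wide through the Rook operator ConfigMap instead of editing container args by hand, assuming the CSI_LOG_LEVEL override documented by Rook:

---
# Sketch: raise the log verbosity of all Ceph CSI containers (equivalent to -v=5)
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: rook-system
data:
  CSI_LOG_LEVEL: "5"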

@iPraveenParihar
Contributor

iPraveenParihar commented Feb 13, 2025

Yes, @SkalaNetworks, that's correct 👍.

If there is nothing more on this issue, feel free to close it 😄.

@SkalaNetworks
Author

SkalaNetworks commented Feb 13, 2025

Thanks, I'll wait for the 3.13.1 release to deploy the fix properly!
