Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cephfs: upgrading mount syntax #5090

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from
Open

Conversation

MageekChiu
Copy link

@MageekChiu MageekChiu commented Jan 17, 2025

The old syntax is almost deprecated, and there are reasons to upgrade it to the new one

  • old syntax is lack of fsid param, which is critical for debugging and observability
  • mds_namespace is deprecated, so it might be inappropriate to continue using it
  • kernel will try new syntax first and then the old one(check with mount -v), it is a waste to use the old one

From the ubuntu manpage,20.04 LTS supports the old syntax while 22.04 LTS supports the new one.

@mergify mergify bot added the component/cephfs Issues related to CephFS label Jan 17, 2025
@nixpanic
Copy link
Member

Many thanks @MageekChiu!

Could you update the PR description with a reference that explains the changes? It would be useful to know how recent these changes are, and if there could be a problem when users have an older version deployed.

It seems the commit message contains a few long lines. Please edit it and keep them under 80 characters.

@kotreshhr, maybe you have a reference to the new/changed mount options handy?

@MageekChiu MageekChiu force-pushed the devel branch 3 times, most recently from 19eeb9a to 21d1d64 Compare January 21, 2025 10:46
@MageekChiu
Copy link
Author

@nixpanic Greate advices, thank you.
I‘ve edited the PR description and the commit message.

Cool homepage by the way, need to add one myself 😄.

nixpanic
nixpanic previously approved these changes Jan 21, 2025
Copy link
Member

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

@nixpanic nixpanic requested a review from a team January 21, 2025 12:35
@mergify mergify bot dismissed nixpanic’s stale review January 25, 2025 15:51

Pull request has been modified.

@MageekChiu MageekChiu force-pushed the devel branch 2 times, most recently from 63c4692 to 42b818b Compare January 25, 2025 15:52
@kotreshhr
Copy link

Many thanks @MageekChiu!

Could you update the PR description with a reference that explains the changes? It would be useful to know how recent these changes are, and if there could be a problem when users have an older version deployed.

It seems the commit message contains a few long lines. Please edit it and keep them under 80 characters.

@kotreshhr, maybe you have a reference to the new/changed mount options handy?

I see the old and new mount options are already linked in the PR. Yes, the kernel tries the new syntax first and falls back to old if not available. I think this should be fine but tagging the kernel developer @Markuze @vshankar to confirm if this breaks anything based on the version it got introduced.

@@ -46,6 +46,7 @@ type VolumeOptions struct {
RequestName string
NamePrefix string
ClusterID string
FsID string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of always fetching the FsID, can you add a func (vo *VolumeOptions) GetFSID() (string, error) function?

This can then be used where needed, currently only in NewKernelMounter().

By extending the NewVolumeOptions() function, it grows (too) large, and the golang-ci linter fails due to that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nixpanic,I believe you may have missed this—we already have GetFSID() defined in the ClusterConnection struct. Since ClusterConnection is a member of the VolumeOptions struct, it can be accessed via vo.conn.GetFSID().

@MageekChiu MageekChiu requested a review from nixpanic February 2, 2025 11:42
@nixpanic
Copy link
Member

nixpanic commented Feb 5, 2025

Hi @MageekChiu, the PR looks good to me. Can you squash all commits into a single one and force-push your branch? That makes it possible for the @mergify bot can do it's work and get this in soon.

Thanks!

@@ -72,24 +72,25 @@ func (m *kernelMounter) mountKernel(
m.needsModprobe = false
}

fsID, err := volOptions.GetFSID()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we use existing GetFSID() instead?

Suggested change
fsID, err := volOptions.GetFSID()
fsID, err := volOptions.GetConnection().GetFSID()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volOptions.GetConnection() can return nil, so it is not really safe. A new GetFSID() would prevent any incorrect usage, now, and possibly in the future.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the checking is needed, Connect and Destroy do the checking too.
And I've squashed the commits.

Copy link
Contributor

@iPraveenParihar iPraveenParihar Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volOptions.GetConnection() can return nil, so it is not really safe. A new GetFSID() would prevent any incorrect usage, now, and possibly in the future.

when volOptions struct is created, we make sure we have the connection set. There is no way GetConnection() could return nil.

IMO, GetConnection() should be handling the nil check, or the GetConnection() caller should check for nil and proceed.

as we have similar usage at some places. for example -

ioctx, err := volOptions.GetConnection().GetIoctx(volOptions.MetadataPool)
if err != nil {
log.ErrorLog(ctx, "Failed to create ioctx: %s", err)
return err
}
defer ioctx.Destroy()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, improving GetConnection() would be nice too. I don't think that needs to be done in this PR though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, improving GetConnection() would be nice too. I don't think that needs to be done in this PR though.

That can be done through issue #5129.

@iPraveenParihar
Copy link
Contributor

/test ci/centos/mini-e2e/k8s-1.32

@nixpanic
Copy link
Member

nixpanic commented Feb 7, 2025

/test ci/centos/mini-e2e/k8s-1.32

There has been a new Ceph release yesterday, building Ceph-CSI fails until the new container-image can install required RPMs (librados-devel etc...). 😢

@nixpanic
Copy link
Member

@Mergifyio rebase

@iPraveenParihar , is there anything blocking this PR from your point of view?

The old syntax is almost deprecated,and there are reasons to upgrade it
- old syntax is lack of fsid(critical for debugging and observability)
- mds_namespace is deprecated, it might be inappropriate to continue using it
- kernel will try new syntax first and then the old one, it's a waste

Signed-off-by: mageekchiu <[email protected]>
Copy link
Contributor

mergify bot commented Feb 13, 2025

rebase

✅ Branch has been successfully rebased

@nixpanic
Copy link
Member

@Mergifyio queue

Copy link
Contributor

mergify bot commented Feb 13, 2025

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Feb 13, 2025
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.32

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.32

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.32

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Feb 13, 2025
Copy link
Contributor

mergify bot commented Feb 13, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

@nixpanic
Copy link
Member

It looks like the ci/centos/mini-e2e-helm/k8s-1.30 and ci/centos/mini-e2e/k8s-1.32 CI jobs failed during check static PVC with FsName, with

  I0213 09:41:19.618304   22892 utils.go:266] ID: 100 Req-ID: pv-name GRPC call: /csi.v1.Node/NodeStageVolume
  I0213 09:41:19.618350   22892 utils.go:267] ID: 100 Req-ID: pv-name GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/a0dd44413b10ee8f81629ecf349e45d15232fdfb63323715b60e6bbdb7f50dae/globalmount","volume_capability":{"access_mode":{"mode":"SINGLE_NODE_MULTI_WRITER"},"mount":{}},"volume_context":{"clusterID":"d1738887-c6ec-4570-bd3c-85bd6e09a8ab","fsName":"myfs","rootPath":"/volumes/testGroup/testSubVol/13f7794a-ff38-40fd-a599-91863996ac33","staticVolume":"true"},"volume_id":"pv-name"}
  I0213 09:41:19.618516   22892 volumemounter.go:126] requested mounter: , chosen mounter: kernel
  I0213 09:41:19.618594   22892 nodeserver.go:358] ID: 100 Req-ID: pv-name cephfs: mounting volume pv-name with Ceph kernel client
  I0213 09:41:19.618641   22892 crushlocation.go:41] CRUSH location labels passed for processing: [topology.kubernetes.io/region topology.kubernetes.io/zone]
  I0213 09:41:19.618656   22892 crushlocation.go:73] list of CRUSH location processed: map[region:east zone:east-zone1]
  E0213 09:41:19.618671   22892 nodeserver.go:368] ID: 100 Req-ID: pv-name failed to mount volume pv-name: failed to get fsID, stop mounting: cluster not connected yet Check dmesg logs if required.
  E0213 09:41:19.618694   22892 utils.go:271] ID: 100 Req-ID: pv-name GRPC error: rpc error: code = Internal desc = failed to get fsID, stop mounting: cluster not connected yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/cephfs Issues related to CephFS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants