"Permission denied" error when not using the default namespace in hf_interactive notebook #129

Closed
MichaelClifford opened this issue May 24, 2023 · 5 comments · Fixed by opendatahub-io/distributed-workloads#56

@MichaelClifford
Collaborator

I've noticed the below permissions error while running the hf_interactive.ipynb notebook demo.

This error occurs when I deploy my RayCluster in any namespace besides the default namespace. Simply using default overcomes the permissions issue. However, this is not behavior we want to support. We need to ensure that the actions of the codeflare-sdk have the required permissions in the correct namespaces.

[Two screenshots of the permissions error]

This issue can be partially circumvented by adding runtime_env={"env_vars": {"HF_HOME": "huggingface"}} to your ray.init() call. However, that only leads to a similar permission issue later on in training.
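For reference, a minimal sketch of that workaround in Python. The helper names `build_runtime_env` and `connect`, and the `address` argument, are hypothetical and only for illustration; the `HF_HOME` value is the one quoted above. This only redirects the Hugging Face cache, so it does not address the underlying permissions problem.

```python
def build_runtime_env(hf_home: str = "huggingface") -> dict:
    """Build a Ray runtime_env that points HF_HOME at a relative directory.

    Hypothetical helper: the dict shape matches Ray's runtime_env
    "env_vars" field, and the default value is taken from this issue.
    """
    return {"env_vars": {"HF_HOME": hf_home}}


def connect(address: str):
    """Connect to a Ray cluster with the workaround applied.

    `address` is whatever your cluster exposes; it is not specified here.
    Ray is imported lazily so build_runtime_env() works without Ray installed.
    """
    import ray

    return ray.init(address=address, runtime_env=build_runtime_env())
```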

How can we ensure that the users have the correct permissions in their namespaces?

@KPostOffice
Collaborator

Is this connected to #125 ?

@KPostOffice KPostOffice self-assigned this Jun 8, 2023
@KPostOffice KPostOffice moved this from Todo to In Progress in Project CodeFlare Sprint Board Jun 8, 2023
@KPostOffice
Collaborator

KPostOffice commented Jun 9, 2023

My current thought is that this is an issue with the upstream image not being built to be compatible with running as an arbitrary user on OpenShift. I'm planning on rebuilding and pushing the image after adding something like:

RUN chgrp -R 0 /home/ray && \
    chmod -R g+rwX /home/ray

as seen here

If this works, we can rebuild the images ourselves and push them to our quay org. Then we can see if Ray is open to adding this to their builds.
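One way to check whether an image still needs that change is to look for paths under the home directory that the owning group cannot read and write. The sketch below is an assumption about how such a check might look, not part of any image build. The function name `find_group_unwritable` is hypothetical, and it only inspects mode bits; it does not verify that the group is GID 0.

```python
import os
import stat
from typing import List


def find_group_unwritable(root: str) -> List[str]:
    """Return paths under `root` that lack group read/write permission.

    Directories must also have the group execute bit, which mirrors
    what `chmod -R g+rwX` would grant.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # permissions on symlinks themselves are not meaningful
            needed = stat.S_IRGRP | stat.S_IWGRP
            if stat.S_ISDIR(mode):
                needed |= stat.S_IXGRP
            if (mode & needed) != needed:
                missing.append(path)
    return missing
```

Inside a container this could be called as `find_group_unwritable("/home/ray")`; an empty list would suggest the group permissions are already in place.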

@KPostOffice
Collaborator

I was able to get the hf_interactive notebook working using the following Dockerfile:

FROM ghcr.io/foundation-model-stack/base:ray2.1.0-py38-gpu-pytorch1.12.0cu116-20221213-193103

USER 0

RUN chgrp -R 0 /home/ray && chmod -R g+rwX /home/ray

The image is available at quay.io/kpostlet/ray:2.1.0. If anyone else wants to use it for testing, you can pass it to your ClusterConfiguration with the image parameter:

cluster = Cluster(ClusterConfiguration(
    ...,
    image='quay.io/kpostlet/ray:2.1.0'
))

@KPostOffice
Collaborator

This was added but then reverted in the Ray project. See ray-project/ray#32025 for discussion.

@KPostOffice
Collaborator

My current ideas are that we can either:

  1. host a custom image with the correct permissions on the home directory
  2. run pods as the ray user (I think this is UID=1000)
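As a rough sketch of the second option, the pod spec could carry a `securityContext` with `runAsUser` set. The helper name `with_run_as_user` is hypothetical, the UID of 1000 is the unverified guess from the list above, and how this would be wired into the RayCluster template is not decided here. On OpenShift the namespace's security context constraints would also have to allow that UID.

```python
import copy

# Assumption from the comment above; not verified against the image.
RAY_UID = 1000


def with_run_as_user(pod_spec: dict, uid: int = RAY_UID) -> dict:
    """Return a copy of a pod spec dict with securityContext.runAsUser set.

    `pod_spec` is the plain-dict form of a Kubernetes PodSpec; the
    input is left unmodified.
    """
    patched = copy.deepcopy(pod_spec)
    patched.setdefault("securityContext", {})["runAsUser"] = uid
    return patched
```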
