Installation instructions for a few deployment targets are provided below.
- Ray Cluster Using Operator on OpenShift
- Ray Cluster on OpenShift
- Ray Cluster on OpenShift for Jupyter
Deploying the Ray Operator
Pre-requisites:
- Access to an OpenShift cluster
- Python 3.8+
We recommend installing Python 3.8.7 using pyenv.
- Install CodeFlare
Install from PyPI:
```
pip3 install --upgrade pip  # CodeFlare requires pip >21.0
pip3 install --upgrade codeflare
```
Alternatively, you can build and install locally:
```
git clone https://github.com/project-codeflare/codeflare.git
cd codeflare
pip3 install --upgrade pip
pip3 install .
```
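Either way, a quick sanity check (a minimal sketch, not part of the official instructions, and assuming the installed module is named codeflare) is to confirm that the package and its Ray dependency import cleanly:
```
# Minimal post-install check: both imports should succeed without errors.
import codeflare
import ray

print(ray.__version__)
```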
- Create Cluster (https://docs.ray.io/en/master/cluster/cloud.html#kubernetes)

This assumes you have access to an OpenShift cluster, per the pre-requisites above.
a) Create namespace
```
$ oc create namespace codeflare
namespace/codeflare created
```
b) Bring up Ray cluster
```
$ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
Cluster: default

Checking Kubernetes environment settings
2021-02-09 06:40:09,612 INFO config.py:169 -- KubernetesNodeProvider: using existing namespace 'ray'
2021-02-09 06:40:09,671 INFO config.py:202 -- KubernetesNodeProvider: autoscaler_service_account 'autoscaler' not found, attempting to create it
2021-02-09 06:40:09,738 INFO config.py:204 -- KubernetesNodeProvider: successfully created autoscaler_service_account 'autoscaler'
2021-02-09 06:40:10,196 INFO config.py:228 -- KubernetesNodeProvider: autoscaler_role 'autoscaler' not found, attempting to create it
2021-02-09 06:40:10,265 INFO config.py:230 -- KubernetesNodeProvider: successfully created autoscaler_role 'autoscaler'
2021-02-09 06:40:10,573 INFO config.py:261 -- KubernetesNodeProvider: autoscaler_role_binding 'autoscaler' not found, attempting to create it
2021-02-09 06:40:10,646 INFO config.py:263 -- KubernetesNodeProvider: successfully created autoscaler_role_binding 'autoscaler'
2021-02-09 06:40:10,704 INFO config.py:294 -- KubernetesNodeProvider: service 'ray-head' not found, attempting to create it
2021-02-09 06:40:10,788 INFO config.py:296 -- KubernetesNodeProvider: successfully created service 'ray-head'
2021-02-09 06:40:11,098 INFO config.py:294 -- KubernetesNodeProvider: service 'ray-workers' not found, attempting to create it
2021-02-09 06:40:11,185 INFO config.py:296 -- KubernetesNodeProvider: successfully created service 'ray-workers'
No head node found. Launching a new cluster. Confirm [y/N]: y

Acquiring an up-to-date head node
2021-02-09 06:40:14,396 INFO node_provider.py:113 -- KubernetesNodeProvider: calling create_namespaced_pod (count=1).
  Launched a new head node
  Fetching the new head node

<1/1> Setting up head node
  Prepared bootstrap config
  New status: waiting-for-ssh
  [1/7] Waiting for SSH to become available
    Running `uptime` as a test.
2021-02-09 06:40:15,296 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
error: unable to upgrade connection: container not found ("ray-node")
    SSH still not available (Exit Status 1): kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-02-09 06:40:22,197 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
 03:41:41 up 81 days, 14:25, 0 users, load average: 1.42, 0.87, 0.63
    Success.
  Updating cluster configuration. [hash=16487b5e0285fc46d5f1fd6da0370b2f489a6e5f]
  New status: syncing-files
  [2/7] Processing file mounts
2021-02-09 06:41:42,330 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (mkdir -p ~)'
  [3/7] No worker file mounts to sync
  New status: setting-up
  [4/7] No initialization commands to run.
  [5/7] Initalizing command runner
  [6/7] No setup commands to run.
  [7/7] Starting the Ray runtime
2021-02-09 06:42:10,643 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"GPU":0}'"'"';ray stop)'
Did not find any active Ray processes.
2021-02-09 06:42:13,845 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"GPU":0}'"'"';ulimit -n 65536; ray start --head --num-cpus=$MY_CPU_REQUEST --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0)'
Local node IP: 172.30.236.163
2021-02-09 03:42:17,373 INFO services.py:1195 -- View the Ray dashboard at http://172.30.236.163:8265

--------------------
Ray runtime started.
--------------------

Next steps
  To connect to this Ray runtime from another node, run
    ray start --address='172.30.236.163:6379' --redis-password='5241590000000000'

  Alternatively, use the following Python code:
    import ray
    ray.init(address='auto', _redis_password='5241590000000000')

  If connection fails, check your firewall settings and network configuration.

  To terminate the Ray runtime, run
    ray stop
  New status: up-to-date

Useful commands
  Monitor autoscaling with
    ray exec /Users/darroyo/git_workspaces/github.com/ray-project/ray/python/ray/autoscaler/kubernetes/example-full.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
  Connect to a terminal on the cluster head:
    ray attach /Users/darroyo/git_workspaces/github.com/ray-project/ray/python/ray/autoscaler/kubernetes/example-full.yaml
  Get a remote shell to the cluster manually:
    kubectl -n ray exec -it ray-head-ql46b -- bash
```
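As the "Next steps" section of the output above shows, a Python process inside the cluster can attach to the running Ray runtime. The address and password below are the ones printed in this sample run; substitute the values from your own deployment:
```
import ray

# Connection details come from the `ray up` output above.
ray.init(address='auto', _redis_password='5241590000000000')

# Confirm the connection by listing the cluster's resources.
print(ray.cluster_resources())
```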
- Verify
a) Check for the head node
```
$ oc get pods
NAME             READY   STATUS    RESTARTS   AGE
ray-head-ql46b   1/1     Running   0          118m
```
b) Run an example test
```
$ ray submit python/ray/autoscaler/kubernetes/example-full.yaml x.py
Loaded cached provider configuration
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
2021-02-09 08:50:51,028 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python ~/x.py)'
2021-02-09 05:52:10,538 INFO worker.py:655 -- Connecting to existing Ray cluster at address: 172.30.236.163:6379
[0, 1, 4, 9]
```
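The contents of x.py are not shown in this guide; a minimal script consistent with the [0, 1, 4, 9] output above would be the classic Ray remote-task example (a sketch, not the actual test file):
```
import ray

# `ray submit` copies this script to the head node, where `address='auto'`
# resolves to the running cluster; the password matches the `ray up` output.
ray.init(address='auto', _redis_password='5241590000000000')

@ray.remote
def square(x):
    return x * x

# Run four tasks on the cluster and collect the results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```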
Ray Cluster on OpenShift for Jupyter
The Operate First project hosts a public demonstration of Ray-enabled Jupyter notebooks, based on the Open Data Hub (ODH) data science platform. This free ODH environment can be accessed here and is open to anyone with a Gmail account via SSO login.
To install a similar Ray integration with CodeFlare onto your own Open Data Hub environment, follow the instructions on this reference repository.
The container images used in this reference demo were built from this repo, and have CodeFlare pre-installed. They include a basic "ray-ml" notebook image and a corresponding ray-ml worker-node image.
Once in a Jupyter environment, refer to the included notebooks for example pipelines. Documentation for reference use cases can be found in Examples.
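As a rough sketch of what a notebook cell in that environment might look like (the service address below is hypothetical; the actual host and port depend on how the Ray head is exposed in your ODH namespace):
```
import ray

# Hypothetical head-node service address; look up the real one in your deployment.
ray.init(address='ray-head.odh.svc:6379')

@ray.remote
def ping():
    return "pong"

print(ray.get(ping.remote()))  # "pong" confirms tasks run on the cluster
```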