CodeFlare on OpenShift Container Platform (OCP)

A few installation and deployment targets are provided below:

  • OpenShift Ray Cluster Operator: Deploying the Ray Operator
  • OpenShift Cluster: Dispatch Ray Cluster on OpenShift

Dispatch Ray Cluster on OpenShift

Prerequisites

  • Access to an OpenShift cluster
  • Python 3.8+

We recommend installing Python 3.8.7 using pyenv.
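A typical pyenv workflow for this is sketched below; it assumes pyenv itself is already installed and on your PATH:

```shell
# Install Python 3.8.7 and pin it for this project directory (a sketch;
# assumes pyenv is already installed per its own documentation).
pyenv install 3.8.7
pyenv local 3.8.7     # writes .python-version so this directory uses 3.8.7
python --version      # should report Python 3.8.7
```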

Setup

  1. Install CodeFlare

Install from PyPI:

pip3 install --upgrade pip          # CodeFlare requires pip >21.0
pip3 install --upgrade codeflare

Alternatively, build and install from source:

git clone https://github.com/project-codeflare/codeflare.git
cd codeflare
pip3 install --upgrade pip
pip3 install .

  2. Create Cluster (https://docs.ray.io/en/master/cluster/cloud.html#kubernetes)

    This assumes OpenShift cluster access, per the prerequisites above.

    a) Create namespace

    ```
    $ oc create namespace codeflare
    namespace/codeflare created
    $
    ```
    

    b) Bring up Ray cluster

     ```
     $ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
     Cluster: default
    
     Checking Kubernetes environment settings
     2021-02-09 06:40:09,612	INFO config.py:169 -- KubernetesNodeProvider: using existing namespace 'ray'
     2021-02-09 06:40:09,671	INFO config.py:202 -- KubernetesNodeProvider: autoscaler_service_account 'autoscaler' not found, attempting to create it
     2021-02-09 06:40:09,738	INFO config.py:204 -- KubernetesNodeProvider: successfully created autoscaler_service_account 'autoscaler'
     2021-02-09 06:40:10,196	INFO config.py:228 -- KubernetesNodeProvider: autoscaler_role 'autoscaler' not found, attempting to create it
     2021-02-09 06:40:10,265	INFO config.py:230 -- KubernetesNodeProvider: successfully created autoscaler_role 'autoscaler'
     2021-02-09 06:40:10,573	INFO config.py:261 -- KubernetesNodeProvider: autoscaler_role_binding 'autoscaler' not found, attempting to create it
     2021-02-09 06:40:10,646	INFO config.py:263 -- KubernetesNodeProvider: successfully created autoscaler_role_binding 'autoscaler'
     2021-02-09 06:40:10,704	INFO config.py:294 -- KubernetesNodeProvider: service 'ray-head' not found, attempting to create it
     2021-02-09 06:40:10,788	INFO config.py:296 -- KubernetesNodeProvider: successfully created service 'ray-head'
     2021-02-09 06:40:11,098	INFO config.py:294 -- KubernetesNodeProvider: service 'ray-workers' not found, attempting to create it
     2021-02-09 06:40:11,185	INFO config.py:296 -- KubernetesNodeProvider: successfully created service 'ray-workers'
     No head node found. Launching a new cluster. Confirm [y/N]: y
    
     Acquiring an up-to-date head node
     2021-02-09 06:40:14,396	INFO node_provider.py:113 -- KubernetesNodeProvider: calling create_namespaced_pod (count=1).
       Launched a new head node
       Fetching the new head node
    
     <1/1> Setting up head node
       Prepared bootstrap config
       New status: waiting-for-ssh
       [1/7] Waiting for SSH to become available
         Running `uptime` as a test.
     2021-02-09 06:40:15,296	INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
     error: unable to upgrade connection: container not found ("ray-node")
         SSH still not available (Exit Status 1): kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
     2021-02-09 06:40:22,197	INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
    
     03:41:41 up 81 days, 14:25,  0 users,  load average: 1.42, 0.87, 0.63
         Success.
       Updating cluster configuration. [hash=16487b5e0285fc46d5f1fd6da0370b2f489a6e5f]
       New status: syncing-files
       [2/7] Processing file mounts
     2021-02-09 06:41:42,330	INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (mkdir -p ~)'
       [3/7] No worker file mounts to sync
       New status: setting-up
       [4/7] No initialization commands to run.
       [5/7] Initalizing command runner
       [6/7] No setup commands to run.
       [7/7] Starting the Ray runtime
     2021-02-09 06:42:10,643	INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"GPU":0}'"'"';ray stop)'
     Did not find any active Ray processes.
     2021-02-09 06:42:13,845	INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"GPU":0}'"'"';ulimit -n 65536; ray start --head --num-cpus=$MY_CPU_REQUEST --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0)'
     Local node IP: 172.30.236.163
     2021-02-09 03:42:17,373	INFO services.py:1195 -- View the Ray dashboard at http://172.30.236.163:8265
    
     --------------------
     Ray runtime started.
     --------------------
    
     Next steps
       To connect to this Ray runtime from another node, run
         ray start --address='172.30.236.163:6379' --redis-password='5241590000000000'
    
       Alternatively, use the following Python code:
         import ray
         ray.init(address='auto', _redis_password='5241590000000000')
    
       If connection fails, check your firewall settings and network configuration.
    
       To terminate the Ray runtime, run
         ray stop
       New status: up-to-date
    
     Useful commands
       Monitor autoscaling with
         ray exec /Users/darroyo/git_workspaces/github.com/ray-project/ray/python/ray/autoscaler/kubernetes/example-full.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
       Connect to a terminal on the cluster head:
         ray attach /Users/darroyo/git_workspaces/github.com/ray-project/ray/python/ray/autoscaler/kubernetes/example-full.yaml
       Get a remote shell to the cluster manually:
         kubectl -n ray exec -it ray-head-ql46b -- bash
     ```
    
  3. Verify

    a) Check for head node

    ```
    $ oc get pods
    NAME             READY   STATUS    RESTARTS   AGE
    ray-head-ql46b   1/1     Running   0          118m
    $
    ```

    b) Run example test

    ```
    $ ray submit python/ray/autoscaler/kubernetes/example-full.yaml x.py
    Loaded cached provider configuration
    If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
    2021-02-09 08:50:51,028	INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python ~/x.py)'
    2021-02-09 05:52:10,538	INFO worker.py:655 -- Connecting to existing Ray cluster at address: 172.30.236.163:6379
    [0, 1, 4, 9]
    ```

Ray with Open Data Hub on OpenShift

The Operate First project hosts a public demonstration of Ray-enabled Jupyter notebooks, based on the Open Data Hub (ODH) data science platform. This free ODH environment can be accessed here and is open to anyone with a Gmail account via SSO login.

To install a similar Ray integration with CodeFlare onto your own Open Data Hub environment, follow the instructions on this reference repository.

The container images used in this reference demo were built from this repo, and have CodeFlare pre-installed. They include a basic "ray-ml" notebook image and a corresponding ray-ml worker-node image.

Running examples

Once in a Jupyter environment, refer to the notebooks for example pipelines. Documentation for reference use cases can be found in Examples.