Skip to content

Commit

Permalink
O2-IMS Operator (#844)
Browse files Browse the repository at this point in the history
A PR creating a O2-IMS Operator, including related management code,
controllers, utilities, testing code, and documentation.

closes: #759 
closes: #757 

Update 02-03-2025: Now relies on
nephio-project/api#66
This API update is because of a previous outdated field in the CRD, to
test this before it is merged one must use the link to the CRD from that
commit in the curl step of the README.

---------

Signed-off-by: Daniel Kostecki <[email protected]>
Signed-off-by: Sagar Arora <[email protected]>
Signed-off-by: Vishwanath Jayaraman <[email protected]>
Co-authored-by: Sagar Arora <[email protected]>
Co-authored-by: Vishwanath Jayaraman <[email protected]>
  • Loading branch information
3 people authored Feb 13, 2025
1 parent 2eaacb0 commit 7c58cb4
Show file tree
Hide file tree
Showing 16 changed files with 1,562 additions and 0 deletions.
27 changes: 27 additions & 0 deletions operators/o2ims-operator/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
###########################################################################
# Copyright 2025 The Nephio Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##########################################################################

FROM python:3.12.9-alpine3.21 AS builder
COPY controllers/ /src/
COPY requirements.txt /
RUN pip install --user -r /requirements.txt --no-cache-dir
############### Target
FROM python:3.12.9-alpine3.21 AS target
COPY --from=builder /root/.local \
/src/ \
/root/.local
ENV PATH=/root/.local/bin:$PATH
CMD ["kopf", "run", "/root/.local/manager.py", "--all-namespaces"]
279 changes: 279 additions & 0 deletions operators/o2ims-operator/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,279 @@
# Nephio O-RAN O2 IMS Operator

This operator implements O-RAN O2 IMS for K8s based cloud management.

## How to start

### Development Requirements:

- Python3.11
- requirements.txt installed in development environment

### Nephio Management Cluster Requirements:

- 6 vCPU
- 10Gi RAM

## Create Development Environment

### Including Nephio mgmt Cluster

The following will create a kind cluster and install required components such as:
- Porch
- ConfigSync
- Gitea (available at `172.18.0.200:3000`)
- MetalLB and MetalLB Sandbox Environment
- CAPI
- ConfigSync and RootSync objects to create clusters

It will also configure a secret which the operator can use for development purposes (when running the operator in non-containerize environments). It creates a pod and appends the `porch-controllers` service account token and redirects it from `/var/run/secrets/kubernetes.io/serviceaccount/token` to `/tmp/porch-token`.


```bash
# Get the repository
git clone https://github.com/nephio-project/nephio.git
cd operators/o2ims-operator
# Create a virtual environment
virtualenv venv -p python3
source venv/bin/activate
# Install requirements
pip install -r requirements.txt
# Set kernel parameters (run these commands after system restart or when new VM/system is created)
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=512
sudo sysctl -w kernel.keys.maxkeys=500000
sudo sysctl -w kernel.keys.maxbytes=1000000
# Run the create-cluster.sh script to create the mgmt cluster and development environment
./tests/create-cluster.sh
```

Operator CRD is can be fetched via below command, though the above cluster creation script automatically fetches and apply this CRD.

```bash
curl --create-dirs -O --output-dir ./config/crd/bases/ https://raw.githubusercontent.com/nephio-project/api/refs/heads/main/config/crd/bases/o2ims.provisioning.oran.org_provisioningrequests.yaml
```

### Existing Nephio mgmt Cluster

#### Non-containerized Development Environment

```bash
kubectl create -f tests/deployment/sa-test-pod.yaml
kubectl exec -it -n porch-system porch-sa-test -- cat /var/run/secrets/kubernetes.io/serviceaccount/token &> /tmp/porch-token
# Create the CRD from the Nephio API repo
kubectl create -f https://raw.githubusercontent.com/nephio-project/api/refs/heads/main/config/crd/bases/o2ims.provisioning.oran.org_provisioningrequests.yaml
export TOKEN=/tmp/porch-token
# Exposing the Kube proxy for development after killing previous proxy sessions
pkill kubectl
nohup kubectl proxy --port 8080 &>/dev/null &
```

#### Containerized Development Environment

Build a Docker image:

```bash
docker build -t o2ims:latest -f Dockerfile .
```

Push this image in your cluster, here we are using a `kind` cluster so we will push using the below command:

```bash
kind load docker-image o2ims:latest -n o2ims-mgmt
```

`NOTE`: `o2ims-mgmt` is the name of the kind cluster. It is good to mention cluster name if you have multiple clusters.

Deploy the O2 IMS operator:

```bash
kpt pkg get --for-deployment https://github.com/nephio-project/catalog.git
/nephio/optional/o2ims@origin/main /tmp/o2ims
kpt fn render /tmp/o2ims
kpt live init /tmp/o2ims
kpt live apply /tmp/o2ims --reconcile-timeout=15m --output=table
```

### To Start the Operator:

Note that there are some constants in manager.py that can be tuned before running the operator.

```bash
## To run in debug mode use the "--debug" flag or "-v --log-format=full"
kopf run controllers/manager.py
```

Open another terminal to provision a cluster:

```bash
kpt pkg get --for-deployment https://github.com/nephio-project/catalog.git
/nephio/optional/o2ims@origin/main /tmp/o2ims
kpt fn render /tmp/o2ims
kpt live init /tmp/o2ims
kpt live apply /tmp/o2ims --reconcile-timeout=15m --output=table
```

### Redeploying

To redeploy the cluster, or to recreate the development environment, one must delete the created cluster. The Nephio mgmt cluster will be deleted automatically when running `create-cluster.sh`, but the cluster deployed by this operator has a name in the `clusterName` field. For example, it may be `edge`, thus:

```bash
kind delete cluster -n edge
```

## Operator logic

O2IMS operator listens for ProvisioningRequest CR and once it is created it goes through different stages

1. `ProvisioningRequest validation`: The controller [provisioning_request_validation_controller.py](./controllers/provisioning_request_validation_controller.py) validates the provisioning requests. Currently it checks if the field `clusterName` and `clusterProvisioner`. At the moment only `capi` handled clusters are support
2. `ProvisioningRequest creation`: The controller [provisioning_request_controller.py](./controllers/provisioning_request_controller.py) takes care of creating the a package variant for Porch which can be applied to the cluster where porch is running. After applying package variant it waits for the cluster to be created and it follows the creation via querying `clusters.cluster.x-k8s.io` endpoint. Later we will add querying of packageRevisions also but at the moment their is a problem with querying packageRevisions because sometimes Porch is not able to process the request

Output of a **Successful workflow**:

<details>
<summary>The output is similar to:</summary>

```yaml
apiVersion: o2ims.provisioning.oran.org/v1alpha1
kind: ProvisioningRequest
metadata:
annotations:
provisioningrequests.o2ims.provisioning.oran.org/kopf-managed: "yes"
provisioningrequests.o2ims.provisioning.oran.org/last-ha-a.A3qw: |
{"spec":{"description":"Provisioning request for setting up a test kind cluster.","name":"test-env-Provisioning","templateName":"nephio-workload-cluster","templateParameters":{"clusterName":"edge","labels":{"nephio.org/region":"europe-paris-west","nephio.org/site-type":"edge"},"templateVersion":"v3.0.0"}}
provisioningrequests.o2ims.provisioning.oran.org/last-handled-configuration: |
{"spec":{"description":"Provisioning request for setting up a test kind cluster.","name":"test-env-Provisioning","templateName":"nephio-workload-cluster","templateParameters":{"clusterName":"edge","labels":{"nephio.org/region":"europe-paris-west","nephio.org/site-type":"edge"},"templateVersion":"v3.0.0"}}
creationTimestamp: "2025-01-31T13:50:46Z"
generation: 1
name: provisioning-request-sample
resourceVersion: "12122"
uid: e8377db2-5652-4bc6-9632-8ce0836c6afd
spec:
description: Provisioning request for setting up a test kind cluster.
name: test-env-Provisioning
templateName: nephio-workload-cluster
templateParameters:
clusterName: edge
labels:
nephio.org/site-type: edge
nephio.org/region: europe-paris-west
nephio.org/owner: nephio-o2ims
templateVersion: v3.0.0
status:
provisionedResourceSet:
oCloudInfrastructureResourceIds:
- cb92ece1-7272-4e01-9d5c-11e47b2e2473
oCloudNodeClusterId: 09470fe4-cff6-4362-a7d6-badc77dbf059
provisioningStatus:
provisioningMessage: Cluster resource created
provisioningState: fulfilled
provisioningUpdateTime: "2025-01-31T14:52:21Z"
```
</details>
## Unit Testing
Unit tests are contained in the `tests` directory, and are intended to test pieces of the O2IMS Operator in the `controllers` directory. Currently unit tests are not comprehensive, but provide expected coverage of core utility components.

Prior to running the tests, install the requirements:
```bash
pip3 install -r ./tests/unit_test_requirements.txt
```

To run all tests in `test_utils.py` with abridged output:
```bash
pytest ./tests/test_utils.py
```

Output:
```bash
==================================================================== test session starts ====================================================================
platform linux -- Python 3.13.0, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/dkosteck/Documents/nephio/operators/o2ims-operator
collected 61 items
tests/test_utils.py ............................................................. [100%]
==================================================================== 61 passed in 0.14s =====================================================================
```

To run with verbose output (showing individual test results):
```bash
pytest -v ./tests/test_utils.py
```

## Known issues

### Porch Endpoints and Stuck Deployments

One may notice that the edge cluster is not provisioned, the provisioning request times out, or the package variant claims to be stalled (examples below). This is believed to be a bug in Porch, and so will be fixed upstream. For now a workaround has been identified.

#### O2IMS Cluster Not Present

You created the provisioning request but the cluster is not created

```bash
kind get clusters
mgmt
```

#### ProvisioningRequest Timeout

```bash
kubectl get provisioningrequest provisioning-request-sample -o yaml | grep provisioningStatus: -A 2
provisioningStatus:
provisioningMessage: Cluster resource creation failed reached timeout
provisioningState: failed
```

#### PackageVariant Stalled

The package variant created by O2IMS is stalled

```bash
$ kubectl get packagevariant provisioning-request-sample -o yaml | grep conditions: -A 5
conditions:
- lastTransitionTime: "2025-01-29T22:25:08Z"
message: all validation checks passed
reason: Valid
status: "False"
type: Stalled
```

#### Potential Solution

One may attempt to delete the PackageVariant, ProvisioningRequest, and the Porch Server. After the Porch Server is re-deployed, re-deploy the ProvisioningRequest:

```bash
## Delete the sample provisioning resource
kubectl delete packagevariant provisioning-request-sample
kubectl delete provisioningrequest provisioning-request-sample
kubectl delete pod porch-server-7c5485b96b-tk7sr -n porch-system # Get the pod name from kubectl
# Once deleted and new Porch Server is up
kubectl create -f tests/sample_provisioning_request.yaml
```

### Deletion request O2IMS cluster

This is not supported so you have to delete the cluster manually

First delete the provisioning request:

```bash
kubectl delete -f tests/sample_provisioning_request.yaml
```

Then delete the resources, replace **edge** with your cluster name and change **mgmt** cluster repository name with your cluster management cluster repository name.

```bash
kubectl delete packagevariants -l nephio.org/site-type=edge
kubectl delete packagevariants provisioning-request-sample
pkgList=$(kpt alpha rpkg get| grep edge | grep mgmt| awk '{print $1;}')
for pkg in $pkgList
do
kpt alpha rpkg propose-delete $pkg -ndefault
kpt alpha rpkg delete $pkg -ndefault
done
```
Empty file.
Loading

0 comments on commit 7c58cb4

Please sign in to comment.