|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Debugging OpenStack Tempest Pods in OpenShift Series: Failure to Import a 997MB qcow2 Image" |
| 4 | +date: 2025-06-10 19:15:00 +0300 |
| 5 | +description: "The issue was with a 997MB qcow2 image that would not import properly. Here's how I solved this step by step." |
| 6 | +tags: [OpenShift, tempest pod] |
| 7 | +categories: [OpenShift] |
| 8 | +--- |
| 9 | + |
| 10 | + |
| 11 | +I was running OpenStack Tempest tests in my OpenShift cluster when I hit a problem. The pod `tempest-tests-tempest-workflow-step-00-multi-thread-testing` failed during image import. The issue was with a 997MB qcow2 image that would not import properly. Here's how I solved this step by step. |
| 12 | + |
| 13 | +## The Problem |
| 14 | + |
| 15 | +My Tempest pod was failing during the image creation phase. The logs showed it was stuck trying to import a 997MB qcow2 image, and it kept timing out after 300 seconds. This is a common problem when working with large images. |
| 16 | + |
| 17 | +``` |
| 18 | +Current status: importing. Waiting for image to become active... |
| 19 | +``` |
| 20 | + |
| 21 | +This message repeated until the 300-second timeout expired and the pod failed. |
| 22 | + |
| 23 | +## Step 1: You Cannot Exec Into a Failed Pod |
| 24 | + |
| 25 | +First, I tried to exec into the failed pod: |
| 26 | + |
| 27 | +```bash |
| 28 | +oc exec -it tempest-tests-tempest-workflow-step-00-multi-thread-testing -- /bin/bash |
| 29 | +``` |
| 30 | + |
| 31 | +It returned this error: |
| 32 | +``` |
| 33 | +error: cannot exec into a container in a completed pod; current phase is Failed |
| 34 | +``` |
| 35 | + |
| 36 | +This makes sense: the containers in a failed pod have already exited, so there is no running process to exec into. |
| 37 | + |
| 38 | +## Step 2: Check the Pod Logs and Events |
| 39 | + |
| 40 | +Before doing anything else, I needed to understand what happened: |
| 41 | + |
| 42 | +```bash |
| 43 | +# Get the pod logs |
| 44 | +oc logs tempest-tests-tempest-workflow-step-00-multi-thread-testing |
| 45 | + |
| 46 | +# Get detailed information about the pod |
| 47 | +oc describe pod tempest-tests-tempest-workflow-step-00-multi-thread-testing |
| 48 | + |
| 49 | +# Check events related to this pod |
| 50 | +oc get events --field-selector involvedObject.name=tempest-tests-tempest-workflow-step-00-multi-thread-testing |
| 51 | +``` |
| 52 | + |
| 53 | +The logs showed me that the script was downloading a nearly 1GB image and then trying to import it into OpenStack Glance, but it was timing out during the import phase. |
| 54 | + |
| 55 | +## Step 3: Handle OpenShift Security Policies |
| 56 | + |
| 57 | +Now I needed to create a debug pod from the same image as the failed pod. OpenShift enforces strict security policies, but I tried the simple approach first, where `YOUR_ACTUAL_IMAGE_HERE` stands for that image: |
| 58 | + |
| 59 | +```bash |
| 60 | +oc run debug-tempest --image=YOUR_ACTUAL_IMAGE_HERE --rm -it -- /bin/bash |
| 61 | +``` |
| 62 | + |
| 63 | +That produced several PodSecurity warnings: |
| 64 | +``` |
| 65 | +Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false... |
| 66 | +``` |
| 67 | + |
| 68 | +I needed to follow OpenShift's security requirements. |
| 69 | + |
| 70 | +## Step 4: Create a Security-Compliant Debug Pod |
| 71 | + |
| 72 | +I created a proper YAML manifest that meets OpenShift's security policies: |
| 73 | + |
| 74 | +```yaml |
| 75 | +apiVersion: v1 |
| 76 | +kind: Pod |
| 77 | +metadata: |
| 78 | +  name: debug-tempest |
| 79 | +spec: |
| 80 | +  restartPolicy: Never |
| 81 | +  securityContext: |
| 82 | +    runAsNonRoot: true |
| 83 | +    runAsUser: 1001 |
| 84 | +    runAsGroup: 1001 |
| 85 | +    fsGroup: 1001 |
| 86 | +    seccompProfile: |
| 87 | +      type: RuntimeDefault |
| 88 | +  containers: |
| 89 | +  - name: debug-tempest |
| 90 | +    image: YOUR_ACTUAL_IMAGE_HERE # Get this from the failed pod |
| 91 | +    command: ["/bin/bash"] |
| 92 | +    args: ["-c", "sleep 3600"] |
| 93 | +    securityContext: |
| 94 | +      allowPrivilegeEscalation: false |
| 95 | +      capabilities: |
| 96 | +        drop: |
| 97 | +        - ALL |
| 98 | +      runAsNonRoot: true |
| 99 | +      runAsUser: 1001 |
| 100 | +      runAsGroup: 1001 |
| 101 | +      seccompProfile: |
| 102 | +        type: RuntimeDefault |
| 103 | +    env: |
| 104 | +    - name: HOME |
| 105 | +      value: /tmp |
| 106 | +    - name: OS_CLOUD |
| 107 | +      value: default |
| 108 | +    volumeMounts: |
| 109 | +    - name: temp-dir |
| 110 | +      mountPath: /tmp |
| 111 | +    - name: var-lib-tempest |
| 112 | +      mountPath: /var/lib/tempest |
| 113 | +  volumes: |
| 114 | +  - name: temp-dir |
| 115 | +    emptyDir: {} |
| 116 | +  - name: var-lib-tempest |
| 117 | +    emptyDir: {} |
| 118 | +``` |
| 119 | + |
| 120 | +To get the actual image name, I ran: |
| 121 | +```bash |
| 122 | +oc get job tempest-tests-tempest-workflow-step-00-multi-thread-testing -o jsonpath='{.spec.template.spec.containers[0].image}' |
| 123 | +``` |
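The post does not show how the manifest was applied, so here is a minimal sketch of one way to do it. It assumes the manifest above was saved as `debug-tempest.yaml` (a hypothetical file name); the helper `render_manifest` is also made up for illustration. It only fills in the image placeholder, and the `oc` calls are left as comments because they need a logged-in cluster session.

```bash
# Hypothetical helper: print the manifest with the image placeholder filled in.
# Assumes the image reference contains no "|" or "&" characters, which sed
# would treat specially in this substitution.
render_manifest() {
  local manifest="$1" image="$2"
  sed "s|YOUR_ACTUAL_IMAGE_HERE|${image}|" "$manifest"
}

# Example usage (commented out because it requires cluster access):
# image=$(oc get job tempest-tests-tempest-workflow-step-00-multi-thread-testing \
#   -o jsonpath='{.spec.template.spec.containers[0].image}')
# render_manifest debug-tempest.yaml "$image" | oc apply -f -
# oc wait --for=condition=Ready pod/debug-tempest --timeout=120s
# oc exec -it debug-tempest -- /bin/bash
# oc delete pod debug-tempest   # clean up when finished
```

Piping the rendered manifest into `oc apply -f -` keeps the file on disk unchanged, so the same template can be reused for the next failed pod.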
| 124 | + |
| 125 | +## Step 5: Alternative One-Line Command |
| 126 | + |
| 127 | +If you prefer not to use a YAML file, the same pod can be started with a single command. The override replaces the generated container entry, so `stdin` and `tty` are set explicitly to keep the interactive shell attached: |
| 128 | + |
| 129 | +```bash |
| 130 | +oc run debug-tempest --image=$(oc get job tempest-tests-tempest-workflow-step-00-multi-thread-testing -o jsonpath='{.spec.template.spec.containers[0].image}') --rm -it --restart=Never --overrides=' |
| 131 | +{ |
| 132 | +  "spec": { |
| 133 | +    "securityContext": { |
| 134 | +      "runAsNonRoot": true, |
| 135 | +      "runAsUser": 1001, |
| 136 | +      "seccompProfile": {"type": "RuntimeDefault"} |
| 137 | +    }, |
| 138 | +    "containers": [{ |
| 139 | +      "name": "debug-tempest", |
| 140 | +      "image": "'$(oc get job tempest-tests-tempest-workflow-step-00-multi-thread-testing -o jsonpath='{.spec.template.spec.containers[0].image}')'", |
| 141 | +      "command": ["/bin/bash"], "stdin": true, "tty": true, |
| 142 | +      "securityContext": { |
| 143 | +        "allowPrivilegeEscalation": false, |
| 144 | +        "capabilities": {"drop": ["ALL"]}, |
| 145 | +        "runAsNonRoot": true, |
| 146 | +        "runAsUser": 1001, |
| 147 | +        "seccompProfile": {"type": "RuntimeDefault"} |
| 148 | +      } |
| 149 | +    }] |
| 150 | +  } |
| 151 | +}' -- /bin/bash |
| 152 | +``` |
| 153 | + |
| 154 | +## Step 6: Debug the Actual Issue |
| 155 | + |
| 156 | +Once I got into the debug pod, I could start investigating: |
| 157 | + |
| 158 | +```bash |
| 159 | +# Test if the image URL is accessible |
| 160 | +curl -I http://a.b.c.d/dfg-network/custom_neutron_guest_rhel_8.4.qcow2 |
| 161 | + |
| 162 | +# Check OpenStack connectivity |
| 163 | +. openstackrc |
| 164 | +openstack image list |
| 165 | + |
| 166 | +# See if the image got partially created |
| 167 | +openstack image show 11111111-1111-1111-1111-111111111111 |
| 168 | +``` |
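To watch the status change while reproducing the import, a small polling loop can help. This is a rough sketch, not the logic the Tempest pod itself uses: the function names `get_image_status` and `wait_for_active` are hypothetical, and the defaults (300-second timeout, 10-second interval) are assumptions chosen to mirror the timeout seen in the logs.

```bash
# Hypothetical wrapper around the status lookup, kept separate so it can be
# swapped out or stubbed.
get_image_status() {
  openstack image show "$1" -f value -c status
}

# Poll until the image reports "active" or the timeout expires.
# Returns 0 when the image becomes active, 1 on timeout.
wait_for_active() {
  local image_id="$1" timeout="${2:-300}" interval="${3:-10}"
  local start elapsed status
  start=$(date +%s)
  while :; do
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$timeout" ]; then
      break
    fi
    status=$(get_image_status "$image_id")
    echo "${elapsed}s: status=${status}"
    if [ "$status" = "active" ]; then
      return 0
    fi
    sleep "$interval"
  done
  echo "Image ${image_id} not active after ${timeout}s" >&2
  return 1
}
```

Calling `wait_for_active 11111111-1111-1111-1111-111111111111 900 15` in the debug pod would poll for up to 15 minutes and print the elapsed time next to each status, which shows how long the import really takes.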
| 169 | + |
| 170 | +## What I Discovered |
| 171 | + |
| 172 | +In my case, the **timeout was too short**. A 997MB image needs more than 5 minutes (the 300-second limit) to import on our storage backend. |
| 173 | +Measure how long a manual import takes, then set the timeout to that duration plus about 20% spare time. |
| 174 | + |
| 175 | + |
| 176 | +```bash |
| 177 | +# Manual image upload |
| 178 | +time openstack image create \ |
| 179 | + --disk-format qcow2 \ |
| 180 | + --id 11111111-1111-1111-1111-111111111111 \ |
| 181 | + --file custom_neutron_guest_rhel_8.4.qcow2 \ |
| 182 | + --public \ |
| 183 | + custom_neutron_guest_rhel_8.4 |
| 184 | +``` |
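Once `time` reports how long the upload took, the 20% margin is simple arithmetic. The sketch below uses a hypothetical helper named `timeout_with_headroom`. The 420 seconds in the usage comment is a made-up figure for illustration, not a measurement from this cluster.

```bash
# Hypothetical helper: print the measured duration (in whole seconds) plus a
# percentage of headroom, rounded up to the next whole second.
timeout_with_headroom() {
  local measured="$1" percent="${2:-20}"
  echo $(( ($measured * (100 + $percent) + 99) / 100 ))
}

# Example with an illustrative measurement of 420 seconds:
# timeout_with_headroom 420      # prints 504
# timeout_with_headroom 420 50   # prints 630
```

The result is the value to put in whatever setting controls the image import timeout in your test configuration.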
| 185 | + |
| 186 | +## Lessons Learned |
| 187 | + |
| 188 | +1. **OpenShift security is not optional** - Learn to work with it, not against it |
| 189 | +2. **Image imports can be slow** - Plan your timeouts accordingly |
| 190 | +3. **A debug pod is your friend** - Whether you use `oc debug` or your own manifest, it has to satisfy the security context requirements |
| 191 | + |
| 192 | +## Bonus Tip |
| 193 | + |
| 194 | +If you're running into similar issues regularly, consider pre-loading your test images during cluster setup rather than during test execution. Your future self will thank you when tests aren't timing out at 2 AM. |
| 195 | + |
| 196 | +Remember, debugging in OpenShift is like debugging anywhere else, except with more YAML and more security warnings. But hey, at least the error messages are usually pretty clear about what's wrong! |
| 197 | + |
| 198 | +Happy debugging, and may your pods always run to completion! |