---
layout: post
title: "Debugging OpenStack Tempest Pods in OpenShift Series: Failure to Import a 997MB qcow2 Image"
date: 2025-06-10 19:15:00 +0300
description: "The issue was with a 997MB qcow2 image that would not import properly. Here's how I solved this step by step."
tags: [OpenShift, tempest pod]
categories: [OpenShift]
---

I was running OpenStack Tempest tests in my OpenShift cluster when I hit a problem. The pod `tempest-tests-tempest-workflow-step-00-multi-thread-testing` failed during image import. The issue was with a 997MB qcow2 image that would not import properly. Here's how I solved this step by step.

## The Problem

My Tempest pod was failing during the image creation phase. The logs showed it was stuck trying to import a 997MB qcow2 image, and it kept timing out after 300 seconds. This is a common problem when working with large images.

```
Current status: importing. Waiting for image to become active...
```

This message kept repeating until the 300-second timeout expired and the pod failed.
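The message suggests a bounded wait: check the image status, sleep, repeat, and give up once a deadline passes. The workflow's actual implementation isn't visible in the logs, so the following is only a minimal sketch of that polling pattern; `wait_until_active` is a hypothetical helper, and the status command is passed in as arguments.

```bash
# Hypothetical helper, not the Tempest workflow's actual code: poll a status
# command until it prints "active" or until TIMEOUT seconds have passed.
# Usage: wait_until_active TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until_active() {
    local timeout_s=$1 interval_s=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout_s ))
    local status
    while :; do
        status=$("$@" 2>/dev/null)
        if [ "$status" = "active" ]; then
            echo "Image is active."
            return 0
        fi
        # Give up once the deadline has passed, reporting the last status seen
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "Timed out after ${timeout_s}s; last status: ${status:-unknown}" >&2
            return 1
        fi
        echo "Current status: ${status:-unknown}. Waiting for image to become active..."
        sleep "$interval_s"
    done
}
```

For example, `wait_until_active 600 10 openstack image show 11111111-1111-1111-1111-111111111111 -f value -c status` would poll Glance every 10 seconds for up to 10 minutes instead of 5.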

## Step 1: You Cannot Exec Into a Failed Pod

My first instinct was to exec into the failed pod:

```bash
oc exec -it tempest-tests-tempest-workflow-step-00-multi-thread-testing -- /bin/bash
```

That returned this error:

```
error: cannot exec into a container in a completed pod; current phase is Failed
```

This makes sense: the containers in a Failed pod have already exited, so there is no running process to attach a shell to.
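Since `oc exec` only works against a running container, it can help to check the pod phase first and fall back to the logs otherwise. A small sketch; `shell_or_logs` is a hypothetical helper built only from `oc get pod -o jsonpath='{.status.phase}'`, `oc exec`, and `oc logs`.

```bash
# Hypothetical helper: open a shell only if the pod is still Running,
# otherwise print the logs of the already-exited container instead.
# Usage: shell_or_logs POD_NAME
shell_or_logs() {
    local pod=$1
    local phase
    phase=$(oc get pod "$pod" -o jsonpath='{.status.phase}') || return 1
    if [ "$phase" = "Running" ]; then
        oc exec -it "$pod" -- /bin/bash
    else
        echo "Pod $pod is in phase '$phase'; showing logs instead." >&2
        oc logs "$pod"
    fi
}
```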

## Step 2: Check the Pod Logs and Events

Before doing anything else, I needed to understand what happened:

```bash
# Get the pod logs
oc logs tempest-tests-tempest-workflow-step-00-multi-thread-testing

# Get detailed information about the pod
oc describe pod tempest-tests-tempest-workflow-step-00-multi-thread-testing

# Check events related to this pod
oc get events --field-selector involvedObject.name=tempest-tests-tempest-workflow-step-00-multi-thread-testing
```

The logs showed me that the script was downloading a nearly 1GB image and then trying to import it into OpenStack Glance, but it was timing out during the import phase.
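When the logs are long, counting the repeated status lines and looking at the last one is a quick way to confirm the import never finished. A minimal sketch, assuming the log lines match the `Current status: ...` message shown earlier; `summarize_import_log` is a hypothetical helper that reads the logs on stdin.

```bash
# Hypothetical helper: summarize the image-import wait loop from pod logs.
# Reads log text on stdin, e.g.:
#   oc logs tempest-tests-tempest-workflow-step-00-multi-thread-testing | summarize_import_log
summarize_import_log() {
    awk '
        /Current status: importing/ { importing++ }
        /Current status:/           { last = $0 }
        END {
            printf "importing lines: %d\n", importing + 0
            printf "last status line: %s\n", (last == "" ? "none" : last)
        }
    '
}
```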

## Step 3: Handle OpenShift Security Policies

Next, I needed a debug pod running the same image. OpenShift has strict security policies, but I tried the simple approach first:

```bash
# Replace YOUR_ACTUAL_IMAGE_HERE with the image used by the failed pod (see Step 4)
oc run debug-tempest --image=YOUR_ACTUAL_IMAGE_HERE --rm -it -- /bin/bash
```

And got several Pod Security warnings:

```
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false...
```

To get a usable debug pod, I had to follow OpenShift's security requirements.
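The warning comes from Kubernetes Pod Security Admission, which decides what to enforce, audit, or warn about based on the `pod-security.kubernetes.io/enforce`, `pod-security.kubernetes.io/audit`, and `pod-security.kubernetes.io/warn` labels on the namespace. A small sketch for checking which profile applies; `show_pod_security_labels` is a hypothetical helper, and on OpenShift Security Context Constraints are checked as well, which this does not show.

```bash
# Hypothetical helper: print the Pod Security Admission labels of a namespace.
# Usage: show_pod_security_labels NAMESPACE
show_pod_security_labels() {
    local ns=$1 mode level
    for mode in enforce audit warn; do
        # Dots inside the label key must be escaped in the JSONPath expression
        level=$(oc get namespace "$ns" \
            -o jsonpath="{.metadata.labels.pod-security\.kubernetes\.io/${mode}}")
        echo "${mode}: ${level:-<unset>}"
    done
}
```

For example, `show_pod_security_labels "$(oc project -q)"` checks the current project.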

## Step 4: Create a Security-Compliant Debug Pod

I created a proper YAML manifest that meets OpenShift's security policies:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug-tempest
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    fsGroup: 1001
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: debug-tempest
    image: YOUR_ACTUAL_IMAGE_HERE # Get this from the failed pod
    command: ["/bin/bash"]
    args: ["-c", "sleep 3600"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1001
      runAsGroup: 1001
      seccompProfile:
        type: RuntimeDefault
    env:
    - name: HOME
      value: /tmp
    - name: OS_CLOUD
      value: default
    volumeMounts:
    - name: temp-dir
      mountPath: /tmp
    - name: var-lib-tempest
      mountPath: /var/lib/tempest
  volumes:
  - name: temp-dir
    emptyDir: {}
  - name: var-lib-tempest
    emptyDir: {}
```

To get the actual image name, I ran:

```bash
oc get job tempest-tests-tempest-workflow-step-00-multi-thread-testing -o jsonpath='{.spec.template.spec.containers[0].image}'
```
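To avoid pasting the image by hand, the placeholder in the manifest can be filled in from the job and applied in one go. A sketch under these assumptions: the manifest above is saved as `debug-tempest.yaml` (a hypothetical file name) and still contains the literal `YOUR_ACTUAL_IMAGE_HERE` placeholder; `render_debug_pod` is a hypothetical helper.

```bash
# Hypothetical helper: substitute the failed job's image into the debug pod
# manifest and print the result to stdout.
# Usage: render_debug_pod JOB_NAME MANIFEST_FILE
render_debug_pod() {
    local job=$1 manifest=$2 image
    image=$(oc get job "$job" \
        -o jsonpath='{.spec.template.spec.containers[0].image}') || return 1
    if [ -z "$image" ]; then
        echo "Could not read an image from job $job" >&2
        return 1
    fi
    # '|' is used as the sed delimiter because image references contain '/'
    sed "s|YOUR_ACTUAL_IMAGE_HERE|${image}|" "$manifest"
}

# Example usage (not run here):
#   JOB=tempest-tests-tempest-workflow-step-00-multi-thread-testing
#   render_debug_pod "$JOB" debug-tempest.yaml | oc apply -f -
#   oc wait --for=condition=Ready pod/debug-tempest --timeout=120s
#   oc exec -it debug-tempest -- /bin/bash
```

Piping into `oc apply -f -` keeps the manifest on disk unchanged, so the same file can be reused for other jobs.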

## Step 5: Alternative One-Line Command

If you prefer not to use YAML files, here is a one-line command:

```bash
# With the default --override-type=merge, the "containers" list below replaces
# the generated one, so it must request stdin/tty itself for -it to attach.
oc run debug-tempest --image="$(oc get job tempest-tests-tempest-workflow-step-00-multi-thread-testing -o jsonpath='{.spec.template.spec.containers[0].image}')" --rm -it --restart=Never --overrides='
{
  "spec": {
    "securityContext": {
      "runAsNonRoot": true,
      "runAsUser": 1001,
      "seccompProfile": {"type": "RuntimeDefault"}
    },
    "containers": [{
      "name": "debug-tempest",
      "image": "'"$(oc get job tempest-tests-tempest-workflow-step-00-multi-thread-testing -o jsonpath='{.spec.template.spec.containers[0].image}')"'",
      "command": ["/bin/bash"],
      "stdin": true,
      "stdinOnce": true,
      "tty": true,
      "securityContext": {
        "allowPrivilegeEscalation": false,
        "capabilities": {"drop": ["ALL"]},
        "runAsNonRoot": true,
        "runAsUser": 1001,
        "seccompProfile": {"type": "RuntimeDefault"}
      }
    }]
  }
}'
```

## Step 6: Debug the Actual Issue

Once I got into the debug pod, I could start investigating:

```bash
# Test if the image URL is accessible
curl -I http://a.b.c.d/dfg-network/custom_neutron_guest_rhel_8.4.qcow2

# Check OpenStack connectivity
. openstackrc
openstack image list

# See if the image got partially created
openstack image show 11111111-1111-1111-1111-111111111111
```
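`curl -I` also reports how large a file the server will send, which is a quick sanity check that the URL really points at the roughly 1GB image. A minimal sketch that extracts `Content-Length` from the response headers; `content_length_mib` is a hypothetical helper, and servers that omit the header (for example with chunked transfer encoding) will yield no size.

```bash
# Hypothetical helper: read HTTP response headers on stdin (e.g. from curl -I)
# and print the Content-Length in MiB, rounded down.
# If several Content-Length headers appear, the last one wins.
content_length_mib() {
    local bytes
    bytes=$(tr -d '\r' | awk 'tolower($1) == "content-length:" { v = $2 } END { print v }')
    if [ -z "$bytes" ]; then
        echo "no Content-Length header" >&2
        return 1
    fi
    echo "$(( bytes / 1024 / 1024 )) MiB"
}

# Example usage (the URL is the placeholder from above):
#   curl -sI http://a.b.c.d/dfg-network/custom_neutron_guest_rhel_8.4.qcow2 | content_length_mib
```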

## What I Discovered

In my case, the **timeout was too short**. A 997MB image needs more than 5 minutes to import on our storage backend.

To pick a better value, measure how long a manual upload takes and add 20% of spare time:

```bash
# Manual image upload, timed
time openstack image create \
  --disk-format qcow2 \
  --id 11111111-1111-1111-1111-111111111111 \
  --file custom_neutron_guest_rhel_8.4.qcow2 \
  --public \
  custom_neutron_guest_rhel_8.4
```
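Turning the measured duration into a timeout is simple arithmetic. A sketch, assuming you take the `real` time reported by `time` in whole seconds; `timeout_with_margin` is a hypothetical helper that adds 20% and rounds up. Where the value goes depends on your setup: if the wait is governed by Tempest's own configuration, `build_timeout` in the `[image]` section of `tempest.conf` (default 300 seconds) is the likely candidate, but confirm that against your deployment.

```bash
# Hypothetical helper: add a 20% margin to a measured duration (seconds)
# and round up to a whole number of seconds.
# Usage: timeout_with_margin MEASURED_SECONDS
timeout_with_margin() {
    local measured=$1
    # Integer ceiling of measured * 1.2, computed as (measured * 120 + 99) / 100
    echo $(( (measured * 120 + 99) / 100 ))
}

# Example: print a tempest.conf fragment for a measured 7-minute upload
#   printf '[image]\nbuild_timeout = %s\n' "$(timeout_with_margin 420)"
```

Integer arithmetic keeps this in plain shell; for a 420-second upload it yields 504 seconds.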

## Lessons Learned

1. **OpenShift security is not optional** - Learn to work with it, not against it
2. **Image imports can be slow** - Plan your timeouts accordingly
3. **`oc debug` is your friend** - When it works with the security context
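`oc debug` can replace most of the manual pod building from Steps 4 and 5: it starts a copy of the job's pod template with a shell instead of the normal command, and its `--as-user` flag (which admins may restrict) selects the UID. A minimal wrapper sketch; `debug_job` is a hypothetical name, and whether the resulting pod is admitted still depends on the security constraints in your namespace.

```bash
# Hypothetical wrapper around oc debug for a failed job.
# Usage: debug_job JOB_NAME [UID]
debug_job() {
    local job=$1 uid=${2:-}
    if [ -n "$uid" ]; then
        # Ask for a specific UID; this may be rejected by the namespace's SCCs
        oc debug "job/$job" --as-user="$uid"
    else
        oc debug "job/$job"
    fi
}

# Example usage:
#   debug_job tempest-tests-tempest-workflow-step-00-multi-thread-testing
```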

## Bonus Tip

If you're running into similar issues regularly, consider pre-loading your test images during cluster setup rather than during test execution. Your future self will thank you when tests aren't timing out at 2 AM.
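A minimal sketch of that idea, reusing the manual upload command from earlier but skipping the upload when an image with the given ID already exists. `preload_image` is a hypothetical helper; it assumes OpenStack credentials are already loaded (for example with `. openstackrc`), and it only checks that the image exists, not that its status is `active`.

```bash
# Hypothetical helper: upload a qcow2 image once so later test runs can reuse it.
# Usage: preload_image IMAGE_ID IMAGE_NAME FILE
preload_image() {
    local id=$1 name=$2 file=$3
    if openstack image show "$id" >/dev/null 2>&1; then
        echo "Image $id already exists, skipping upload."
        return 0
    fi
    openstack image create \
        --disk-format qcow2 \
        --id "$id" \
        --file "$file" \
        --public \
        "$name"
}

# Example usage during cluster setup:
#   preload_image 11111111-1111-1111-1111-111111111111 \
#       custom_neutron_guest_rhel_8.4 custom_neutron_guest_rhel_8.4.qcow2
```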

Remember, debugging in OpenShift is like debugging anywhere else, except with more YAML and more security warnings. But hey, at least the error messages are usually pretty clear about what's wrong!

Happy debugging, and may your pods always run to completion!
