Skip to content

Commit c40abb0

Browse files
committed
CI: Disable stack cost retrieval to stop job hang
This is a temporary workaround to address critical job hangs caused by lftools openstack stack cost command lacking timeout handling. Problem (IT-28614): The lftools stack cost command hangs indefinitely when: - Pricing API (pricing.vexxhost.net) is slow/unresponsive - OpenStack SDK queries for nested stacks take too long - Network operations lack timeout parameters This causes: - Jobs stuck at 'Retrieving stack cost for: <stack-name>' - Downstream jobs waiting on checkpoints indefinitely - Jenkins queue buildup (66+ jobs waiting reported) - Requires manual intervention to cancel stuck jobs Solution: Disable stack cost retrieval by commenting out the lftools call and hardcode stack cost to 0 in the stack-cost file. Known Regression: Stack costs will be reported as 0 in archived cost.csv files, losing cost tracking data for OpenStack resources. Instance-level costs are still collected via job-cost.sh pricing API calls. Root Cause: lftools/openstack/stack.py cost() function lacks: - timeout parameter for urllib.request.urlopen() (line 87) - timeout handling for OpenStack SDK resource enumeration - graceful degradation on network failures Next Steps: 1. Implement timeout handling in lftools 2. Once lftools is fixed, revert this workaround 3. Create tracking issue for the regression (stack costs = 0) This stopgap prioritizes job reliability over cost data collection. Issue-ID: IT-28614 Change-Id: I9d054347e485f4843c7287aec49fb7aff0962e96 Signed-off-by: Anil Belur <[email protected]>
1 parent 463db44 commit c40abb0

File tree

2 files changed

+43
-7
lines changed

2 files changed

+43
-7
lines changed
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
---
2+
fixes:
3+
- |
4+
**STOPGAP FIX:** Temporarily disable stack cost retrieval in
5+
openstack-stack-delete.sh to prevent jobs from hanging indefinitely.
6+
7+
**Problem:** The lftools openstack stack cost command hangs without
8+
timeout handling when querying the pricing API or OpenStack resources,
9+
causing:
10+
11+
- Jobs stuck at "Retrieving stack cost" phase
12+
- Subsequent jobs waiting on checkpoints
13+
- Jenkins queue buildup and capacity issues
14+
15+
**Solution:** Stack cost is now hardcoded to 0 until lftools implements
16+
proper timeout handling for network operations.
17+
18+
**Known Regression:** Stack costs will be reported as 0 in cost.csv
19+
files, losing cost tracking data for OpenStack resources. Instance-level
20+
costs from the pricing API are still collected via job-cost.sh.
21+
22+
**Root Cause:** lftools lacks timeout parameters for urllib.request.urlopen()
23+
and OpenStack SDK operations in the stack cost calculation.
24+
25+
**Tracking:**
26+
27+
- Fixes: IT-28614 (Jobs hanging on stack cost retrieval)
28+
- Regression: Stack costs always reported as 0 (needs lftools fix)
29+
- Upstream: lftools needs timeout implementation for openstack stack cost command
30+
31+
**Next Steps:** Once lftools implements timeouts, this workaround should
32+
be reverted to restore cost collection functionality.

shell/openstack-stack-delete.sh

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -21,13 +21,17 @@ lf-activate-venv --python python3 "lftools[openstack]" \
2121
python-openstackclient \
2222
urllib3~=1.26.15
2323

24-
echo "INFO: Retrieving stack cost for: $OS_STACK_NAME"
25-
if ! lftools openstack --os-cloud "$OS_CLOUD" stack cost "$OS_STACK_NAME" > stack-cost; then
26-
echo "WARNING: Unable to get stack costs, continuing anyway"
27-
echo "total: 0" > stack-cost
28-
else
29-
echo "DEBUG: Successfully retrieved stack cost: $(cat stack-cost)"
30-
fi
24+
# Disabled stack cost retrieval due to lftools hanging without timeout
25+
# See: OpenDaylight Jenkins issue with stuck jobs
26+
# echo "INFO: Retrieving stack cost for: $OS_STACK_NAME"
27+
# if ! lftools openstack --os-cloud "$OS_CLOUD" stack cost "$OS_STACK_NAME" > stack-cost; then
28+
# echo "WARNING: Unable to get stack costs, continuing anyway"
29+
# echo "total: 0" > stack-cost
30+
# else
31+
# echo "DEBUG: Successfully retrieved stack cost: $(cat stack-cost)"
32+
# fi
33+
echo "INFO: Stack cost retrieval disabled, setting cost to 0"
34+
echo "total: 0" > stack-cost
3135

3236
# Delete the stack even if the stack-cost script fails
3337
lftools openstack --os-cloud "$OS_CLOUD" stack delete "$OS_STACK_NAME" \

0 commit comments

Comments
 (0)