Summary
We are experiencing intermittent 504 Gateway Timeout errors when deploying new versions of backend applications behind NGINX Gateway Fabric.
After a backend rollout, the NGINX data plane keeps routing traffic to stale Pod IPs that no longer exist, even though:
- The new backend Pods are running and healthy
- Control plane logs show successful configuration updates
- Direct access via NodePort / ClusterIP works correctly
Restarting the data plane pods immediately resolves the issue.
Environment
- NGINX Gateway Fabric version: v2.3.0
- Gateway API version: v1.4.1
- Deployment mode:
  - Control Plane replicas: 1
  - Data Plane replicas: 5
Gateway Configuration (simplified)
- Single Gateway
- Multiple HTTPS listeners
- Wildcard hostnames
- Routes attached from multiple namespaces
listeners:
  - protocol: HTTP
    port: 80
  - protocol: HTTPS
    port: 443
    hostname: "*.xxxtest.com"
  - protocol: HTTPS
    port: 443
    hostname: "*.xxx.com"
Symptoms
During a backend rollout:
- Requests through the Gateway intermittently return 504 Gateway Timeout
- Requests to the same Service via NodePort succeed
- Gateway access logs show traffic being forwarded to unexpected backend IPs
- These IPs correspond to Pods from the previous version, already terminated
- Restarting the NGF data plane clears the stale IPs and restores normal traffic
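To make the stale-IP symptom easier to confirm, here is a minimal Python sketch that compares upstream addresses seen in the Gateway access log against the Pod IPs currently backing the Service. It rests on assumptions that are not part of this report: that the access log format includes NGINX's `$upstream_addr` variable written as `upstream=<ip>:<port>` (the `upstream=` key is hypothetical), that the addresses are IPv4, and that the live Pod IPs are collected separately, for example from the Service's EndpointSlices. The function name `stale_upstreams` is illustrative only.

```python
import re

# Assumption: each access log line carries nginx's $upstream_addr as
# "upstream=IP:PORT". When nginx retries, the variable may hold several
# comma-separated addresses, so the whole field is captured and split.
UPSTREAM_FIELD = re.compile(r"upstream=(\S+)")


def stale_upstreams(log_lines, live_pod_ips):
    """Count hits per upstream IP that is not in the set of live Pod IPs."""
    live = set(live_pod_ips)
    counts = {}
    for line in log_lines:
        match = UPSTREAM_FIELD.search(line)
        if match is None:
            continue  # line has no upstream field (e.g. served without proxying)
        for addr in match.group(1).split(","):
            # Drop the port; IPv4 only, so the last ":" separates IP and port.
            ip = addr.strip().rsplit(":", 1)[0]
            if ip and ip not in live:
                counts[ip] = counts.get(ip, 0) + 1
    return counts
```

Calling `stale_upstreams(lines, current_ips)` with the access log lines captured during a rollout and the current endpoint IPs returns a dictionary of unexpected upstream IPs and how often each was hit. A non-empty result after the rollout has finished would be consistent with the behavior described above.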