Labels
Area: Inference (Activities related to Gateway API Inference Extension support)
Area: agentgateway
Priority: High (Required in next 3 months to make progress, bugs that affect multiple users, or very bad UX)
Description
kgateway version
main
Kubernetes Version
v1.35.0
Describe the bug
InferencePool status is not being managed:
$ kubectl get inferencepool vllm-llama3-8b-instruct -o yaml
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  annotations:
    meta.helm.sh/release-name: vllm-llama3-8b-instruct
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2026-02-02T16:29:44Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: vllm-llama3-8b-instruct-epp
    app.kubernetes.io/version: v1.3.0
  name: vllm-llama3-8b-instruct
  namespace: default
  resourceVersion: "9938"
  uid: e3469d81-5ddf-415a-83a6-ec51f95a3853
spec:
  endpointPickerRef:
    failureMode: FailClose
    group: ""
    kind: Service
    name: vllm-llama3-8b-instruct-epp
    port:
      number: 9002
  selector:
    matchLabels:
      app: vllm-llama3-8b-instruct
  targetPorts:
  - number: 8000

However, kgateway does properly configure the agtw data plane:
$ IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80
curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
"model": "food-review-1",
"prompt": "Write as if you were a critic: San Francisco",
"max_tokens": 100,
"temperature": 0
}'
HTTP/1.1 200 OK
server: fasthttp
date: Mon, 02 Feb 2026 16:33:26 GMT
content-type: application/json
x-inference-port: 8000
x-inference-pod: vllm-llama3-8b-instruct-549764dd95-8lwhh
x-went-into-resp-headers: true
transfer-encoding: chunked
{"choices":[{"finish_reason":"length","index":0,"text":"The rest is silence. Today is a nice sunny day. To be or not to be that is the question. Testing, testing 1,2,3. Today is a nice sunny day. Testing, testing 1,2,3. To be or not to be that is the question. I am your AI assistant, how can I help you today? Alas, poor Yorick! I knew him, Horatio: A fellow of infinite jest Testing, testing 1,2,3. I am your "}],"created":1770050006,"do_remote_decode":false,"do_remote_prefill":false,"id":"chatcmpl-2fb23bd3-fb86-4a59-be20-cc06b6982ad7","model":"food-review-1","object":"text_completion","remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"usage":{"completion_tokens":100,"prompt_tokens":10,"total_tokens":110}}

HTTPRoute status is as expected:
$ kubectl get httproute vllm-llama3-8b-instruct -o yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  annotations:
    meta.helm.sh/release-name: vllm-llama3-8b-instruct
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2026-02-02T16:29:44Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: vllm-llama3-8b-instruct
  namespace: default
  resourceVersion: "9947"
  uid: 8ef8ff51-c848-4898-adee-4442b66c4b8a
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
    timeouts:
      request: 300s
status:
  parents:
  - conditions:
    - lastTransitionTime: "2026-02-02T16:29:44Z"
      message: ""
      observedGeneration: 1
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2026-02-02T16:29:44Z"
      message: ""
      observedGeneration: 1
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    controllerName: agentgateway.dev/agentgateway
    parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway
      namespace: default

Expected Behavior
The InferencePool status should be populated by the controller.
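For comparison, a managed pool would be expected to carry per-parent conditions roughly like the following. This is an illustrative sketch modeled on the HTTPRoute status above; field names follow my reading of the `inference.networking.k8s.io/v1` API and should be checked against the published CRD:

```yaml
status:
  parents:
  - parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway
      namespace: default
    conditions:
    - type: Accepted
      status: "True"
      reason: Accepted
      observedGeneration: 1
```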
Steps to reproduce the bug
- Use the make tooling to create a cluster, run kgtw with the agtw DP, etc.
- Follow the steps in the Inference Gateway getting started guide.
- Use the above examples to check resource status, verify connectivity, etc.
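The status check in the last step can be scripted. Below is a minimal sketch; the `has_parent_status` helper name is mine, not from the repo, and on a live cluster its argument would come from a `kubectl ... -o jsonpath` query:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the InferencePool reports at least
# one parent in its status. On a live cluster you would feed it:
#   kubectl get inferencepool vllm-llama3-8b-instruct -o jsonpath='{.status.parents}'
has_parent_status() {
  case "$1" in
    ""|null) return 1 ;;  # no status written at all, or an explicit null
    *)       return 0 ;;  # the controller wrote something
  esac
}

# Demo with the value observed in this report: the dump has no status block.
if has_parent_status ""; then
  echo "InferencePool status: managed"
else
  echo "InferencePool status: missing"
fi
```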
Additional Environment Detail
No response
Additional Context
InferencePool status is managed correctly in v2.1.2.
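Since v2.1.2 behaves correctly, the regression can be confirmed mechanically by capturing `.status` under each version and comparing. A sketch, with an illustrative helper name and sample values (capture files and inputs are assumptions, not from the repo):

```shell
#!/bin/sh
# Capture the pool status under each control-plane version, e.g.:
#   kubectl get inferencepool vllm-llama3-8b-instruct -o jsonpath='{.status}'
# then compare the two captures.
check_regression() {
  old="$1"  # status captured under v2.1.2
  new="$2"  # status captured under main
  if [ -n "$old" ] && [ -z "$new" ]; then
    echo "regression: status populated in v2.1.2 but empty on main"
  else
    echo "no regression detected"
  fi
}

# Illustrative inputs matching this report: populated on v2.1.2, empty on main.
check_regression '{"parents":[{"parentRef":{"name":"inference-gateway"}}]}' ''
```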