
feature: streaming tool call support for passthrough configs #2056

Description

@christinaexyou

Did you check the docs?

  • I have read all the NeMo-Guardrails docs

Is your feature request related to a problem? Please describe.

When a request with tools reaches the /v1/chat/completions endpoint with passthrough: true and stream: true, the server currently returns a 422 error. This blocks streaming clients even when they only need tool calls surfaced at the end of the stream.

Root cause (nemoguardrails/server/api.py, nemoguardrails/rails/llm/llmrails.py, nemoguardrails/server/schemas/utils.py):

  1. StreamingHandler is text-only. Tool call data populated in tool_calls_var during generation is never pushed to the streaming iterator.
  2. format_streaming_chunk has no concept of delta.tool_calls. It only knows delta.content.
  3. There is no mechanism to emit a terminal chunk with finish_reason: "tool_calls" before [DONE].

Describe the solution you'd like

After all text tokens are yielded, wrap tool calls as synthetic delta.tool_calls chunks in OpenAI streaming format, followed by a finish_reason: "tool_calls" chunk. Arguments are not streamed token-by-token but arrive as a single chunk at the end.
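A minimal sketch of the proposed chunk sequence, not the actual NeMo Guardrails implementation. The function names `stream_with_tool_calls` and `_chunk`, the `example-model` default, and the input shape for tool calls (dicts with `id`, `name`, and `args`) are assumptions made for illustration. Only the chunk layout follows the OpenAI streaming format.

```python
import json
import time
import uuid
from typing import Any, Dict, Iterable, Iterator, List, Optional


def _chunk(completion_id: str, model: str, delta: Dict[str, Any],
           finish_reason: Optional[str] = None) -> str:
    """Serialize one chat.completion.chunk as a server-sent event line."""
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload)}\n\n"


def stream_with_tool_calls(text_tokens: Iterable[str],
                           tool_calls: List[Dict[str, Any]],
                           model: str = "example-model") -> Iterator[str]:
    """Yield text chunks, then synthetic tool call chunks, then the terminal chunk."""
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"

    # 1. Text tokens are streamed as ordinary delta.content chunks.
    for token in text_tokens:
        yield _chunk(completion_id, model, {"content": token})

    # 2. Each tool call becomes one delta.tool_calls chunk. The arguments
    #    are sent whole, as a single JSON string, not token by token.
    for index, call in enumerate(tool_calls):
        yield _chunk(completion_id, model, {
            "tool_calls": [{
                "index": index,
                "id": call["id"],
                "type": "function",
                "function": {
                    "name": call["name"],
                    "arguments": json.dumps(call["args"]),
                },
            }]
        })

    # 3. Terminal chunk carries the finish reason, followed by [DONE].
    finish_reason = "tool_calls" if tool_calls else "stop"
    yield _chunk(completion_id, model, {}, finish_reason=finish_reason)
    yield "data: [DONE]\n\n"
```

Under these assumptions, a response with one text token and one tool call would produce four events: a content chunk, a tool call chunk, a finish_reason: "tool_calls" chunk, and [DONE].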

Describe alternatives you've considered

Forward raw SSE chunks from the LLM through the guardrails layer for passthrough mode, bypassing StreamingHandler. This would enable token-by-token argument streaming but would introduce breaking changes.
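To illustrate the difference between the two approaches from the client's side, here is a hedged sketch of how a client might reassemble tool calls from already-parsed chunks. The helper name `accumulate_tool_calls` is hypothetical. It relies only on the OpenAI streaming convention that tool call fragments are keyed by `index` and that `function.arguments` fragments are concatenated in order. A client written this way should handle both a single end-of-stream chunk and token-by-token argument fragments.

```python
from typing import Any, Dict, Iterable, List


def accumulate_tool_calls(chunks: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge delta.tool_calls fragments into complete tool calls, ordered by index."""
    calls: Dict[int, Dict[str, Any]] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for part in delta.get("tool_calls") or []:
                # Fragments belonging to the same call share an index.
                slot = calls.setdefault(
                    part["index"], {"id": None, "name": "", "arguments": ""}
                )
                if part.get("id"):
                    slot["id"] = part["id"]
                function = part.get("function") or {}
                if function.get("name"):
                    slot["name"] = function["name"]
                # Argument text may arrive whole or in pieces; append either way.
                slot["arguments"] += function.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

Because the accumulator only appends argument text, the single-chunk form proposed above is a special case of the fragmented form, so clients that already follow this convention would not need changes.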

Additional context

Relates to issue #1615 and builds on top of #1942.

Labels

enhancement, status: needs triage
