Enable streaming usage metrics in OpenAIMixin for all OpenAI-compatible providers #3981

@skamenan7

🚀 Describe the new functionality needed

While implementing the Bedrock provider (#3748), I found that streaming requests don't collect token usage metrics by default. I fixed it for Bedrock by setting stream_options = {"include_usage": True} when telemetry is active. As @mattf pointed out, this belongs in the OpenAIMixin base class so that all OpenAI-compatible providers get streaming metrics automatically, not just Bedrock. Right now the code lives in BedrockInferenceAdapter.openai_chat_completion():

# Enable streaming usage metrics when telemetry is active
if params.stream and get_current_span() is not None:
    if params.stream_options is None:
        params.stream_options = {"include_usage": True}
    elif "include_usage" not in params.stream_options:
        params.stream_options = {**params.stream_options, "include_usage": True}

This should move into OpenAIMixin.openai_chat_completion() in src/llama_stack/providers/utils/inference/openai_mixin.py so other providers get it too.
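One way the shared logic could look once it is pulled out of the Bedrock adapter is sketched below. This is only an illustration, not the actual OpenAIMixin code: ChatParams is a hypothetical stand-in for the real request params object, enable_streaming_usage is a made-up helper name, and the span lookup is passed in as a callable so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ChatParams:
    """Hypothetical stand-in for the real chat completion request params."""

    stream: bool = False
    stream_options: Optional[dict[str, Any]] = None


def enable_streaming_usage(
    params: ChatParams, get_span: Callable[[], Any]
) -> ChatParams:
    """Request usage stats for streaming calls while a telemetry span is active.

    Hypothetical helper; get_span plays the role of get_current_span().
    """
    if not params.stream or get_span() is None:
        # Non-streaming request or telemetry off: leave params untouched.
        return params
    options = dict(params.stream_options or {})
    # Keep an explicit caller-provided include_usage value if there is one.
    options.setdefault("include_usage", True)
    params.stream_options = options
    return params
```

The behavior matches the Bedrock snippet above: other stream_options keys are preserved, and an include_usage value the caller set explicitly is not overwritten.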

💡 Why is this needed? What if we don't build it?

Without this, streaming requests are a blind spot in telemetry: we can track token usage for non-streaming requests but not for streaming ones.

That makes it hard to:

  • Monitor production costs accurately (streaming is common in chat apps)
  • Debug performance issues (can't see if a streaming request is token-heavy)
  • Set up proper rate limiting based on actual usage

The include_usage option is also easy to miss in the OpenAI docs.

If we don't standardize this, every new provider implementer has to discover it themselves. Also creates inconsistency - some providers would have streaming metrics, others won't. Since we already check get_current_span() is not None to detect if telemetry's enabled, there's no performance cost when telemetry is off.
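For context on what the option changes: as I understand the OpenAI chat completions API, with include_usage set the server sends one extra chunk at the end of the stream whose choices list is empty and whose usage field holds the token counts, while the earlier chunks carry usage as null. A rough sketch of picking that up on the consuming side follows. It treats chunks as plain dicts, collect_stream_usage is a hypothetical name, and the sample chunks are hand-written, not captured from a real provider.

```python
from typing import Any, Iterable, Optional


def collect_stream_usage(
    chunks: Iterable[dict[str, Any]],
) -> Optional[dict[str, int]]:
    """Return the usage dict found in a stream of chunks, or None if absent."""
    usage = None
    for chunk in chunks:
        # Content chunks carry usage=None; only the trailing chunk has counts.
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage


# Hand-written sample chunks for illustration only.
sample = [
    {"choices": [{"delta": {"content": "Hi"}}], "usage": None},
    {
        "choices": [],
        "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6},
    },
]
```

Without include_usage the trailing chunk is never sent, so a collector like this returns None and telemetry has nothing to record for the request.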

Other thoughts

Thanks @mattf for pointing this out.

Metadata

Labels: enhancement (New feature or request)