
anthropic: add improved streaming thinking/reasoning token support #1418

Merged
tmc merged 1 commit into main from feature/anthropic-streaming-thinking on Oct 15, 2025

anthropic: add improved streaming thinking/reasoning token support#1418
tmc merged 1 commit intomainfrom
feature/anthropic-streaming-thinking

Conversation


@tmc tmc commented Oct 15, 2025

Summary

Adds improved real-time streaming support for thinking/reasoning tokens in the Anthropic client, enabling thinking content to appear ahead of responses instead of after completion.

Changes

  • Add StreamingReasoningFunc field to messagePayload and MessageRequest
  • Modify handleThinkingDelta() to call streaming callback in real-time
  • Wire up callback from llms.CallOptions through anthropicllm.go to internal client
  • Update setMessageDefaults() to enable streaming when reasoning func present

Implementation Details

Follows the same pattern as OpenAI client (llms/openai/internal/openaiclient/chat.go:638-663). When thinking_delta events arrive from the Anthropic API, they're immediately passed to the StreamingReasoningFunc callback instead of being buffered until response completion.

Testing

  • All existing tests pass
  • Tested with cgpt integration confirming real-time grey thinking display
  • Thinking tokens now stream before response content (matching the Ollama UX)

@tmc tmc changed the title from feat(anthropic): add streaming thinking/reasoning token support to anthropic: add improved streaming thinking/reasoning token support on Oct 15, 2025
Implement StreamingReasoningFunc support in the Anthropic client to enable
real-time streaming of thinking tokens during extended thinking responses.

Changes:
- Add StreamingReasoningFunc field to messagePayload, MessageRequest structs
- Modify handleThinkingDelta() to call StreamingReasoningFunc when thinking
  chunks arrive during streaming
- Wire up StreamingReasoningFunc from llms.CallOptions through to the
  Anthropic client payload
- Update setMessageDefaults to enable streaming when StreamingReasoningFunc
  is provided

This follows the same pattern as the OpenAI client (chat.go:638-663) and
enables thinking tokens to stream in real-time at the BEGINNING of the
response, rather than appearing after the response completes.

Fixes issue where thinking_delta events were not calling the streaming
reasoning callback, causing thinking content to only be available after
response completion.
@tmc tmc force-pushed the feature/anthropic-streaming-thinking branch from 0d3e668 to 5435f15 on October 15, 2025 at 20:23
@tmc tmc merged commit 8e8a540 into main Oct 15, 2025
161 checks passed
@tmc tmc deleted the feature/anthropic-streaming-thinking branch October 15, 2025 22:22