fix: return MCP connection errors to LLM instead of raising by jsonmp-k8 · Pull Request #1531 · kagent-dev/kagent

jsonmp-k8 · 2026-03-20T20:23:51Z

Summary

Wrap McpTool instances with ConnectionSafeMcpTool, which catches persistent connection errors and returns them to the LLM as error text
Catches ConnectionError (stdlib), TimeoutError (stdlib), httpx.TransportError (httpx network/timeout/protocol errors), and McpError (MCP session stream drops and read timeouts)
The error message includes the tool name and error type, and instructs the LLM not to retry
KAgentMcpToolset.get_tools() automatically wraps all McpTool instances
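
The wrapping behavior summarized above can be sketched roughly as follows. This is an illustration only, not the PR's actual ConnectionSafeMcpTool: `FakeMcpTool` and `ConnectionSafeTool` are hypothetical stand-ins, the `run_async(*, args, tool_context)` signature is assumed to mirror the ADK tool interface, and only the stdlib exceptions are caught so the snippet runs without httpx or the MCP SDK installed.

```python
import asyncio

# Assumption: the real wrapper also catches httpx.TransportError and McpError;
# this sketch sticks to stdlib exceptions so it runs without extra packages.
CONNECTION_ERRORS = (ConnectionError, TimeoutError)


class FakeMcpTool:
    """Hypothetical stand-in for McpTool: always fails with a reset connection."""

    name = "list_pods"

    async def run_async(self, *, args, tool_context):
        raise ConnectionResetError("connection reset by peer")


class ConnectionSafeTool:
    """Illustrative wrapper, not the PR's actual ConnectionSafeMcpTool."""

    def __init__(self, inner):
        self._inner = inner
        self.name = inner.name

    async def run_async(self, *, args, tool_context):
        try:
            return await self._inner.run_async(args=args, tool_context=tool_context)
        except CONNECTION_ERRORS as exc:
            # Return the failure as ordinary tool output instead of raising.
            return {
                "error": (
                    f"Tool '{self.name}' failed with {type(exc).__name__}: {exc}. "
                    "The MCP server is unreachable; do not retry this tool call."
                )
            }


result = asyncio.run(
    ConnectionSafeTool(FakeMcpTool()).run_async(args={}, tool_context=None)
)
print(result["error"])
```

Returning a dict rather than raising means the model receives the failure as ordinary tool output, including the explicit instruction not to retry.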

Root cause

When an MCP HTTP tool call fails with "connection reset by peer", the error propagates up to the ADK flow handler, which sends it back to the LLM as a function error. The LLM interprets this as a transient failure and retries the same tool call — creating a tight loop of LLM call → tool call → connection error → LLM call for up to max_llm_calls (500) iterations, burning 100% CPU.

The MCP client wraps transport-level errors into McpError via mcp.shared.session.send_request() before they reach the tool, so catching only stdlib/httpx errors is insufficient — McpError must also be handled.
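
A small sketch of why the stdlib/httpx exception list alone is not enough. `FakeMcpError` and `send_request` below are hypothetical stand-ins for McpError and the MCP session layer; the only assumption carried over from the description is that the transport error is re-raised as a different, unrelated exception type before the tool sees it.

```python
class FakeMcpError(Exception):
    """Hypothetical stand-in for McpError; not a ConnectionError subclass."""


def send_request():
    """Mimics a client layer that re-raises transport failures as its own type."""
    try:
        raise ConnectionResetError("connection reset by peer")
    except ConnectionError as exc:
        raise FakeMcpError("session stream closed") from exc


def call_tool(caught):
    """Call the tool, handling only the exception types listed in `caught`."""
    try:
        send_request()
    except caught as exc:
        return f"handled {type(exc).__name__}"


def escapes(caught):
    """Return True if the wrapped error propagates past call_tool."""
    try:
        call_tool(caught)
    except FakeMcpError:
        return True
    return False


stdlib_only_escapes = escapes((ConnectionError, TimeoutError))
with_mcp_error_escapes = escapes((ConnectionError, TimeoutError, FakeMcpError))
print(stdlib_only_escapes, with_mcp_error_escapes)  # True False
```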

Testing

python -m pytest python/packages/kagent-adk/tests/unittests/test_mcp_connection_error_handling.py -v (10 tests)
python -m pytest python/packages/kagent-adk/tests/unittests/ -v (170 passed)

Test coverage:

ConnectionResetError, ConnectionRefusedError, TimeoutError — caught, returned as error dict
httpx.ConnectError, httpx.ReadError, httpx.ConnectTimeout — caught via httpx.TransportError
McpError (session read timeout) — caught, returned as error dict
ValueError, CancelledError — still raised (not connection errors)
KAgentMcpToolset.get_tools() wraps McpTool → ConnectionSafeMcpTool
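
The coverage listed above might look roughly like the following checks. This is a sketch, not the PR's test file: `safe_call` and `failing_with` are hypothetical helpers, the exception tuple omits the httpx and MCP types so the snippet needs only the standard library, and the functions use plain `assert` so they would also run under pytest.

```python
import asyncio

# Assumption: the real list also includes httpx.TransportError and McpError.
CONNECTION_ERRORS = (ConnectionError, TimeoutError)


async def safe_call(tool_name, coro_fn):
    """Hypothetical helper: run coro_fn, turning connection errors into an error dict."""
    try:
        return await coro_fn()
    except CONNECTION_ERRORS as exc:
        return {"error": f"{tool_name}: {type(exc).__name__}: {exc}"}


def failing_with(exc):
    """Build an async callable that raises the given exception."""

    async def _call():
        raise exc

    return _call


def test_connection_errors_are_returned():
    for exc in (
        ConnectionResetError("reset"),
        ConnectionRefusedError("refused"),
        TimeoutError("slow"),
    ):
        result = asyncio.run(safe_call("demo_tool", failing_with(exc)))
        assert type(exc).__name__ in result["error"]


def test_other_errors_still_raise():
    # ValueError is not a connection error; CancelledError must always propagate.
    for exc in (ValueError("bad args"), asyncio.CancelledError()):
        try:
            asyncio.run(safe_call("demo_tool", failing_with(exc)))
        except type(exc):
            continue
        raise AssertionError(f"{type(exc).__name__} was swallowed")
```

Catching a narrow tuple of exception types, rather than a blanket `except Exception`, is what keeps argument errors and task cancellation visible to the caller.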

Fixes #1530

Copilot

Pull request overview

Wraps MCP tools to gracefully surface persistent connection failures to the LLM as normal tool output (instead of raising), preventing tight retry loops and high CPU usage in static agent runs (Fixes #1530).

Changes:

Add ConnectionSafeMcpTool that catches connection-related exceptions and returns an error payload instructing the LLM not to retry.
Update KAgentMcpToolset.get_tools() to wrap returned McpTool instances with ConnectionSafeMcpTool.
Add unit tests covering connection vs non-connection error behavior (including CancelledError propagation).
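
The second change, wrapping tools as they are returned from the toolset, could be sketched like this. All class names are hypothetical stand-ins (the real code works with McpTool, ConnectionSafeMcpTool, and KAgentMcpToolset), and the `get_tools(readonly_context=None)` signature is an assumption about the toolset interface.

```python
import asyncio


class FakeMcpTool:
    """Hypothetical stand-in for McpTool."""

    def __init__(self, name):
        self.name = name


class OtherTool:
    """Hypothetical non-MCP tool that should be left untouched."""

    def __init__(self, name):
        self.name = name


class SafeWrapper:
    """Hypothetical stand-in for ConnectionSafeMcpTool (error handling omitted)."""

    def __init__(self, inner):
        self.inner = inner
        self.name = inner.name


class BaseToolset:
    async def get_tools(self, readonly_context=None):
        return [FakeMcpTool("list_pods"), OtherTool("local_calc")]


class WrappingToolset(BaseToolset):
    async def get_tools(self, readonly_context=None):
        tools = await super().get_tools(readonly_context)
        # Wrap only MCP tools; pass every other tool through unchanged.
        return [SafeWrapper(t) if isinstance(t, FakeMcpTool) else t for t in tools]


tools = asyncio.run(WrappingToolset().get_tools())
print([type(t).__name__ for t in tools])  # ['SafeWrapper', 'OtherTool']
```

Doing the wrapping inside `get_tools()` means callers of the toolset need no changes to pick up the safer behavior.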

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `python/packages/kagent-adk/src/kagent/adk/_mcp_toolset.py` | Introduces `ConnectionSafeMcpTool` and wraps MCP tools returned by `KAgentMcpToolset` to avoid raising persistent connection failures. |
| `python/packages/kagent-adk/tests/unittests/test_mcp_connection_error_handling.py` | Adds pytest coverage ensuring connection errors are returned as error text while other exceptions still propagate. |

When an MCP HTTP tool call fails with a persistent connection error (e.g. "connection reset by peer"), the error propagates to the LLM as a function error. The LLM interprets this as transient and retries the same tool call, creating a tight loop that burns 100% CPU for up to max_llm_calls (500) iterations.

Wrap McpTool instances with ConnectionSafeMcpTool that catches connection errors (ConnectionError, TimeoutError, httpx.TransportError, McpError) and returns them as error text. This lets the LLM inform the user about the failure instead of retrying indefinitely.

Fixes kagent-dev#1530

Signed-off-by: Jaison Paul <paul.jaison@gmail.com>

Copilot AI review requested due to automatic review settings March 20, 2026 20:23

jsonmp-k8 requested review from EItanya, peterj and yuval-k as code owners March 20, 2026 20:23

Copilot started reviewing on behalf of jsonmp-k8 March 20, 2026 20:24

Copilot AI reviewed Mar 20, 2026

jsonmp-k8 force-pushed the fix/1530-mcp-tool-call-cpu-spin branch 2 times, most recently from c963afe to 41c3c17 March 20, 2026 21:06

jsonmp-k8 force-pushed the fix/1530-mcp-tool-call-cpu-spin branch from 41c3c17 to aa46632 March 20, 2026 21:11

jsonmp-k8 wants to merge 1 commit into kagent-dev:main from jsonmp-k8:fix/1530-mcp-tool-call-cpu-spin
