Description
📋 Prerequisites
- I have searched the existing issues to avoid creating a duplicate
- By submitting this issue, you agree to follow our Code of Conduct
- I am using the latest version of the software
- I have tried to clear cache/cookies or used incognito mode (if ui-related)
- I can consistently reproduce this issue
🚦 Impact/Severity
Blocker (this not only renders the agent useless, it also impacts other processes on the same node)
🐛 Bug Description
Description
When an MCP tool call fails repeatedly due to a persistent connectivity issue (e.g. connection reset by peer), the kagent-adk static Python process enters a tight retry loop with no backoff or circuit breaker, consuming 100% CPU indefinitely.
Steps to reproduce
- Deploy a kagent-adk-based agent with an MCP server tool configured
- Make the MCP server unreachable at the network level (e.g. via a waypoint proxy that resets connections)
- Send the agent a message that triggers the MCP tool call
- Observe CPU usage of the kagent-adk static process
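For anyone trying to reproduce this without a waypoint proxy, a minimal stand-in might look like the sketch below. It is an assumption on my part, not the setup used above: a plain TCP listener that accepts each connection and immediately closes it with an RST, which should produce the same "connection reset by peer" on the client side. The function names `make_reset_listener` and `reset_connections` are hypothetical.

```python
import socket
import struct
import threading


def make_reset_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket (port 0 lets the OS pick a free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv


def reset_connections(srv, max_conns=None):
    """Accept connections and reset each one immediately.

    Runs forever when max_conns is None; otherwise stops after max_conns
    connections and returns how many were handled.
    """
    handled = 0
    while max_conns is None or handled < max_conns:
        conn, _ = srv.accept()
        # SO_LINGER with l_onoff=1 and l_linger=0 makes close() send an RST
        # instead of the normal FIN handshake.
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        conn.close()
        handled += 1
    return handled
```

To use it, something like `reset_connections(make_reset_listener("0.0.0.0", 3000))` could be run in place of the real MCP server, with the agent's tool URL pointed at it.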
Observed behavior
The agent process spins at 100% CPU. It neither surfaces an error to the caller nor stops retrying. The only way to stop it is to scale the deployment to 0.
PID 1322590 95.1% CPU
/.kagent/.venv/bin/python3 /.kagent/.venv/bin/kagent-adk static --host 0.0.0.0 --port 8080
The underlying error on the MCP server side:
Post "http://get-weather.kagent:3000/mcp": read tcp ...: read: connection reset by peer
Expected behavior
When an MCP tool call fails persistently, the agent should:
- Apply exponential backoff between retries
- Enforce a maximum retry count or timeout
- Return an error response to the caller rather than looping indefinitely
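To illustrate the three points above, here is a rough sketch of the retry behavior I would expect. It is not kagent-adk's actual code, and I have not traced where the retry loop lives. `call_with_backoff`, `ToolCallFailed`, and all parameter names and defaults are hypothetical; `call` stands for whatever coroutine performs the MCP request.

```python
import asyncio
import random


class ToolCallFailed(Exception):
    """Raised once the retry budget is exhausted (hypothetical name)."""


async def call_with_backoff(
    call,
    *,
    max_attempts=5,
    base_delay=0.5,
    max_delay=30.0,
    retry_on=(OSError, asyncio.TimeoutError),
    sleep=asyncio.sleep,
):
    """Await call() with capped exponential backoff and a bounded retry count."""
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except retry_on as exc:
            last_exc = exc
            if attempt == max_attempts:
                break
            # Delay doubles each attempt (base, 2*base, 4*base, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads retries out; awaiting the sleep yields the event
            # loop instead of spinning the CPU.
            await sleep(delay * random.uniform(0.5, 1.0))
    # Surface a clear error to the caller instead of looping indefinitely.
    raise ToolCallFailed(
        f"tool call failed after {max_attempts} attempts"
    ) from last_exc
```

With something along these lines, a persistent connection reset would end in a `ToolCallFailed` that can be returned to the caller as an error response, and the process would sit idle between attempts rather than pinning a core.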
Environment
kagent-adk: static mode
Transport: HTTP MCP
Platform: Kubernetes (kind)