perf: parallelize tool server init and reduce LLM retry overhead #139
Open
wangbinluo wants to merge 4 commits into MiroMindAI:main from
Conversation
…oMindAI#137)

- P0: Parallelize MCP tool server initialization with asyncio.gather() (saves ~40-50s per task, previously ~71s sequential)
- P0: Reduce LLM retry base_wait_time from 30s to 10s, max_retries from 10 to 5
- P1: Add httpx connection pooling for Serper API requests (reuse TCP connections across ~346 search calls per task)

Ref: MiroMindAI#137
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
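The parallel-initialization item can be sketched as follows. This is a minimal illustration of the `asyncio.gather()` pattern, not the repository's actual `manager.py`; `init_server` and the server names are placeholders for real MCP tool-server startup.

```python
import asyncio

# Hypothetical stand-in for one MCP tool server's init handshake.
async def init_server(name: str) -> str:
    await asyncio.sleep(0.05)  # simulate startup latency
    return name

async def init_all(names):
    # Sequential (old): total time is the sum of every handshake.
    # Parallel (new): gather() overlaps them, so total time is
    # roughly the slowest single handshake. Results keep input order.
    return await asyncio.gather(*(init_server(n) for n in names))

results = asyncio.run(init_all(["search", "browser", "code"]))
```

Because `gather()` preserves argument order, the returned list maps one-to-one to the server list regardless of which handshake finishes first.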
@shawnlimn @xingxuanli Could you review this PR when you get a chance? This PR addresses the performance bottlenecks identified in #137 with three optimizations:
Note: PR #138 by @JasonOA888 covers only item 1; this PR is a superset with additional optimizations and benchmark data.
Algorithm team confirmed these values are needed for reliability under high load with self-hosted sglang servers.

Ref: MiroMindAI#137
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
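To see where the retry savings come from, here is a rough worst-case wait budget under the old and new settings. The exact backoff formula in `openai_client.py` is not shown in this PR, so a simple linear backoff (`base_wait_time * attempt`) is assumed purely for illustration.

```python
def wait_times(base_wait_time: float, max_retries: int) -> list[float]:
    # Assumed linear backoff: attempt i waits base_wait_time * (i + 1).
    # This is an illustrative model, not the exact client formula.
    return [base_wait_time * (i + 1) for i in range(max_retries)]

# Worst case: every retry is exhausted before the call succeeds.
old_budget = sum(wait_times(30.0, 10))  # old: base 30s, 10 retries
new_budget = sum(wait_times(10.0, 5))   # new: base 10s, 5 retries
```

Under this model the worst-case sleep budget drops from 1650s to 150s; in practice only a few retries fire, which is where the 60-90s figure in the Changes section comes from.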
…iroMindAI#137)

Previously every tool call (except playwright) spawned a new MCP server subprocess, initialized it, called the tool, then killed it. BC tasks average 300+ tool calls, so the spawn/teardown overhead adds up.

Introduce PersistentMCPSession, which keeps the subprocess alive for the entire task lifetime. On connection failure it transparently reconnects once. Sessions are cleaned up via close_all_sessions() at task end.

This is the P0 "MCP server connection reuse" item from MiroMindAI#137, estimated to save 2-5 min per task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
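The keep-alive-with-single-reconnect behavior described above can be sketched like this. This is a simplified model, not the PR's actual class: `_roundtrip` stands in for the real MCP stdio transport, and the line-based protocol is assumed for illustration.

```python
import asyncio

class PersistentMCPSession:
    """Keeps one tool-server subprocess alive for the task lifetime.

    Sketch only: the real implementation speaks the MCP protocol;
    here a newline-delimited echo round-trip stands in for it.
    """

    def __init__(self, command: list[str]):
        self.command = command
        self._proc = None

    async def _spawn(self) -> None:
        # (Re)start the tool-server subprocess with piped stdio.
        self._proc = await asyncio.create_subprocess_exec(
            *self.command,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )

    async def call_tool(self, payload: bytes) -> bytes:
        # Lazily spawn on first use, or respawn if the process died.
        if self._proc is None or self._proc.returncode is not None:
            await self._spawn()
        try:
            return await self._roundtrip(payload)
        except (BrokenPipeError, ConnectionResetError):
            # Transparent single reconnect on a dead pipe, then retry once.
            await self._spawn()
            return await self._roundtrip(payload)

    async def _roundtrip(self, payload: bytes) -> bytes:
        self._proc.stdin.write(payload + b"\n")
        await self._proc.stdin.drain()
        return await self._proc.stdout.readline()

    async def close(self) -> None:
        # Analogue of the PR's close_all_sessions() cleanup at task end.
        if self._proc is not None and self._proc.returncode is None:
            self._proc.terminate()
            await self._proc.wait()

async def _demo():
    # `cat` echoes stdin back, standing in for a real MCP tool server.
    session = PersistentMCPSession(["cat"])
    reply = await session.call_tool(b"ping")
    await session.close()
    return reply

reply = asyncio.run(_demo())
```

The key point is that repeated `call_tool` invocations reuse the same live subprocess instead of paying spawn/init/teardown costs per call, which is where the estimated 2-5 min per task comes from.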
Summary
Addresses evaluation pipeline performance bottlenecks identified in #137.
Three changes targeting the slowest parts of BC benchmark evaluation.
Closes #137
Changes
- manager.py: get_all_tool_definitions() now uses asyncio.gather() instead of a sequential for loop.
- openai_client.py: base_wait_time 30s → 10s, max_retries 10 → 5. Prevents 60-90s wasted on retries.
- search_and_scrape_webpage.py: reuses a shared httpx.AsyncClient instead of creating a new one per request (~346 search calls per BC task).

Benchmark Results
Tool server initialization time (3 runs on dev_wbl)
Comparison with main branch (extracted from existing evaluation logs)
Test plan
bench_init_time.py on dev_wbl — 17.0s avg vs main baseline 234s avg