Skip to content

[Bug]: MCP code mode nested tool logs expire as false TTL errors despite successful execution #3790

@dborysenko

Description

@dborysenko

Prerequisites

  • I have searched existing issues and discussions to avoid duplicates
  • I am using the latest version (or have tested against main/nightly)

Description

When using MCP code mode, tool execution works correctly from an external MCP client, but the Bifrost UI (MCP Logs) eventually shows a false error:

{
  "is_bifrost_error": true,
  "error": {
    "message": "MCP tool execution did not complete before pending log TTL"
  },
  "extra_fields": {}
}

The error appears with a long delay (on the order of ~5 minutes), not immediately when the tool finishes. This matches the logging plugin’s in-memory pending-log cleanup behavior, not a live execution failure.

Steps to reproduce

  1. Configure at least one MCP client with Code Mode enabled (is_code_mode_client: true).
  2. Connect the client and ensure code mode meta-tools are available (listToolFiles, readToolFile, executeToolCode, etc.).
  3. Run a code mode workflow from an external MCP client (e.g. call executeToolCode so that nested MCP tools run inside the Starlark sandbox).
  4. Confirm the client receives a successful tool result.
  5. Open Bifrost UI → MCP Logs (or poll /mcp-logs) for the same request/tool activity.
  6. Wait several minutes (pending log TTL is 5 minutes; cleanup runs every 1 minute).

Expected behavior

  • MCP tool executions that complete successfully in code mode should be logged as success (or not logged at all for codemode meta-tools, per existing skip logic).
  • The UI should not show a terminal error for successful executions.

Actual behavior

  • External client execution succeeds.
  • MCP Logs in the UI eventually show status: error with message "MCP tool execution did not complete before pending log TTL".
  • Error appears after ~5 minutes, consistent with pendingLogTTL in the logging plugin.

Affected area(s)

Core (Go)

Version

1.5.5

Environment


Root cause analysis

1. Primary: nested tool calls from Starlark skip PostMCP log finalization

Code mode nested MCP tool calls are executed in core/mcp/codemode/starlark/executecode.go (callMCPTool). That path:

  1. Runs RunMCPPreHooks → logging PreMCPHook creates an in-memory pending row (pendingMCPLogsToInject, status: processing).
  2. Calls the MCP server directly via client.Conn.CallTool.
  3. Runs RunMCPPostHooks on a hand-built BifrostMCPResponse without stamping ExtraFields.MCPRequestType.

Gateway-routed tool execution uses MCPManager.runWithPluginPipeline (core/mcp/pluginpipeline.go), which calls resp.PopulateExtraFields(mcpReqType, clientName, toolName) before PostMCP hooks. The Starlark nested path does not.

plugins/logging/main.go PostMCPHook only finalizes logs when the request type is an execute-tool variant:

mcpReqType := schemas.MCPRequestType("")
if resp != nil {
    mcpReqType = resp.ExtraFields.MCPRequestType
} else if bifrostErr != nil {
    mcpReqType = bifrostErr.ExtraFields.MCPRequestType
}
if !mcpReqType.IsExecuteTool() {
    return resp, bifrostErr, nil  // early return — pending row never completed
}

Result: PreMCP creates pending state; PostMCP exits early; row stays in memory until stale cleanup.

Stale cleanup is implemented in plugins/logging/writer.go:

  • pendingLogTTL = 5 * time.Minute
  • cleanupStalePendingLogs() converts stale pending MCP rows to terminal errors via buildStaleMCPToolLogEntry() with the message above.

2. Secondary: codemode meta-tool name mismatch between PreMCP and PostMCP

In plugins/logging/main.go:

Hook Codemode check Effect
PreMCPHook IsCodemodeTool(fullToolName) only Prefixed names like myclient-executeToolCode are not skipped → pending log created
PostMCPHook IsCodemodeTool(resp.ExtraFields.ToolName) Bare names like executeToolCode are skipped → pending log not finalized

IsCodeModeTool (core/mcp/codemode.go) only matches bare tool names (listToolFiles, readToolFile, getToolDocs, executeToolCode), not {client}-{tool} prefixed names.

This can leave orphaned pending rows for codemode meta-tools when invoked with a client prefix.


Why the client works but the UI fails

  • The MCP client sees the real tool response from code mode / nested execution.
  • The UI reads persisted MCP logs from /mcp-logs, which reflect logging plugin state. Stale pending rows are flushed as TTL errors, not execution failures.

Affected code paths

File Relevance
core/mcp/codemode/starlark/executecode.go Nested callMCPTool — missing PopulateExtraFields / MCPRequestType before PostMCP hooks
core/mcp/pluginpipeline.go Reference implementation — stamps MCPRequestType correctly
plugins/logging/main.go PreMCPHook / PostMCPHook codemode skip logic
plugins/logging/writer.go pendingLogTTL, cleanupStalePendingLogs, buildStaleMCPToolLogEntry
ui/app/workspace/mcp-logs/ Displays persisted MCP log errors

Suggested fix

  1. Starlark nested tools: Before RunMCPPostHooks in callMCPTool, call PopulateExtraFields on success responses and set bifrostErr.ExtraFields.MCPRequestType on errors (same as runWithPluginPipeline). Apply the same stamping on plugin short-circuit paths in that function.
  2. Logging plugin: In PreMCPHook, skip logging when the suffix after {client}- is a codemode tool name (not only when fullToolName matches exactly). In PostMCPHook, when skipping codemode tools, Delete the matching entry from pendingMCPLogsToInject to avoid TTL false positives for legacy prefixed names.

Test ideas

  • Logging plugin: PreMCP with myclient-executeToolCode should not create pendingMCPLogsToInject entries.
  • Logging plugin: PostMCP with codemode ToolName should discard any matching pending row.
  • Integration: nested tool call from executeToolCode should produce a success (or real error) MCP log row, not a TTL error after 5 minutes.
  • Regression: existing TestMCPHooksDeferDBWriteUntilPostHookBatch and TestCleanupStalePendingMCPLogsPersistsErrorFallback in plugins/logging/operations_test.go should remain green.

Regression?

No response

Severity

Medium (some functionality impaired)

Metadata

Metadata

Labels

bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions