fix(tools): remove workspace and format_markdown tools to fix hallucination loops #460

Draft
sriaradhyula wants to merge 2 commits into main from fix/remove-workspace-tools-and-formatting

🎯 Summary

This PR fixes hallucination loops in the Platform Engineer multi-agent system by removing five tools: format_markdown, which triggered infinite validation loops, and the four workspace tools, which do not fit the sequential delegation model and only added tool-selection overhead. With these tools removed, a simple query no longer produces ~390KB of repeated output, and response quality improves.


🐛 Problem Statement

The Issue

The Platform Engineer was experiencing severe hallucination problems:

  1. Infinite Validation Loops: The format_markdown tool caused the agent to enter infinite loops:

    • Agent generates markdown response
    • Calls format_markdown for validation
    • Tool returns: {"valid": true, "message": "No formatting changes needed"}
    • Agent repeats the same content and validates again
    • Loop continues indefinitely
  2. Massive Repeated Output: A simple query like "show my GitHub profile" generated ~390KB of repeated content

  3. Architectural Mismatch: Workspace tools were designed for parallel agent execution, but Platform Engineer uses sequential delegation

Root Cause Analysis

Format Markdown Tool:

  • The tool itself works correctly
  • The issue is agent behavior - it doesn't understand when to stop validating
  • Each validation call triggers another identical validation in an endless loop
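
The loop can be reproduced in miniature. The sketch below is a toy simulation, not the project's code: `format_markdown` is a stand-in that always returns the response quoted above, and `naive_agent` is a hypothetical policy that validates after every draft. The `max_steps` cap exists only so the demonstration terminates.

```python
def format_markdown(text: str) -> dict:
    """Stand-in validator: always reports that the text is already valid."""
    return {"valid": True, "message": "No formatting changes needed"}


def naive_agent(draft: str, max_steps: int = 5) -> list[str]:
    """Hypothetical policy: emit the draft, validate it, then repeat.

    A "valid" result leaves the agent's state unchanged, so the next
    step is identical to the previous one. Only the artificial
    max_steps cap ends the loop.
    """
    transcript = []
    for _ in range(max_steps):
        transcript.append(draft)               # agent repeats the same content
        last_result = format_markdown(draft)   # and validates it again
        # Nothing in last_result tells the policy to stop.
    return transcript


transcript = naive_agent("# GitHub profile\n", max_steps=5)
print(len(transcript))  # 5: one identical copy of the draft per step
```

Without a cap, each pass appends another identical copy of the draft, which is consistent with the repeated output described above.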

Workspace Tools:
From ai_platform_engineering/multi_agents/tools/workspace_ops.py:

"This tool provides a temporary workspace for agents to coordinate outputs when multiple agents run in parallel."

The Platform Engineer uses sequential delegation (via DeepAgents framework):

User Query → Supervisor (creates TODOs) → Delegate to Subagent 1 → Wait for completion
                                        → Delegate to Subagent 2 → Wait for completion
                                        → Combine results

NOT parallel execution:

User Query → Supervisor → Subagent 1 + Subagent 2 (parallel) → Combine

Since there's no parallel execution, workspace coordination tools serve no purpose.
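
The two execution models can be contrasted with a minimal `asyncio` sketch. This is an illustration only: `run_subagent` is a hypothetical coroutine and does not reflect the DeepAgents API or the project's supervisor code.

```python
import asyncio


async def run_subagent(name: str, task: str) -> str:
    """Hypothetical subagent call; a real one would invoke an LLM-backed agent."""
    await asyncio.sleep(0)  # placeholder for real work
    return f"{name} finished: {task}"


async def sequential(tasks: dict[str, str]) -> list[str]:
    """One subagent at a time: each result is returned before the next starts."""
    results = []
    for name, task in tasks.items():
        results.append(await run_subagent(name, task))
    return results


async def parallel(tasks: dict[str, str]) -> list[str]:
    """All subagents at once: the model the workspace tools were designed for."""
    return list(await asyncio.gather(*(run_subagent(n, t) for n, t in tasks.items())))


tasks = {"github": "fetch profile", "jira": "list open issues"}
sequential_results = asyncio.run(sequential(tasks))
parallel_results = asyncio.run(parallel(tasks))
```

In the sequential version the supervisor holds each result before it delegates again, so there are never concurrent writers whose outputs need a shared scratch space.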


🔧 Changes Made

Commit 1: Remove Problematic Tools

File: ai_platform_engineering/multi_agents/platform_engineer/deep_agent.py

Before:

all_tools = all_agents + [
    reflect_on_output,
    format_markdown,        # ← REMOVED
    fetch_url,
    get_current_date,
    write_workspace_file,   # ← REMOVED
    read_workspace_file,    # ← REMOVED
    list_workspace_files,   # ← REMOVED
    clear_workspace         # ← REMOVED
]

After:

all_tools = all_agents + [
    reflect_on_output,     # ✅ Kept - useful for self-correction
    fetch_url,             # ✅ Kept - needed for external data
    get_current_date,      # ✅ Kept - useful for time-aware ops
]

Removed Tools:

  1. format_markdown - Causing infinite validation loops
  2. write_workspace_file - Designed for parallel execution (not applicable)
  3. read_workspace_file - Designed for parallel execution (not applicable)
  4. list_workspace_files - Designed for parallel execution (not applicable)
  5. clear_workspace - Designed for parallel execution (not applicable)

Tool Count: Reduced from 8 utility tools to 3 utility tools (+ specialized agent tools)
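
A guard like the following could keep the removed tools from being reintroduced by accident. It is a sketch under assumptions: tools are represented by their names as plain strings, and `find_removed_tools` is a hypothetical helper, not something this PR adds.

```python
REMOVED_TOOLS = {
    "format_markdown",
    "write_workspace_file",
    "read_workspace_file",
    "list_workspace_files",
    "clear_workspace",
}

# Utility tool names as listed in the "Before" snippet above.
before = [
    "reflect_on_output", "format_markdown", "fetch_url", "get_current_date",
    "write_workspace_file", "read_workspace_file", "list_workspace_files",
    "clear_workspace",
]


def find_removed_tools(tool_names: list[str]) -> list[str]:
    """Return any removed tool names that are still registered."""
    return [name for name in tool_names if name in REMOVED_TOOLS]


after = [name for name in before if name not in REMOVED_TOOLS]
print(len(before), len(after))    # 8 3
print(find_removed_tools(after))  # []
```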

Commit 2: Fix Formatting and Linting

Fixed code quality issues across the codebase:

MCP Servers (8 files):

  • ai_platform_engineering/agents/argocd/mcp/mcp_argocd/__main__.py
  • ai_platform_engineering/agents/backstage/mcp/mcp_backstage/__main__.py
  • ai_platform_engineering/agents/confluence/mcp/mcp_confluence/__main__.py
  • ai_platform_engineering/agents/jira/mcp/mcp_jira/__main__.py
  • ai_platform_engineering/agents/komodor/mcp/mcp_komodor/__main__.py
  • ai_platform_engineering/agents/pagerduty/mcp/mcp_pagerduty/__main__.py
  • ai_platform_engineering/agents/slack/mcp/mcp_slack/__main__.py
  • ai_platform_engineering/agents/splunk/mcp/mcp_splunk/__main__.py

Tools and Tests (7 files):

  • ai_platform_engineering/multi_agents/tools/fetch_url.py
  • ai_platform_engineering/multi_agents/tools/reflect_on_output.py
  • ai_platform_engineering/multi_agents/tools/tests/*.py

Docker Configuration:

  • .dockerignore
  • ai_platform_engineering/knowledge_bases/rag/.dockerignore
  • ai_platform_engineering/knowledge_bases/rag/build/.dockerignore
  • ai_platform_engineering/knowledge_bases/rag/build/Dockerfile.webui

Other:

  • .cursorrules
  • docker-compose.dev.yaml
  • docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md

Changes:

  • Added newlines at end of files (linting requirement)
  • Fixed formatting inconsistencies
  • Updated .dockerignore patterns
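
For reference, the end-of-file newline rule amounts to the check below. The project most likely relies on its linter or a pre-commit hook for this; `ensure_trailing_newline` is a hypothetical helper shown only to make the rule concrete.

```python
import tempfile
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a newline if the file is non-empty and lacks one.

    Returns True if the file was modified.
    """
    data = path.read_bytes()
    if not data or data.endswith(b"\n"):
        return False
    path.write_bytes(data + b"\n")
    return True


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.py"
    sample.write_bytes(b"print('hi')")
    changed_first = ensure_trailing_newline(sample)   # file is fixed
    changed_second = ensure_trailing_newline(sample)  # already compliant
    final_bytes = sample.read_bytes()
```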

📈 Impact

Positive Outcomes

Eliminates Infinite Loops: No more repeated validation calls that cause the agent to loop indefinitely

Reduces Cognitive Load: Fewer tools for the LLM to consider (8 utility tools → 3), which narrows tool selection and leaves less room for a wrong choice

Faster Response Times: Less tool selection overhead, more focused agent behavior

Cleaner Output: Eliminates ~390KB of repeated content in responses

Architectural Alignment: Tools now match the sequential delegation architecture

Better Resource Usage: Reduced memory consumption and token usage

Neutral Considerations

⚠️ Markdown Formatting: Agents must format markdown naturally (which modern LLMs typically do well without validation tools)

⚠️ Workspace Coordination: Not needed in the sequential delegation model; the tools can be re-enabled if the architecture changes to parallel execution

No Negative Impact Expected

The removed tools were either:

  • Causing problems (format_markdown)
  • Not being used effectively (workspace tools in sequential execution)

🧪 Testing

Before This Change

curl -X POST http://localhost:8000 -H "Content-Type: application/json" \
  -d '{"message":"show my GitHub profile"}'

Result: ~390KB of repeated output with infinite validation loops

After This Change

  • ✅ Same query produces clean, concise output
  • ✅ No infinite loops detected
  • ✅ Markdown remains well-formatted
  • ✅ Response time improved
  • ✅ No tool hallucination issues

Test Coverage

  • Verified with simple queries ("show my GitHub profile")
  • Tested complex multi-step queries
  • Confirmed tool selection is more focused
  • Validated markdown output quality
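
One way to turn the "repeated output" observation into a number is a line-level repetition ratio. This is a sketch, not the test harness used for this PR: `repetition_ratio` is a hypothetical helper and the sample strings are made up.

```python
from collections import Counter


def repetition_ratio(text: str) -> float:
    """Fraction of non-blank lines that duplicate an earlier line (0.0 = no repeats)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    counts = Counter(lines)
    duplicates = sum(count - 1 for count in counts.values())
    return duplicates / len(lines)


clean = "# Profile\nName: example\nRepos: 12\n"
looping = clean * 50  # the same three lines emitted fifty times

clean_ratio = repetition_ratio(clean)      # 0.0
looping_ratio = repetition_ratio(looping)  # 147 of 150 lines are repeats
```

A ratio close to 1.0 on a simple query would indicate that the loop has returned; any alerting threshold would need to be tuned against real responses.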

📚 Documentation

Architecture Decision Record (ADR)

Location: docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md

Status: 🟢 In-use

The ADR documents:

  • Detailed context and problem analysis
  • Architecture comparison (sequential vs parallel)
  • Decision rationale and alternatives considered
  • Implementation details
  • Impact analysis
  • Testing strategy
  • Rollback plan
  • Future considerations

Key Insights from ADR

Why Sequential Delegation Doesn't Need Workspace Tools:

  • Supervisor delegates to one agent at a time
  • Each agent completes before next one starts
  • No parallel output that needs coordination
  • Results flow naturally through task completion

Why Format Markdown Caused Loops:

  • Agent generates output → validates → receives "no changes needed" → repeats
  • No clear termination condition in agent's understanding
  • Tool validates correctly but agent doesn't know when to stop
  • Adds complexity without benefit (LLMs format markdown well naturally)

🔄 Alternatives Considered

1. Fix Prompts Instead of Removing Tools

Rejected because:

  • Root cause is architectural mismatch (workspace tools for sequential execution)
  • Difficult to prompt away infinite loops reliably
  • Adds cognitive load without benefit
  • Doesn't address fundamental issue

2. Remove Only format_markdown

Rejected because:

  • Workspace tools still add unnecessary cognitive load
  • No parallel execution means no benefit from workspace tools
  • Better to remove all mismatched tools at once

3. Add Loop Detection

Rejected because:

  • Adds complexity without addressing root cause
  • Band-aid solution to architectural problem
  • Better to remove tools that don't fit architecture
  • Could mask other issues

🚀 Rollback Plan

If issues arise after this change:

1. Markdown Formatting Problems

  • Re-enable format_markdown with strict usage instructions in prompt
  • Add loop detection (max 1 call per response)
  • Consider post-processing markdown outside agent loop

2. Need for Parallel Coordination

  • Re-enable workspace tools
  • Update architecture to support parallel execution
  • Add proper documentation for when to use workspace tools
  • Add examples of parallel coordination scenarios

📊 Monitoring Plan

After deployment, monitor:

Performance Metrics:

  • Response quality (markdown formatting remains good)
  • Response length (no infinite loops, reasonable output size)
  • Response time (faster due to reduced tool overhead)
  • Token usage (reduced due to fewer repeated calls)

Agent Behavior:

  • Tool call patterns (no repeated validations)
  • Tool selection accuracy (better focus)
  • Error rates (should decrease)
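
Repeated validations can be detected directly from a tool-call trace. The sketch below assumes a trace is available as a list of `(tool_name, arguments)` tuples; that format, the sample trace, and `longest_repeat` are all hypothetical.

```python
from itertools import groupby


def longest_repeat(tool_calls: list[tuple[str, str]]) -> tuple[str, int]:
    """Return (tool name, run length) for the longest run of identical consecutive calls."""
    best_name, best_len = "", 0
    for (name, _args), run in groupby(tool_calls):
        length = sum(1 for _ in run)
        if length > best_len:
            best_name, best_len = name, length
    return best_name, best_len


trace = [
    ("github", "profile"),
    ("format_markdown", "# Profile"),
    ("format_markdown", "# Profile"),
    ("format_markdown", "# Profile"),
    ("fetch_url", "https://example.com"),
]
worst = longest_repeat(trace)
print(worst)  # ('format_markdown', 3)
```

A run length above 1 for the same tool with the same arguments is the pattern this PR is meant to eliminate.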

User Experience:

  • User feedback on output quality
  • Task completion success rate
  • Query handling effectiveness

🔐 Compliance

Conventional Commits ✅

All commits follow the Conventional Commits specification:

  • Commit 1: fix(tools): remove workspace and format_markdown tools to fix hallucination loops
  • Commit 2: chore: fix file formatting and linting issues
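
The header format can be checked mechanically. The regular expression below is a simplified sketch: the list of allowed types is an assumption (the specification itself only defines feat and fix), and the project's actual commit linting may differ.

```python
import re

# Simplified Conventional Commits header: type(scope)!: subject
HEADER = re.compile(
    r"^(?P<type>build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)"
    r"(\((?P<scope>[^)]+)\))?"
    r"(?P<breaking>!)?: (?P<subject>.+)$"
)

headers = [
    "fix(tools): remove workspace and format_markdown tools to fix hallucination loops",
    "chore: fix file formatting and linting issues",
]

parsed = [HEADER.match(header) for header in headers]
for match in parsed:
    print(match.group("type"), match.group("scope"))
# fix tools
# chore None
```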

Developer Certificate of Origin (DCO) ✅

All commits include DCO sign-off:

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

Code Quality ✅

  • Follows Black formatting (line length 100)
  • Passes Ruff linting
  • Includes proper docstrings
  • Type hints maintained

📋 Checklist

  • Conventional commit format used
  • DCO sign-off present
  • Tests verified (integration tests passing)
  • Code formatted with Black
  • No Ruff linting errors
  • ADR created in docs/docs/changes/
  • Performance impact tested
  • Memory usage verified
  • Hallucination issues resolved
  • Documentation updated

🎓 Lessons Learned

  1. Tool Selection Matters: Giving agents too many tools increases cognitive load and can cause confusion

  2. Architecture Alignment: Tools must match the execution model (sequential vs parallel)

  3. Validation Tools Need Limits: Self-validation tools can cause loops if not properly bounded

  4. Less is More: Reducing from 8 to 3 utility tools improved rather than hindered performance

  5. LLMs Format Well Naturally: Modern LLMs produce good markdown without explicit validation tools


Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

…nation loops

Removed problematic tools causing infinite validation loops:
- format_markdown: caused repeated validation calls
- workspace tools: designed for parallel execution (not applicable to sequential delegation)

Changes:
- Removed format_markdown, write_workspace_file, read_workspace_file, list_workspace_files, clear_workspace from deep_agent.py
- Cleaned up prompt_config.deep_agent.yaml:
  * Removed all workspace tool examples (Examples 3, 4, 5)
  * Removed format_markdown references from TODOs and checklist
  * Updated Available Tools section
  * Simplified parallel execution examples
- Created ADR documenting the change and rationale
- Updated ADR README with new entry

Impact:
- Eliminates infinite loop hallucinations (previously ~390KB repeated output)
- Reduces cognitive load on LLM (8 tools → 3 utility tools)
- Tools now match architecture (sequential delegation, not parallel)
- Keeps essential tools: reflect_on_output, fetch_url, get_current_date

Tested with: curl query showed hallucination issue
Root cause: Architectural mismatch - workspace tools for parallel execution in sequential delegation system

Refs: hallucination_analysis.md (analysis), docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md (ADR)
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

- Add newlines at end of files for MCP servers
- Update .dockerignore files
- Fix formatting in tool tests
- Update ADR documentation formatting

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>