fix(tools): remove workspace and format_markdown tools to fix hallucination loops by sriaradhyula · Pull Request #460 · cnoe-io/ai-platform-engineering

sriaradhyula · 2025-11-12T10:27:03Z

🎯 Summary

This PR fixes critical hallucination issues in the Platform Engineer multi-agent system by removing tools that caused infinite validation loops or added unnecessary cognitive overhead. With these tools removed, a simple query no longer produces ~390KB of repeated output, and response quality improves.

🐛 Problem Statement

The Issue

The Platform Engineer was experiencing severe hallucination problems:

1. Infinite Validation Loops: The format_markdown tool caused the agent to enter infinite loops:
   - Agent generates markdown response
   - Calls format_markdown for validation
   - Tool returns: {"valid": true, "message": "No formatting changes needed"}
   - Agent repeats the same content and validates again
   - Loop continues indefinitely
2. Massive Repeated Output: A simple query like "show my GitHub profile" generated ~390KB of repeated content
3. Architectural Mismatch: Workspace tools were designed for parallel agent execution, but Platform Engineer uses sequential delegation

Root Cause Analysis

Format Markdown Tool:

The tool itself works correctly
The issue is agent behavior - it doesn't understand when to stop validating
Each validation call triggers another identical validation in an endless loop
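
The loop can be reproduced in miniature. The sketch below is a toy model, not the project's code: `format_markdown` is a stand-in that returns the same payload quoted above, and `naive_agent` is a hypothetical policy that always re-validates. It shows that a "valid" reply gives the policy no reason to stop, so only an external step cap ends the run.

```python
def format_markdown(text: str) -> dict:
    """Stand-in validator (assumption): well-formed input needs no changes."""
    return {"valid": True, "message": "No formatting changes needed"}


def naive_agent(draft: str, max_steps: int) -> list[str]:
    """Hypothetical agent policy: emit the draft, validate, repeat.

    The validator's reply never changes the next action, so the loop only
    ends because of the external max_steps cap.
    """
    transcript = []
    for _ in range(max_steps):
        transcript.append(draft)          # agent repeats the same content
        result = format_markdown(draft)   # and validates it again
        if not result["valid"]:           # never true for well-formed drafts
            break
    return transcript


output = naive_agent("# GitHub profile\n", max_steps=5)
print(len(output), "copies emitted,", len(set(output)), "distinct")
```

Raising `max_steps` only produces more identical copies, which matches the repeated-output symptom described above.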

Workspace Tools:
From ai_platform_engineering/multi_agents/tools/workspace_ops.py:

"This tool provides a temporary workspace for agents to coordinate outputs when multiple agents run in parallel."

The Platform Engineer uses sequential delegation (via the DeepAgents framework):

User Query → Supervisor (creates TODOs) → Delegate to Subagent 1 → Wait for completion
                                        → Delegate to Subagent 2 → Wait for completion
                                        → Combine results

NOT parallel execution:

User Query → Supervisor → Subagent 1 + Subagent 2 (parallel) → Combine

Since there's no parallel execution, workspace coordination tools serve no purpose.
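
The difference between the two models can be sketched with `asyncio`. This is an illustration under assumptions, not the DeepAgents implementation: `run_subagent` is a hypothetical placeholder whose sleep stands in for real work, and the event log exists only to make the ordering visible.

```python
import asyncio


async def run_subagent(name: str, events: list[str]) -> str:
    """Hypothetical subagent call; the sleep stands in for real work."""
    events.append(f"start {name}")
    await asyncio.sleep(0.01)
    events.append(f"end {name}")
    return f"{name} result"


async def sequential_delegation(names: list[str]) -> tuple[list[str], list[str]]:
    """Supervisor awaits each subagent before delegating to the next."""
    events: list[str] = []
    results = []
    for name in names:
        # The result comes back inline, so nothing has to be shared.
        results.append(await run_subagent(name, events))
    return results, events


async def parallel_execution(names: list[str]) -> tuple[list[str], list[str]]:
    """Contrast case: all subagents are started before any has finished."""
    events: list[str] = []
    results = await asyncio.gather(*(run_subagent(n, events) for n in names))
    return list(results), events


names = ["subagent_1", "subagent_2"]
seq_results, seq_events = asyncio.run(sequential_delegation(names))
par_results, par_events = asyncio.run(parallel_execution(names))

print(seq_events)      # work never overlaps
print(par_events[:2])  # both subagents start before either ends
```

In the sequential case each result is handed back to the supervisor before the next delegation begins, so there is no overlapping output for a workspace to coordinate.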

🔧 Changes Made

Commit 1: Remove Problematic Tools

File: ai_platform_engineering/multi_agents/platform_engineer/deep_agent.py

Before:

all_tools = all_agents + [
    reflect_on_output,
    format_markdown,        # ← REMOVED
    fetch_url,
    get_current_date,
    write_workspace_file,   # ← REMOVED
    read_workspace_file,    # ← REMOVED
    list_workspace_files,   # ← REMOVED
    clear_workspace         # ← REMOVED
]

After:

all_tools = all_agents + [
    reflect_on_output,     # ✅ Kept - useful for self-correction
    fetch_url,             # ✅ Kept - needed for external data
    get_current_date,      # ✅ Kept - useful for time-aware ops
]

Removed Tools:

❌ format_markdown - Causing infinite validation loops
❌ write_workspace_file - Designed for parallel execution (not applicable)
❌ read_workspace_file - Designed for parallel execution (not applicable)
❌ list_workspace_files - Designed for parallel execution (not applicable)
❌ clear_workspace - Designed for parallel execution (not applicable)

Tool Count: Reduced from 8 utility tools to 3 utility tools (+ specialized agent tools)
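
A small guard can keep the removed tools from drifting back into the list. The sketch below works on tool names as plain strings; how the real tool objects expose their names is an assumption left to the caller, and `removed_tools_still_registered` is a hypothetical helper, not part of the repository.

```python
REMOVED_TOOLS = {
    "format_markdown",
    "write_workspace_file",
    "read_workspace_file",
    "list_workspace_files",
    "clear_workspace",
}


def removed_tools_still_registered(tool_names: list[str]) -> list[str]:
    """Return removed tool names that still appear in the given list."""
    return sorted(REMOVED_TOOLS.intersection(tool_names))


# Utility tool names taken from the Before/After lists in this PR.
before = [
    "reflect_on_output", "format_markdown", "fetch_url", "get_current_date",
    "write_workspace_file", "read_workspace_file", "list_workspace_files",
    "clear_workspace",
]
after = ["reflect_on_output", "fetch_url", "get_current_date"]

print(removed_tools_still_registered(before))
print(removed_tools_still_registered(after))
```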

Commit 2: Fix Formatting and Linting

Fixed code quality issues across the codebase:

MCP Servers (8 files):

ai_platform_engineering/agents/argocd/mcp/mcp_argocd/__main__.py
ai_platform_engineering/agents/backstage/mcp/mcp_backstage/__main__.py
ai_platform_engineering/agents/confluence/mcp/mcp_confluence/__main__.py
ai_platform_engineering/agents/jira/mcp/mcp_jira/__main__.py
ai_platform_engineering/agents/komodor/mcp/mcp_komodor/__main__.py
ai_platform_engineering/agents/pagerduty/mcp/mcp_pagerduty/__main__.py
ai_platform_engineering/agents/slack/mcp/mcp_slack/__main__.py
ai_platform_engineering/agents/splunk/mcp/mcp_splunk/__main__.py

Tools and Tests (7 files):

ai_platform_engineering/multi_agents/tools/fetch_url.py
ai_platform_engineering/multi_agents/tools/reflect_on_output.py
ai_platform_engineering/multi_agents/tools/tests/*.py

Docker Configuration:

.dockerignore
ai_platform_engineering/knowledge_bases/rag/.dockerignore
ai_platform_engineering/knowledge_bases/rag/build/.dockerignore
ai_platform_engineering/knowledge_bases/rag/build/Dockerfile.webui

Other:

.cursorrules
docker-compose.dev.yaml
docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md

Changes:

Added newlines at end of files (linting requirement)
Fixed formatting inconsistencies
Updated .dockerignore patterns
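
The end-of-file newline requirement can be checked locally before pushing. This is a minimal sketch using only the standard library; the `**/__main__.py` pattern and the `find_offenders` helper are illustrative assumptions, and the project's actual linter configuration may cover more file types.

```python
import tempfile
from pathlib import Path


def missing_final_newline(path: Path) -> bool:
    """True when a non-empty file does not end with a newline byte."""
    data = path.read_bytes()
    return len(data) > 0 and not data.endswith(b"\n")


def find_offenders(root: Path, pattern: str = "**/__main__.py") -> list[Path]:
    """List files under root matching pattern that lack a final newline."""
    return sorted(
        p for p in root.glob(pattern) if p.is_file() and missing_final_newline(p)
    )


# Demo on a throwaway directory so the check can be tried without a checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ok").mkdir()
    (root / "bad").mkdir()
    (root / "ok" / "__main__.py").write_text("print('ok')\n")
    (root / "bad" / "__main__.py").write_text("print('bad')")
    offenders = [p.relative_to(root).as_posix() for p in find_offenders(root)]

print(offenders)
```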

📈 Impact

Positive Outcomes

✅ Eliminates Infinite Loops: No more repeated validation calls that cause the agent to loop indefinitely

✅ Reduces Cognitive Load: Fewer tools for the LLM to consider (8 utility tools → 3), which means faster tool selection and better decision making

✅ Faster Response Times: Less tool selection overhead, more focused agent behavior

✅ Cleaner Output: Eliminates ~390KB of repeated content in responses

✅ Architectural Alignment: Tools now match the sequential delegation architecture

✅ Better Resource Usage: Reduced memory consumption and token usage

Neutral Considerations

⚠️ Markdown Formatting: Agents must format markdown naturally (which modern LLMs typically do well without validation tools)

⚠️ Workspace Coordination: Not needed in the sequential delegation model; the tools can be re-enabled if the architecture changes (see Rollback Plan)

No Negative Impact Expected

The removed tools were either:

Causing problems (format_markdown)
Not being used effectively (workspace tools in sequential execution)

🧪 Testing

Before This Change

curl -X POST http://localhost:8000 -H "Content-Type: application/json" \
  -d '{"message":"show my GitHub profile"}'

Result: ~390KB of repeated output with infinite validation loops

After This Change

✅ Same query produces clean, concise output
✅ No infinite loops detected
✅ Markdown remains well-formatted
✅ Response time improved
✅ No tool hallucination issues

Test Coverage

Verified with simple queries ("show my GitHub profile")
Tested complex multi-step queries
Confirmed tool selection is more focused
Validated markdown output quality
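
Response size and repetition can be quantified rather than eyeballed. The sketch below is one possible measure, not the method used for the numbers in this PR: `repetition_report` is a hypothetical helper that counts duplicated non-empty lines in a response body, and the sample strings are made up for illustration.

```python
from collections import Counter


def repetition_report(response_text: str) -> dict:
    """Summarize size and line-level duplication of a response body."""
    lines = [ln.strip() for ln in response_text.splitlines() if ln.strip()]
    counts = Counter(lines)
    # Every occurrence beyond the first counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return {
        "size_bytes": len(response_text.encode("utf-8")),
        "total_lines": len(lines),
        "duplicate_lines": duplicates,
        "duplicate_ratio": duplicates / len(lines) if lines else 0.0,
    }


# Made-up samples: one looping response and one clean response.
looping = "## Profile\nname: octocat\n" * 50
clean = "## Profile\nname: octocat\nrepos: 8\n"

looping_report = repetition_report(looping)
clean_report = repetition_report(clean)
print(looping_report)
print(clean_report)
```

The same function could be applied to the body returned by the curl command above; a duplicate ratio near 1.0 would indicate the looping behavior.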

📚 Documentation

Architecture Decision Record (ADR)

Location: docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md

Status: 🟢 In-use

The ADR documents:

Detailed context and problem analysis
Architecture comparison (sequential vs parallel)
Decision rationale and alternatives considered
Implementation details
Impact analysis
Testing strategy
Rollback plan
Future considerations

Key Insights from ADR

Why Sequential Delegation Doesn't Need Workspace Tools:

Supervisor delegates to one agent at a time
Each agent completes before next one starts
No parallel output that needs coordination
Results flow naturally through task completion

Why Format Markdown Caused Loops:

Agent generates output → validates → receives "no changes needed" → repeats
No clear termination condition in agent's understanding
Tool validates correctly but agent doesn't know when to stop
Adds complexity without benefit (LLMs format markdown well naturally)

🔄 Alternatives Considered

1. Fix Prompts Instead of Removing Tools

Rejected because:

Root cause is architectural mismatch (workspace tools for sequential execution)
Difficult to prompt away infinite loops reliably
Adds cognitive load without benefit
Doesn't address fundamental issue

2. Remove Only format_markdown

Rejected because:

Workspace tools still add unnecessary cognitive load
No parallel execution means no benefit from workspace tools
Better to remove all mismatched tools at once

3. Add Loop Detection

Rejected because:

Adds complexity without addressing root cause
Band-aid solution to architectural problem
Better to remove tools that don't fit architecture
Could mask other issues

🚀 Rollback Plan

If issues arise after this change:

1. Markdown Formatting Problems

Re-enable format_markdown with strict usage instructions in prompt
Add loop detection (max 1 call per response)
Consider post-processing markdown outside agent loop
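
If format_markdown is ever re-enabled, the "max 1 call per response" limit could look roughly like this. It is a sketch under assumptions: `CallBudget` is a hypothetical wrapper, `format_markdown` is a stand-in for the real tool, and the wording of the limit message is invented for illustration.

```python
from typing import Callable


def format_markdown(text: str) -> dict:
    """Stand-in for the real tool (assumption)."""
    return {"valid": True, "message": "No formatting changes needed"}


class CallBudget:
    """Wrap a tool so it runs at most max_calls times per response."""

    def __init__(self, tool: Callable[[str], dict], max_calls: int = 1) -> None:
        self.tool = tool
        self.max_calls = max_calls
        self.calls = 0

    def __call__(self, text: str) -> dict:
        if self.calls >= self.max_calls:
            # Explicit stop signal instead of another "valid" payload.
            return {
                "valid": True,
                "message": "Call limit reached; return the response as-is.",
            }
        self.calls += 1
        return self.tool(text)

    def reset(self) -> None:
        """Call at the start of each new response."""
        self.calls = 0


guarded = CallBudget(format_markdown, max_calls=1)
first = guarded("# Profile")
second = guarded("# Profile")
guarded.reset()
after_reset = guarded("# Profile")
print(first["message"], "|", second["message"], "|", after_reset["message"])
```

Returning an explicit stop message rather than raising keeps the tool call well-formed for the agent while still giving it a termination cue.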

2. Need for Parallel Coordination

Re-enable workspace tools
Update architecture to support parallel execution
Add proper documentation for when to use workspace tools
Add examples of parallel coordination scenarios

📊 Monitoring Plan

After deployment, monitor:

Performance Metrics:

Response quality (markdown formatting remains good)
Response length (no infinite loops, reasonable output size)
Response time (faster due to reduced tool overhead)
Token usage (reduced due to fewer repeated calls)

Agent Behavior:

Tool call patterns (no repeated validations)
Tool selection accuracy (better focus)
Error rates (should decrease)
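
Repeated validations can be spotted mechanically in a tool-call trace. The sketch below assumes the trace is available as a list of `(tool_name, argument)` pairs, which is a hypothetical format; the real logging schema may differ.

```python
from itertools import groupby
from typing import Optional


def longest_repeated_run(
    tool_calls: list[tuple[str, str]],
) -> tuple[Optional[tuple[str, str]], int]:
    """Return the call with the longest run of consecutive identical repeats."""
    best_call, best_len = None, 0
    for call, group in groupby(tool_calls):
        run = sum(1 for _ in group)
        if run > best_len:
            best_call, best_len = call, run
    return best_call, best_len


# Made-up trace for illustration.
trace = [
    ("fetch_url", "https://example.com"),
    ("format_markdown", "# Profile"),
    ("format_markdown", "# Profile"),
    ("format_markdown", "# Profile"),
    ("get_current_date", ""),
]

call, run_length = longest_repeated_run(trace)
print(call, run_length)
```

A run length above a small threshold (for example 2) for the same tool and argument would be a reasonable alert condition.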

User Experience:

User feedback on output quality
Task completion success rate
Query handling effectiveness

🔐 Compliance

Conventional Commits ✅

All commits follow the Conventional Commits specification:

Commit 1: fix(tools): remove workspace and format_markdown tools to fix hallucination loops
Commit 2: chore: fix file formatting and linting issues

Developer Certificate of Origin (DCO) ✅

All commits include DCO sign-off:

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

Code Quality ✅

Follows Black formatting (line length 100)
Passes Ruff linting
Includes proper docstrings
Type hints maintained

📋 Checklist

🎓 Lessons Learned

Tool Selection Matters: Giving agents too many tools increases cognitive load and can cause confusion
Architecture Alignment: Tools must match the execution model (sequential vs parallel)
Validation Tools Need Limits: Self-validation tools can cause loops if not properly bounded
Less is More: Reducing from 8 to 3 utility tools improved rather than hindered performance
LLMs Format Well Naturally: Modern LLMs produce good markdown without explicit validation tools

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

…nation loops

Removed problematic tools causing infinite validation loops:
- format_markdown: caused repeated validation calls
- workspace tools: designed for parallel execution (not applicable to sequential delegation)

Changes:
- Removed format_markdown, write_workspace_file, read_workspace_file, list_workspace_files, clear_workspace from deep_agent.py
- Cleaned up prompt_config.deep_agent.yaml:
  * Removed all workspace tool examples (Examples 3, 4, 5)
  * Removed format_markdown references from TODOs and checklist
  * Updated Available Tools section
  * Simplified parallel execution examples
- Created ADR documenting the change and rationale
- Updated ADR README with new entry

Impact:
- Eliminates infinite loop hallucinations (previously ~390KB repeated output)
- Reduces cognitive load on LLM (8 tools → 3 utility tools)
- Tools now match architecture (sequential delegation, not parallel)
- Keeps essential tools: reflect_on_output, fetch_url, get_current_date

Tested with: curl query showed hallucination issue
Root cause: Architectural mismatch - workspace tools for parallel execution in sequential delegation system
Refs: hallucination_analysis.md (analysis), docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md (ADR)

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

sriaradhyula added 2 commits November 12, 2025 11:15

chore: fix file formatting and linting issues

b95291c

- Add newlines at end of files for MCP servers
- Update .dockerignore files
- Fix formatting in tool tests
- Update ADR documentation formatting

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

github-project-automation bot added this to CAIPE (AI Platform Engineering) Project Backlog Nov 12, 2025

sriaradhyula marked this pull request as draft November 28, 2025 17:11

subbaksh force-pushed the fix/remove-workspace-tools-and-formatting branch from 10c1eb2 to b95291c Compare February 5, 2026 18:59

subbaksh force-pushed the main branch from 998f1d9 to 018f980 Compare February 5, 2026 19:25

suwhang-cisco force-pushed the main branch from 89566b1 to c4318af Compare March 11, 2026 12:29

sriaradhyula wants to merge 2 commits into main from fix/remove-workspace-tools-and-formatting
