fix(tools): remove workspace and format_markdown tools to fix hallucination loops #460
Draft
sriaradhyula wants to merge 2 commits into `main`
Removed problematic tools causing infinite validation loops:
- format_markdown: caused repeated validation calls
- workspace tools: designed for parallel execution (not applicable to sequential delegation)

Changes:
- Removed format_markdown, write_workspace_file, read_workspace_file, list_workspace_files, clear_workspace from deep_agent.py
- Cleaned up prompt_config.deep_agent.yaml:
  * Removed all workspace tool examples (Examples 3, 4, 5)
  * Removed format_markdown references from TODOs and checklist
  * Updated Available Tools section
  * Simplified parallel execution examples
- Created ADR documenting the change and rationale
- Updated ADR README with new entry

Impact:
- Eliminates infinite loop hallucinations (previously ~390KB repeated output)
- Reduces cognitive load on LLM (8 tools → 3 utility tools)
- Tools now match architecture (sequential delegation, not parallel)
- Keeps essential tools: reflect_on_output, fetch_url, get_current_date

Tested with: curl query showed hallucination issue
Root cause: Architectural mismatch - workspace tools for parallel execution in sequential delegation system
Refs: hallucination_analysis.md (analysis), docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md (ADR)

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
- Add newlines at end of files for MCP servers
- Update .dockerignore files
- Fix formatting in tool tests
- Update ADR documentation formatting

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Branch updated from `10c1eb2` to `b95291c`.
🎯 Summary
This PR fixes critical hallucination issues in the Platform Engineer multi-agent system by removing tools that caused infinite validation loops and added unnecessary cognitive overhead. The change eliminates the ~390KB of repeated output produced by a simple query and keeps the agent's responses focused.
🐛 Problem Statement
The Issue
The Platform Engineer was experiencing severe hallucination problems:
1. Infinite Validation Loops: The `format_markdown` tool trapped the agent in an infinite loop. The agent repeatedly called `format_markdown` for validation, and every call returned `{"valid": true, "message": "No formatting changes needed"}`.
2. Massive Repeated Output: A simple query like "show my GitHub profile" generated ~390KB of repeated content.
3. Architectural Mismatch: Workspace tools were designed for parallel agent execution, but Platform Engineer uses sequential delegation.
Root Cause Analysis
Format Markdown Tool:
Workspace Tools:
From `ai_platform_engineering/multi_agents/tools/workspace_ops.py`:
The Platform Engineer uses sequential delegation (via the DeepAgents framework):
It does NOT use parallel execution:
Since there's no parallel execution, workspace coordination tools serve no purpose.
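To make the sequential-delegation argument concrete, here is a minimal Python sketch. The subagent functions and the `delegate_sequentially` helper are hypothetical stand-ins, not the DeepAgents API. Each delegated step returns its result directly to the supervisor, which hands it to the next step, so no two subagents ever need to exchange data through shared storage.

```python
from typing import Callable


# Hypothetical subagents: each takes the running context and returns a new one.
# They stand in for specialist agents; they are NOT part of the DeepAgents API.
def github_agent(context: str) -> str:
    return context + "\n[github] profile fetched"


def summarizer_agent(context: str) -> str:
    return context + "\n[summary] profile summarized"


def delegate_sequentially(task: str, agents: list[Callable[[str], str]]) -> str:
    """Run subagents one after another.

    Each agent's output becomes the next agent's input, so intermediate
    results travel through return values. No write/read/list/clear
    workspace calls are needed to coordinate the steps.
    """
    context = task
    for agent in agents:
        context = agent(context)
    return context


result = delegate_sequentially("show my GitHub profile", [github_agent, summarizer_agent])
print(result)
```

A shared workspace only pays off when several agents run concurrently and must hand off results without a direct caller/callee relationship, which is the parallel case this system does not use.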
🔧 Changes Made
Commit 1: Remove Problematic Tools
File: `ai_platform_engineering/multi_agents/platform_engineer/deep_agent.py`
Before:
After:
Removed Tools:
- `format_markdown`: causing infinite validation loops
- `write_workspace_file`: designed for parallel execution (not applicable)
- `read_workspace_file`: designed for parallel execution (not applicable)
- `list_workspace_files`: designed for parallel execution (not applicable)
- `clear_workspace`: designed for parallel execution (not applicable)

Tool Count: Reduced from 8 utility tools to 3 utility tools (plus specialized agent tools).
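A lightweight regression check can keep the removed tools from creeping back in. This sketch assumes the agent's utility tools are collected into a list of callables; the `UTILITY_TOOLS` name, the `check_tool_registry` helper, and the stub functions are hypothetical placeholders, not the actual contents of `deep_agent.py`.

```python
# Hypothetical stubs standing in for the real tool implementations.
def reflect_on_output(text: str) -> str: ...
def fetch_url(url: str) -> str: ...
def get_current_date() -> str: ...


# Hypothetical registry mirroring the "after" state described above.
UTILITY_TOOLS = [reflect_on_output, fetch_url, get_current_date]

REMOVED_TOOLS = {
    "format_markdown",
    "write_workspace_file",
    "read_workspace_file",
    "list_workspace_files",
    "clear_workspace",
}
EXPECTED_TOOLS = {"reflect_on_output", "fetch_url", "get_current_date"}


def check_tool_registry(tools) -> None:
    """Fail if any removed tool is registered or the tool set drifts."""
    names = {tool.__name__ for tool in tools}
    leaked = names & REMOVED_TOOLS
    assert not leaked, f"removed tools re-registered: {sorted(leaked)}"
    assert names == EXPECTED_TOOLS, f"unexpected tool set: {sorted(names)}"


check_tool_registry(UTILITY_TOOLS)
print(f"{len(UTILITY_TOOLS)} utility tools registered")
```

If the registered tools are LangChain tool objects rather than plain functions, compare their `.name` attribute instead of `__name__`.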
Commit 2: Fix Formatting and Linting
Fixed code quality issues across the codebase:
MCP Servers (8 files):
- `ai_platform_engineering/agents/argocd/mcp/mcp_argocd/__main__.py`
- `ai_platform_engineering/agents/backstage/mcp/mcp_backstage/__main__.py`
- `ai_platform_engineering/agents/confluence/mcp/mcp_confluence/__main__.py`
- `ai_platform_engineering/agents/jira/mcp/mcp_jira/__main__.py`
- `ai_platform_engineering/agents/komodor/mcp/mcp_komodor/__main__.py`
- `ai_platform_engineering/agents/pagerduty/mcp/mcp_pagerduty/__main__.py`
- `ai_platform_engineering/agents/slack/mcp/mcp_slack/__main__.py`
- `ai_platform_engineering/agents/splunk/mcp/mcp_splunk/__main__.py`

Tools and Tests (7 files):
- `ai_platform_engineering/multi_agents/tools/fetch_url.py`
- `ai_platform_engineering/multi_agents/tools/reflect_on_output.py`
- `ai_platform_engineering/multi_agents/tools/tests/*.py`

Docker Configuration:
- `.dockerignore`
- `ai_platform_engineering/knowledge_bases/rag/.dockerignore`
- `ai_platform_engineering/knowledge_bases/rag/build/.dockerignore`
- `ai_platform_engineering/knowledge_bases/rag/build/Dockerfile.webui`

Other:
- `.cursorrules`
- `docker-compose.dev.yaml`
- `docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md`

Changes:
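The trailing-newline fix from the second commit is easy to enforce mechanically. The following is a small sketch, not a script included in this PR; the function name `ensure_trailing_newline` is hypothetical. It appends a single `\n` to any non-empty file whose last byte is not already a newline.

```python
import sys
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a newline if the file is non-empty and lacks one.

    Returns True if the file was modified.
    """
    data = path.read_bytes()
    if data and not data.endswith(b"\n"):
        path.write_bytes(data + b"\n")
        return True
    return False


if __name__ == "__main__":
    # Usage: python <script> ai_platform_engineering/agents/*/mcp/mcp_*/__main__.py
    for name in sys.argv[1:]:
        if ensure_trailing_newline(Path(name)):
            print(f"fixed: {name}")
```

In practice, the `end-of-file-fixer` hook from the pre-commit-hooks project covers the same ground and can run automatically on every commit.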
📈 Impact
Positive Outcomes
✅ Eliminates Infinite Loops: No more repeated validation calls that cause agent to loop indefinitely
✅ Reduces Cognitive Load: The LLM chooses among fewer utility tools (8 → 3), which simplifies tool selection and decision making
✅ Faster Response Times: Less tool-selection overhead and more focused agent behavior
✅ Cleaner Output: Eliminates the ~390KB of repeated content previously seen in responses
✅ Architectural Alignment: Tools now match the sequential delegation architecture
✅ Better Resource Usage: Fewer tool definitions are sent to the model and no repeated output is generated, which reduces token usage and memory consumption
Neutral Considerations
No Negative Impact Expected
The removed tools were either:
🧪 Testing
Before This Change
Result: ~390KB of repeated output with infinite validation loops
After This Change
Test Coverage
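The ~390KB failure mode can be turned into an automated check, because a looping response contains the same text many times over. The sketch below is a hypothetical helper (not part of this PR's test suite) that flags a response when it exceeds a byte budget or when any single non-blank line repeats too often. The thresholds and the sample responses are made up for illustration, not measured values.

```python
from collections import Counter


def max_line_repeats(response: str) -> int:
    """Return how many times the most frequent non-blank line appears."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if not lines:
        return 0
    return Counter(lines).most_common(1)[0][1]


def looks_like_loop(response: str, max_bytes: int = 50_000, max_repeats: int = 5) -> bool:
    """Flag responses that are suspiciously large or highly repetitive."""
    too_big = len(response.encode("utf-8")) > max_bytes
    return too_big or max_line_repeats(response) > max_repeats


# Illustrative inputs only.
healthy = "## GitHub Profile\n- Name: example\n- Repos: 12\n"
looping = '{"valid": true, "message": "No formatting changes needed"}\n' * 200

print(looks_like_loop(healthy))  # False
print(looks_like_loop(looping))  # True
```

A regression test could send the "show my GitHub profile" query and assert `not looks_like_loop(response)`.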
📚 Documentation
Architecture Decision Record (ADR)
Location: `docs/docs/changes/2025-01-12-remove-workspace-and-markdown-tools.md`
Status: 🟢 In-use
The ADR documents:
Key Insights from ADR
Why Sequential Delegation Doesn't Need Workspace Tools:
Why Format Markdown Caused Loops:
🔄 Alternatives Considered
1. Fix Prompts Instead of Removing Tools
Rejected because:
2. Remove Only format_markdown
Rejected because:
3. Add Loop Detection
Rejected because:
🚀 Rollback Plan
If issues arise after this change:
1. Markdown Formatting Problems
`format_markdown` with strict usage instructions in the prompt
2. Need for Parallel Coordination
📊 Monitoring Plan
After deployment, monitor:
Performance Metrics:
Agent Behavior:
User Experience:
🔐 Compliance
Conventional Commits ✅
All commits follow the Conventional Commits specification:
- `fix(tools): remove workspace and format_markdown tools to fix hallucination loops`
- `chore: fix file formatting and linting issues`

Developer Certificate of Origin (DCO) ✅
All commits include DCO sign-off:
Code Quality ✅
📋 Checklist
`docs/docs/changes/`
🎓 Lessons Learned
Tool Selection Matters: Giving agents too many tools increases cognitive load and can cause confusion
Architecture Alignment: Tools must match the execution model (sequential vs parallel)
Validation Tools Need Limits: Self-validation tools can cause loops if not properly bounded
Less is More: Reducing from 8 to 3 utility tools improved rather than hindered performance
LLMs Format Well Naturally: Modern LLMs produce good markdown without explicit validation tools
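The "Validation Tools Need Limits" lesson can be made concrete. If a self-validation tool is ever restored under the rollback plan, one way to bound it is a per-request call budget that returns an explicit stop instruction once exhausted. This is a hypothetical sketch: the `bounded` decorator, the `validate_markdown` stub, and the stop message are illustrative, and this PR removes the tool instead of taking this approach.

```python
import functools
from typing import Callable


def bounded(max_calls: int, stop_message: str) -> Callable:
    """Limit how many times a tool may run before it tells the agent to stop."""
    def decorator(tool: Callable[..., str]) -> Callable[..., str]:
        calls = 0

        @functools.wraps(tool)
        def wrapper(*args, **kwargs) -> str:
            nonlocal calls
            calls += 1
            if calls > max_calls:
                return stop_message  # budget spent: give the agent a clear exit
            return tool(*args, **kwargs)

        def reset() -> None:
            nonlocal calls
            calls = 0

        wrapper.reset = reset  # call at the start of each request
        return wrapper
    return decorator


@bounded(max_calls=1, stop_message="Validation already done. Return the final answer now.")
def validate_markdown(text: str) -> str:
    # Hypothetical stand-in for a validator that always reports success.
    return '{"valid": true, "message": "No formatting changes needed"}'


first = validate_markdown("# Title")
second = validate_markdown("# Title")
print(first)
print(second)
validate_markdown.reset()
```

The counter lives in a closure, so it is shared by every caller; a server handling concurrent requests would need per-request state (for example a `contextvars.ContextVar`) instead.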
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>