Skip to content

Conversation

@nPeppon
Copy link
Contributor

@nPeppon nPeppon commented Jun 11, 2025

🚧 Prototype: Adoption of Pydantic AI Framework

This PR introduces a preliminary integration of the Pydantic AI framework as a proof of concept to explore its adoption within our system.

🔎 Overview

  • Integrated a dummy tool leveraging the Pydantic AI framework.
  • Tested framework extendibility and compatibility with our existing architecture.
  • Code is currently rough and experimental, not production-ready.

⚠️ Important Notes

  • The implementation is not clean and serves mainly as a sandbox to validate ideas.
  • No full refactoring or optimization has been applied yet.
  • This is a starting point for further proper implementation.

🎯 Goals

  • Assess how Pydantic AI can integrate in current workflow.
  • Understand integration challenges and extendibility of the framework.
  • Lay groundwork for a cleaner, robust adoption.

🚀 Next Steps

  • Refine the integration with a clean, maintainable design.
  • Expand test coverage and validation.
  • Collaborate with the team to finalize and roll out properly.

Feel free to take this prototype forward and build a solid implementation based on it!


Important

Prototype integration of Pydantic AI framework with new RAGPydAgent and tools for enhanced search capabilities.

  • Behavior:
    • Introduces RAGPydAgent in rag_pyd.py to integrate Pydantic AI framework.
    • Adds smart_filter_tool and other tools in tools/built_in to refine search capabilities.
    • Updates retrieval_router.py and retrieval_service.py to support rag_pyd mode.
  • Prompts:
    • Updates prompts in dynamic_rag_agent.yaml, dynamic_rag_agent_xml_tooling.yaml, static_rag_agent.yaml, and static_research_agent.yaml to include instructions for using new tools.
  • Misc:
    • Adds pydantic-ai==0.2.16 to pyproject.toml dependencies.
    • Updates SearchSettings and AggregateSearchResult in search.py to handle new search results.

This description was created by Ellipsis for 32caecf. You can customize this summary. It will automatically update as commits are pushed.

Andrea and others added 4 commits June 6, 2025 10:12
…ip extended thinking message in Agent class for non Anthropic models
- Introduced SmartFilterTool to refine metadata and collection filters based on user queries.
- Updated relevant files to register the new tool and include it in the available tools list.
- Enhanced documentation and prompts to reflect the use of smart_filter_tool for improved search efficiency.
- Introduced Smolagent tools and the R2RSmolRAGAgent for improved retrieval performance.
- Updated the ToolRegistry to support discovery of Smolagent tools.
- Enhanced the RetrievalRouter and RetrievalService to accommodate new modes including 'rag_smol'.
- Added new tools for file content retrieval, semantic search, and metadata filtering specific to Smolagent.
- Updated project dependencies to include smolagents package.
- Replaced Smolagent tools with the new RAGPydAgent for enhanced retrieval capabilities.
- Updated ToolRegistry to remove references to Smolagent tools and streamline tool discovery.
- Modified RetrievalRouter and RetrievalService to support the new 'rag_pyd' mode.
- Introduced new Pydantic-based tools for improved functionality and integration.
- Updated project dependencies to reflect changes in agent architecture.
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to 32caecf in 2 minutes and 19 seconds. Click for details.
  • Reviewed 969 lines of code in 22 files
  • Skipped 0 files when reviewing.
  • Skipped posting 18 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. py/core/agent/tools/built_in/get_file_content.py:42
  • Draft comment:
    Repeated code for creating _pydantic_ai_tool is duplicated. Consider extracting a utility function to set up the PydanticTool from the JSON schema.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. py/core/agent/tools/built_in/smart_filter.py:69
  • Draft comment:
    Inconsistent return type: on error, the execute method returns a dict rather than an instance of SmartFilterResult. This could lead to downstream type errors.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. py/core/agent/rag_pyd.py:106
  • Draft comment:
    The get_pyd_ai_model_name method uses a simple substring check for 'gpt'; consider implementing a more robust mapping to handle various model prefixes reliably.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment has a point - the current implementation is quite basic. However, without knowing the full set of model names that need to be supported or the requirements around model name validation, it's hard to say if this is actually a problem. The code seems intentionally restrictive to only support GPT models, as evidenced by the explicit error message for unsupported models. I might be overlooking potential edge cases where the current simple check could cause issues. The comment could be highlighting a real maintainability concern. While the suggestion is reasonable, the current implementation appears intentionally simple and restrictive. Without more context about supported models, we can't be certain this needs to be more complex. The comment should be removed as it's more of a speculative suggestion without clear evidence that the current approach is problematic.
4. py/core/agent/tools/built_in/tavily_extract.py:78
  • Draft comment:
    Consider using asyncio.to_thread instead of await get_event_loop().run_in_executor for better readability and modern async patterns.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. py/sdk/sync_methods/retrieval.py:60
  • Draft comment:
    In parse_retrieval_event, JSONDecodeError is raised immediately. Consider handling parse errors more gracefully to improve robustness.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. py/core/providers/database/prompts/static_research_agent.yaml:59
  • Draft comment:
    Typo in template: 'citatons' should be 'citations'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. General:1
  • Draft comment:
    The prototype pydantic agent integration is experimental. Ensure robust error handling and consistent return types across all tool execute methods before production.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. py/core/main/services/retrieval_service.py:232
  • Draft comment:
    Debug logs containing model details may leak sensitive information. Review logging levels before production deployment.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. py/core/main/services/retrieval_service.py:1929
  • Draft comment:
    Ensure that the document context concatenated in _build_aware_system_instruction does not exceed token limits, as this might affect LLM input.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. py/pyproject.toml:25
  • Draft comment:
    Dependency version upper bounds are very strict and could lead to compatibility issues. Consider relaxing these bounds if feasible.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
11. py/sdk/sync_methods/retrieval.py:206
  • Draft comment:
    When converting messages using Message.from_dict, ensure that the input follows the expected schema to avoid runtime errors.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. General:1
  • Draft comment:
    There are several TODO and FIXME comments throughout the code. Please ensure these are addressed or properly tracked before production deployment.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
13. py/core/base/agent/agent.py:231
  • Draft comment:
    Typo: There's an extraneous space before the colon in "Model in use :". Consider changing it to "Model in use:".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is an extremely minor formatting issue in a debug log message that developers will rarely see. It doesn't affect functionality, readability, or maintainability in any meaningful way. The rules state not to make comments that are obvious or unimportant, and this seems to fall into that category. The formatting inconsistency could theoretically make the codebase less uniform in style. Some teams might have strict style guidelines about spaces before colons. Even if there are style guidelines, this is a debug log message - not production-facing code. The benefit of fixing this is so minimal that it doesn't justify the overhead of a PR comment and the subsequent change. Delete this comment as it points out an extremely minor formatting issue in a debug message that doesn't meaningfully impact code quality.
14. py/core/base/agent/tools/built_in/smart_filter.py:23
  • Draft comment:
    Typo: The description string concatenation is missing a space between sentences. Consider adding a space after the period on the line ending "Does NOT perform the search itself." to avoid merging with the next sentence.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the comment is technically correct about the missing space, this is an extremely minor formatting issue. The description is still readable and the meaning is clear. The rules state not to make comments that are obvious or unimportant. This feels like a very low-priority nitpick. The missing space could potentially make the description slightly harder to read in some contexts, and maintaining consistent formatting is generally good practice. However, this is such a minor issue that it doesn't warrant a PR comment. It doesn't affect functionality and barely impacts readability. This comment should be removed as it's too minor and doesn't meet the bar for "clearly a code change required" per the rules.
15. py/core/base/agent/tools/built_in/smart_filter.py:72
  • Draft comment:
    Typographical error: The error message references 'SmartRagTool' instead of 'SmartFilterTool'.
  • Reason this comment was not posted:
    Marked as duplicate.
16. py/core/base/agent/tools/built_in/smart_filter.py:104
  • Draft comment:
    Typographical suggestion: Consider revising the phrasing "We will use this new params to query the database..." to "We will use these new parameters to query the database...".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While this is technically a grammatical error, it's in an LLM prompt string. LLMs are quite robust to minor grammatical errors and this won't affect functionality. The comment is about something that changed (new file), but it's an extremely minor issue that doesn't affect code quality or functionality in any meaningful way. The grammar error could make the code look less professional. Also, since this is a new file, fixing it now would be easier than later. While true, we should focus PR comments on more substantial issues. This is too minor and doesn't affect functionality. Delete the comment. The grammatical issue is too minor to warrant a PR comment, especially since it's in an LLM prompt that will work fine either way.
17. py/core/providers/database/prompts/static_research_agent.yaml:18
  • Draft comment:
    Typographical error: Consider revising "help you narrowing the context" to "help you narrow the context" for clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the grammar correction is technically correct, I need to consider if this is important enough to warrant a comment. The meaning is perfectly clear either way. This is a prompt file that will be read by an AI model, not user-facing documentation. The grammatical difference is minor and doesn't affect functionality or readability significantly. The grammar is technically incorrect and fixing it would make the text more professional. Some might argue that maintaining high quality even in internal files is important. While correct grammar is good, this is an extremely minor issue in an internal prompt file. The meaning is completely clear either way, and this kind of nitpicky comment creates unnecessary noise. This comment should be deleted as it's too minor and doesn't affect functionality or clarity in any meaningful way.
18. py/core/providers/database/prompts/static_research_agent.yaml:19
  • Draft comment:
    Typographical/grammatical note: "rag tools reply" might be clearer as "rag tools' reply" to properly indicate possession. Please verify if this phrasing accurately conveys the intended meaning.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and asks the PR author to verify the phrasing, which violates the rules. It does not provide a specific code suggestion or ask for a test to be written. Therefore, it should be removed.

Workflow ID: wflow_552fNQF52VZJZ70I

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

or not hasattr(context, "rag_generation_config")
):
logger.error(
"Context missing database_provider or config or rag_generation_config for SmartRagTool"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typographical error: The error message references SmartRagTool but the class is named SmartFilterTool. Please update for consistency.

Suggested change
"Context missing database_provider or config or rag_generation_config for SmartRagTool"
"Context missing database_provider or config or rag_generation_config for SmartFilterTool"

logger.error(f"LLM did not return valid JSON: {llm_response}")
return {"collections": [], "filters": {}, "prompt_mod": query}
except Exception as e:
logger.error(f"Error in SmartRagTool LLM analysis: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typographical error: The error message 'Error in SmartRagTool LLM analysis' should reference 'SmartFilterTool' for consistency.

Suggested change
logger.error(f"Error in SmartRagTool LLM analysis: {e}")
logger.error(f"Error in SmartFilterTool LLM analysis: {e}")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant