Skip to content

Conversation

@zhengwj533
Copy link
Contributor

@zhengwj533 zhengwj533 commented Jul 30, 2025

Adapt to the case where the return value structure of parser.ingest is JSON.


Important

Update parse function in base.py to handle JSON return values from parser.ingest, extracting content and metadata.

  • Behavior:
    • Update parse function in base.py to handle JSON return values from parser.ingest.
    • Extracts content and metadata from JSON if text is a dictionary.
    • Appends extracted data to contents list.
  • Metadata Handling:
    • Adds parser_generated metadata if available in JSON return.
    • Updates metadata handling to include parser_generated in DocumentChunk creation.

This description was created by Ellipsis for 13201d0. You can customize this summary. It will automatically update as commits are pushed.

Adapt to the case where the return value structure of parser.ingest is JSON.
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to 13201d0 in 1 minute and 35 seconds. Click for details.
  • Reviewed 41 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. py/core/providers/ingestion/r2r/base.py:390
  • Draft comment:
    Added 'document=document' when calling .ingest. Ensure all parser implementations handle this extra parameter and that the change is intentional given that override calls don’t include it.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. py/core/providers/ingestion/r2r/base.py:394
  • Draft comment:
    Checks now verify that the parser’s return is a dict. Confirm that the JSON includes the expected 'content' and 'metadata' keys, and consider adding validation if key types aren’t guaranteed.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. py/core/providers/ingestion/r2r/base.py:412
  • Draft comment:
    The variable 'parser_generated' captures parser metadata and is later added to chunk metadata. Consider renaming it (e.g., 'parser_metadata') for clarity, and verify that the conditional check (which ignores empty dicts) is intended.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment has two parts: 1) A suggestion to rename the variable for clarity 2) Questioning the empty dict check. The variable name is reasonably clear as is - it represents metadata generated by the parser. The empty dict check is a standard pattern and there's no strong reason to question it. This feels like nitpicking rather than a substantive issue. The variable name could potentially be confusing since 'generated' is ambiguous. The empty dict check is a common pattern but could hide bugs if metadata is unexpectedly missing. While the naming could be marginally improved, the current name is adequate and the empty dict check is a standard defensive programming practice. Neither issue is significant enough to warrant a comment. Delete this comment as it suggests minor changes that aren't clearly beneficial and questions standard programming patterns.

Workflow ID: wflow_pQq2QBrRwnWEXr6E

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant