Skip to content

Conversation

@zhengwj533
Copy link
Contributor

@zhengwj533 zhengwj533 commented Jun 3, 2025

Objective

To handle the ingestion and retrieval of video file types such as MP4. To support temporary processing workflows for undefined types.

Additions to the DocumentType enum:

  • Added MP4 as a new video file type.
  • Added UNKNOWN as a catch-all type for unspecified or unsupported document formats.

Important

Add MP4 and UNKNOWN to DocumentType enum in document.py for video and undefined types.

  • DocumentType Enum:
    • Add MP4 for video file types.
    • Add UNKNOWN for unspecified or unsupported formats.

This description was created by Ellipsis for 6a1a0de. You can customize this summary. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to 6a1a0de in 1 minute and 30 seconds. Click for details.
  • Reviewed 22 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. py/shared/abstractions/document.py:83
  • Draft comment:
    MP4 is added in lowercase, matching existing values. Ensure parsers/handlers correctly support MP4 ingestion.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_xG8xm6WrKoD0hUEH

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

CSS = "css"

# other type
UNKNOWN = "UNKNOWN"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using lowercase 'unknown' for consistency with other enum values.

Suggested change
UNKNOWN = "UNKNOWN"
UNKNOWN = "unknown"

@NolanTrem
Copy link
Contributor

Not sure I understand the motivation of these changes without adding an MP4 or "unkown" parser. Adding document types, but not the underlying parsers is surely going to induce unexpected bugs, no?

@zhengwj533
Copy link
Contributor Author

Yes, you are right. Maybe I will add a video parser later.

@zhengwj533
Copy link
Contributor Author

I've added the video_parser.py file, but I have no idea how to fix these quality check errors.

@NolanTrem
Copy link
Contributor

For quality checks, you should be able to run pre-commit run --all-files

You may need to install the pre-commit hooks first, depending on your set up. Can you also provide an example of the MP4 processing working? I'd love to see it in action. Thanks!

@zhengwj533
Copy link
Contributor Author

@NolanTrem I’ve added a video parser and confirmed that pre-commit runs cleanly on my local machine.
Could you please take a look when you have a moment?

@zhengwj533
Copy link
Contributor Author

I have tested it like below; ingestion_config should be provided.

curl --request POST \
  --url http://localhost:7272/v3/documents \
  --header 'Content-Type: multipart/form-data' \
  --form 'metadata={"catalog":"mp4"}' \
  --form [email protected] \
  --form 'collection_ids=["0a027464-a07c-4a04-9cbe-5976756791aa"]' \
  --form 'ingestion_config={"extra_fields": {"file_type":"mp4"}}'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants