add document type #2207
Caution
Changes requested ❌
Reviewed everything up to 6a1a0de in 1 minute and 30 seconds.
- Reviewed 22 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment
1. py/shared/abstractions/document.py:83
- Draft comment: MP4 is added in lowercase, matching existing values. Ensure parsers/handlers correctly support MP4 ingestion.
- Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
Workflow ID: wflow_xG8xm6WrKoD0hUEH
py/shared/abstractions/document.py
Outdated
    CSS = "css"

    # other type
    UNKNOWN = "UNKNOWN"
Consider using lowercase 'unknown' for consistency with other enum values.
Suggested change:
    - UNKNOWN = "UNKNOWN"
    + UNKNOWN = "unknown"
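The consistency point in this suggestion could be guarded with a small check. The following is only a minimal sketch: the enum is a trimmed-down stand-in (only `CSS`, `MP4`, and `UNKNOWN` come from this PR, and the `str, Enum` base is an assumption), and the helper name `non_lowercase_members` is hypothetical.

```python
from enum import Enum


class DocumentType(str, Enum):
    """Trimmed-down stand-in for the enum in py/shared/abstractions/document.py."""

    CSS = "css"
    MP4 = "mp4"
    UNKNOWN = "unknown"  # lowercase, as suggested in the review


def non_lowercase_members(enum_cls):
    """Return the names of members whose value is not entirely lowercase."""
    return [member.name for member in enum_cls if member.value != member.value.lower()]
```

A unit test could assert that `non_lowercase_members(DocumentType)` is empty, so a future uppercase value fails fast.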
Not sure I understand the motivation of these changes without adding an MP4 or "unknown" parser. Adding document types, but not the underlying parsers is surely going to induce unexpected bugs, no?
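One way to make this concern testable is to check that every document type has a parser registered. This is a rough sketch under stated assumptions: the `PARSERS` registry and `types_without_parser` helper are hypothetical and do not reflect how the project actually wires parsers; the enum is a trimmed-down stand-in.

```python
from enum import Enum


class DocumentType(str, Enum):
    """Trimmed-down stand-in for the project's enum."""

    CSS = "css"
    MP4 = "mp4"
    UNKNOWN = "unknown"


# Hypothetical registry mapping each document type to a parser callable.
PARSERS = {
    DocumentType.CSS: lambda data: data.decode("utf-8"),
}


def types_without_parser(registry):
    """List document types that have no parser registered, in definition order."""
    return [doc_type for doc_type in DocumentType if doc_type not in registry]
```

With only a CSS parser registered, the helper reports `MP4` and `UNKNOWN` as uncovered, which is the situation described in the comment above.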

Yes, you are right. Maybe I will add a video parser later.

I've added the

For quality checks, you should be able to run You may need to install the pre-commit hooks first, depending on your setup. Can you also provide an example of the MP4 processing working? I'd love to see it in action. Thanks!

@NolanTrem I’ve added a video parser and confirmed that pre-commit runs cleanly on my local machine.

I have tested it as shown below:

curl --request POST \
  --url http://localhost:7272/v3/documents \
  --header 'Content-Type: multipart/form-data' \
  --form 'metadata={"catalog":"mp4"}' \
  --form [email protected] \
  --form 'collection_ids=["0a027464-a07c-4a04-9cbe-5976756791aa"]' \
  --form 'ingestion_config={"extra_fields": {"file_type":"mp4"}}'
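For anyone who prefers to reproduce the request from Python, the form fields in the curl command can be built as below. This is a sketch, not the project's client: the helper names are hypothetical, the upload field name `file` is an assumption (the file argument is obscured in the command above), and sending the request relies on the third-party `requests` package.

```python
import json


def build_ingestion_form(collection_id, file_type="mp4"):
    """Build the non-file form fields from the curl command as JSON strings."""
    return {
        "metadata": json.dumps({"catalog": file_type}),
        "collection_ids": json.dumps([collection_id]),
        "ingestion_config": json.dumps({"extra_fields": {"file_type": file_type}}),
    }


def ingest(path, collection_id, base_url="http://localhost:7272"):
    """POST the file as multipart/form-data to the /v3/documents endpoint."""
    import requests  # imported lazily so the form builder works without it

    with open(path, "rb") as handle:
        return requests.post(
            f"{base_url}/v3/documents",
            data=build_ingestion_form(collection_id),
            files={"file": handle},  # field name "file" is an assumption
            timeout=60,
        )
```

No `Content-Type` header is set explicitly here, because `requests` generates the multipart boundary itself when `files` is passed.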
Objective
To handle the ingestion and retrieval of video file types such as MP4. To support temporary processing workflows for undefined types.
Additions to the DocumentType enum:
- MP4 as a new video file type.
- UNKNOWN as a catch-all type for unspecified or unsupported document formats.

Important
Add MP4 and UNKNOWN to the DocumentType enum in document.py for video and undefined types.
- MP4 for video file types.
- UNKNOWN for unspecified or unsupported formats.

This description was created by Ellipsis for 6a1a0de. It will automatically update as commits are pushed.
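To illustrate how a catch-all member could behave, here is a minimal sketch that resolves unrecognised extensions to `UNKNOWN` through the standard `Enum._missing_` hook. It is an assumption about one possible design, not what this PR implements: the enum is a trimmed-down stand-in, the lowercase `"unknown"` value follows the review suggestion, and `document_type_for` is a hypothetical helper.

```python
from enum import Enum
from pathlib import Path


class DocumentType(str, Enum):
    """Trimmed-down stand-in for the project's enum."""

    CSS = "css"
    MP4 = "mp4"
    UNKNOWN = "unknown"

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when no member matches the value; use the catch-all.
        return cls.UNKNOWN


def document_type_for(filename):
    """Map a filename to a DocumentType by its extension (case-insensitive)."""
    extension = Path(filename).suffix.lstrip(".").lower()
    return DocumentType(extension)
```

A fallback like this keeps ingestion from raising `ValueError` on unexpected files, but it also hides unsupported formats, so downstream code would still need an explicit branch for `UNKNOWN`.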