
@awasthiabhijeet

Summary:
Adds video processing support to the Llama4 model by extending the existing vision encoder infrastructure to handle video content. Specifically, it introduces the video-specific special tokens `<|video|>`, `<|vid_start|>`, `<|vid_end|>`, and `<|vid_frame_separator|>` in the tokenizer. It also implements a new `transform_video()` method that processes a video clip as a sequence of frames through the existing image transform pipeline. Finally, it registers a `"video"` encoder in `EarlyFusionModel` that reuses the vision encoder while keeping separate tokenization paths for images and videos.

(Used the Hugging Face implementation as a reference to keep the changes in `_tokenizer.py` consistent with it.)
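The summary describes two mechanisms: each frame of a clip goes through the existing image transform, and each clip is delimited in the token stream by the new special tokens. The following is a minimal sketch of both ideas under stated assumptions. It is not the code from this PR: `transform_video_sketch`, `video_token_layout`, and the `tokens_per_frame` parameter are hypothetical names. The token layout (one `<|video|>` placeholder per visual token, `<|vid_frame_separator|>` between frames, and the whole clip wrapped in `<|vid_start|>` ... `<|vid_end|>`) is also an assumption about how the tokens might fit together.

```python
from typing import Callable, List, Sequence

# Special-token strings named in the PR summary. Their placement below is an
# illustrative assumption, not the PR's actual tokenization logic.
VIDEO = "<|video|>"
VID_START = "<|vid_start|>"
VID_END = "<|vid_end|>"
FRAME_SEP = "<|vid_frame_separator|>"


def transform_video_sketch(frames: Sequence, image_transform: Callable) -> List:
    """Hypothetical helper: apply an existing per-image transform to every frame."""
    return [image_transform(frame) for frame in frames]


def video_token_layout(num_frames: int, tokens_per_frame: int) -> List[str]:
    """Hypothetical helper: build a placeholder token sequence for one clip.

    Each frame contributes `tokens_per_frame` copies of <|video|>; frames are
    separated by <|vid_frame_separator|>, and the clip is wrapped in
    <|vid_start|> ... <|vid_end|>.
    """
    if num_frames < 1 or tokens_per_frame < 1:
        raise ValueError("num_frames and tokens_per_frame must be positive")
    tokens = [VID_START]
    for i in range(num_frames):
        if i > 0:
            tokens.append(FRAME_SEP)  # separator only between frames
        tokens.extend([VIDEO] * tokens_per_frame)
    tokens.append(VID_END)
    return tokens
```

For example, `video_token_layout(2, 3)` yields nine tokens: the start token, three placeholders, one separator, three more placeholders, and the end token. Reusing the per-image transform keeps frame preprocessing identical to image preprocessing, which matches the PR's choice to reuse the vision encoder for the `"video"` path.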

Differential Revision: D89577119

@meta-cla bot added the CLA Signed label on Dec 22, 2025.

meta-codesync bot commented Dec 22, 2025

@awasthiabhijeet has exported this pull request. If you are a Meta employee, you can view the originating Diff in D89577119.


meta-codesync bot commented Dec 22, 2025

@awasthiabhijeet has imported this pull request. If you are a Meta employee, you can view this in D89577119.

Reviewed By: felipemello1

Pulled By: awasthiabhijeet

Labels: CLA Signed, fb-exported, meta-exported
