@leot13
Contributor

@leot13 leot13 commented Jan 8, 2026

Type / Scope

  • Type: Feature
  • Scope: datasets, scripts

Summary / Motivation

LeRobot hardcodes libsvtav1 for video encoding during dataset recording. While AV1 offers excellent compression, it is CPU-heavy and can starve camera capture threads during recording. This PR exposes the existing vcodec parameter (which encode_video_frames already supports) through the LeRobotDataset API and recording CLI, allowing users to choose faster codecs like h264 or hevc when needed.

The implementation threads the codec option through both the sequential and parallel encoding paths without changing any defaults: existing workflows continue to use libsvtav1 unless explicitly overridden.

Related issues

  • Related: users currently have to subclass LeRobotDataset or monkeypatch it to use a different codec during recording

What changed

  • src/lerobot/datasets/lerobot_dataset.py:

    • Added VALID_VIDEO_CODECS constant (h264, hevc, libsvtav1)
    • Updated _encode_video_worker() to accept and forward vcodec parameter
    • Added vcodec parameter to LeRobotDataset.__init__() and LeRobotDataset.create() with validation
    • Updated all encoding call sites (sequential and parallel) to use self.vcodec
  • src/lerobot/scripts/lerobot_record.py:

    • Added vcodec field to DatasetRecordConfig
    • Forwarded vcodec to both LeRobotDataset() (resume) and LeRobotDataset.create() (new dataset)
    • Updated docstring example to show --dataset.vcodec option
  • tests/datasets/test_datasets.py:

    • Added 4 unit tests for vcodec forwarding, default value, validation, and constant contents
  • No breaking changes: Default remains libsvtav1, all existing code continues to work unchanged.
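
For readers skimming the change list, the validation and forwarding can be sketched roughly as below. This is an illustration under assumptions, not the code merged in this PR: `validate_vcodec` is a hypothetical helper, `encode_video_frames` is replaced by a stub that only records its arguments, and the real `_encode_video_worker` signature may differ.

```python
from pathlib import Path

# Codec names as listed in this PR description.
VALID_VIDEO_CODECS = {"h264", "hevc", "libsvtav1"}


def validate_vcodec(vcodec: str) -> str:
    """Hypothetical helper: reject any codec outside the allowed set."""
    if vcodec not in VALID_VIDEO_CODECS:
        raise ValueError(
            f"Invalid vcodec '{vcodec}'. Supported: {sorted(VALID_VIDEO_CODECS)}"
        )
    return vcodec


def encode_video_frames(
    imgs_dir: Path, video_path: Path, fps: int, vcodec: str = "libsvtav1"
) -> dict:
    """Stub standing in for the real encoder; it only records its arguments."""
    return {"imgs_dir": imgs_dir, "video_path": video_path, "fps": fps, "vcodec": vcodec}


def encode_video_worker(
    imgs_dir: Path, video_path: Path, fps: int, vcodec: str = "libsvtav1"
) -> dict:
    """Simplified worker: validate the codec, then forward it unchanged."""
    return encode_video_frames(imgs_dir, video_path, fps, vcodec=validate_vcodec(vcodec))
```

Keeping `libsvtav1` as the default on every layer is what makes the change non-breaking: callers that never pass `vcodec` get the same behavior as before.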

How was this tested

  • Tests added:
    • test_encode_video_worker_forwards_vcodec — verifies vcodec is forwarded to encode_video_frames
    • test_encode_video_worker_default_vcodec — verifies default is libsvtav1
    • test_lerobot_dataset_vcodec_validation — verifies invalid codecs raise ValueError
    • test_valid_video_codecs_constant — verifies constant contains expected codecs
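
The forwarding and default-value tests might look something like the sketch below. It is an assumption about their shape, not a copy of the tests in this PR: the worker here is a hypothetical variant that takes the encoder as a parameter so a `MagicMock` can be passed in directly, whereas the real tests presumably patch `encode_video_frames` in place.

```python
from unittest.mock import MagicMock


def encode_video_worker(encode_fn, imgs_dir, video_path, fps, vcodec="libsvtav1"):
    """Hypothetical worker that receives the encoder as an argument for easy mocking."""
    encode_fn(imgs_dir, video_path, fps, vcodec=vcodec)
    return video_path


def test_worker_forwards_vcodec():
    encoder = MagicMock()
    encode_video_worker(encoder, "frames/", "out.mp4", 30, vcodec="h264")
    # The mock records the call, so no real encoder is needed.
    encoder.assert_called_once_with("frames/", "out.mp4", 30, vcodec="h264")


def test_worker_default_vcodec():
    encoder = MagicMock()
    encode_video_worker(encoder, "frames/", "out.mp4", 30)
    # With no override, the default codec should reach the encoder.
    assert encoder.call_args.kwargs["vcodec"] == "libsvtav1"
```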

How to run locally (reviewer)

  • Run the new tests:

    pytest -q tests/datasets/test_datasets.py -k "vcodec"
  • Test recording with a different codec (requires robot hardware or mock):

    lerobot-record \
        --robot.type=so100_follower \
        --dataset.repo_id=test/vcodec_test \
        --dataset.single_task="Test task" \
        --dataset.vcodec=h264

Checklist (required before merge)

  • Linting/formatting run (pre-commit run -a)
  • All tests pass locally (pytest)
  • Documentation updated
  • CI is green

Reviewer notes

  • The tests use mocking to avoid requiring actual video encoders in CI
  • Focus areas: ensure the parallel encoding path (ProcessPoolExecutor submissions) correctly pickles and forwards the vcodec argument
  • The validation uses a simple set membership check, consistent with how encode_video_frames validates codecs
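
As a rough aid for the parallel-path focus area, the sketch below round-trips a set of call arguments through `pickle`, which is how a process pool serializes them before a worker runs, and then calls a worker with the restored values. Everything here is hypothetical: the worker signature, the `roundtrip` helper, and the camera key `observation.images.front` are illustrative and are not taken from the PR.

```python
import pickle


def encode_video_worker(video_key, episode_index, fps, vcodec="libsvtav1"):
    """Hypothetical worker; returns what it received instead of encoding."""
    return (video_key, episode_index, fps, vcodec)


def roundtrip(args, kwargs):
    """Serialize and restore call arguments, as a process pool does for each task."""
    return pickle.loads(pickle.dumps((args, kwargs)))


# The codec travels as a plain string keyword argument, so it survives pickling.
args, kwargs = roundtrip(("observation.images.front", 0, 30), {"vcodec": "hevc"})
result = encode_video_worker(*args, **kwargs)
```

Passing `vcodec` as a plain string keeps the submission trivially picklable; the thing to check in review is that every `submit` call actually includes it, so the worker's default is not used silently.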

@leot13 leot13 marked this pull request as ready for review January 8, 2026 14:51
@github-actions github-actions bot added the labels dataset (Issues regarding data inputs, processing, or datasets) and tests (Problems with test coverage, failures, or improvements to testing) Jan 8, 2026
@imstevenpmwork imstevenpmwork self-requested a review January 8, 2026 15:00
imstevenpmwork previously approved these changes Jan 8, 2026
Collaborator

@imstevenpmwork imstevenpmwork left a comment

LGTM, thanks! Can you run pre-commit run -a?

@imstevenpmwork imstevenpmwork merged commit 8b6fc0a into huggingface:main Jan 8, 2026
8 checks passed
@atyshka
Contributor

atyshka commented Jan 8, 2026

Nice work! Does this enable hardware-accelerated GPU codecs?

3 participants