
Conversation

@michel-aractingi
Collaborator

Type / Scope

  • Type: Bug
  • Scope: lerobot_train.py

Summary / Motivation

Accelerate in lerobot_train.py automatically detects and uses the best available device (preferring CUDA or MPS over CPU), ignoring the user's policy.device setting. Trace:

1- The user sets --policy.device=cpu
2- The preprocessor moves input batches to CPU (respecting the config)
3- accelerator.prepare() moves the model to CUDA (ignoring the config)
4- The forward pass fails: model on CUDA, data on CPU
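A minimal standalone sketch of the failure mode (toy model and batch for illustration, not the lerobot code itself):

import torch
from accelerate import Accelerator

model = torch.nn.Linear(4, 2)        # stand-in for the policy
batch = torch.randn(8, 4)            # preprocessor output, kept on CPU per policy.device

accelerator = Accelerator()          # auto-selects CUDA/MPS when available
model = accelerator.prepare(model)   # model is moved to the detected device

out = model(batch)                   # RuntimeError on a GPU machine: weights on CUDA, data on CPU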

Reproduce

Run from main:

python -m lerobot.scripts.lerobot_train --dataset.repo_id=lerobot/pusht --policy.type=act  --policy.device=cpu  --policy.repo_id=aractingi/act_test --log_freq=1

Result: Device mismatch error between model and data.

Fix

lerobot_train.py: Force the Accelerator to use CPU when the user explicitly sets policy.device to cpu.

force_cpu = cfg.policy.device == "cpu"
accelerator = Accelerator(..., cpu=force_cpu)
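For context, a minimal sketch of how accelerate's cpu flag behaves (toy model; the other Accelerator arguments elided above are assumed unchanged):

import torch
from accelerate import Accelerator

policy_device = "cpu"  # e.g. parsed from --policy.device=cpu

accelerator = Accelerator(cpu=(policy_device == "cpu"))
model = accelerator.prepare(torch.nn.Linear(4, 2))

assert accelerator.device.type == "cpu"                # holds even on a CUDA/MPS machine
assert next(model.parameters()).device.type == "cpu"   # prepare() no longer moves the model off CPU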

How was this tested

Ran the training command with --policy.device set to cpu and with cuda or mps.

How to run locally (reviewer)

  • Run the relevant tests:

    pytest -q tests/ -k <keyword>
  • Run a quick example or CLI (if applicable):

    lerobot-train --some.option=true

Checklist (required before merge)

  • Linting/formatting run (pre-commit run -a)
  • All tests pass locally (pytest)
  • Documentation updated
  • CI is green

Reviewer notes

  • Anything the reviewer should focus on (performance, edge-cases, specific files) or general notes.
  • Anyone in the community is free to review the PR.

@michel-aractingi michel-aractingi marked this pull request as ready for review January 10, 2026 11:23
Copilot AI review requested due to automatic review settings January 10, 2026 11:23
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a bug where the Accelerator in lerobot_train.py automatically detects and uses the best available device (CUDA/MPS over CPU), ignoring the user's --policy.device=cpu configuration. This caused device mismatch errors where the model was moved to CUDA while data remained on CPU.

Changes:

  • Add logic to force Accelerator to use CPU when policy.device is explicitly set to "cpu"


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
Member

@jadechoghari jadechoghari left a comment


lgtm

@jadechoghari jadechoghari added the bug and training labels Jan 12, 2026
@michel-aractingi michel-aractingi merged commit 91ff9c4 into main Jan 12, 2026
11 checks passed
@michel-aractingi michel-aractingi deleted the fix/cpu_training branch January 12, 2026 11:19
sandhya-cb pushed a commit to sandhya-cb/lerobot-clutterbot that referenced this pull request Jan 28, 2026
* fix cpu training in lerobot_train

* Update src/lerobot/scripts/lerobot_train.py

Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>