Add SmolVLA example training script #2647
base: main
Conversation
Pull request overview
This PR adds a new example training script for SmolVLA that demonstrates both fine-tuning from a pretrained checkpoint and training from scratch. The script is intended to help users understand how to train SmolVLA on their own datasets, similar to existing training examples for ACT and Diffusion policies.
Key Changes
- Adds comprehensive training example with detailed comments explaining configuration options
- Supports two training modes: fine-tuning from pretrained checkpoint (default) or training from scratch
- Includes optimizer and learning rate scheduler setup using SmolVLA's preset configurations (see the sketch after this list)
- Provides optional Hub push functionality with safety flag
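For readers new to the preset pattern, here is a rough sketch of how that optimizer/scheduler setup typically looks. The `build(...)` helpers and argument names are assumptions about lerobot's preset API rather than code copied from this PR:

```python
# Sketch only: assumes the SmolVLA config exposes optimizer/scheduler presets
# with build(...) helpers; adjust to the actual lerobot API if it differs.
optimizer_config = policy.config.get_optimizer_preset()
scheduler_config = policy.config.get_scheduler_preset()

optimizer = optimizer_config.build(policy.parameters())
# `training_steps` is assumed to be defined elsewhere in the script.
lr_scheduler = scheduler_config.build(optimizer, num_training_steps=training_steps)
```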
    loss, output_dict = policy.forward(batch)

    # Backward pass and optimization
    loss.backward()
Copilot AI · Dec 14, 2025
Missing gradient clipping before optimizer step. The SmolVLA config defines grad_clip_norm=10 (via get_optimizer_preset), but gradient clipping must be manually applied. Add torch.nn.utils.clip_grad_norm_(policy.parameters(), optimizer_config.grad_clip_norm) after loss.backward() and before optimizer.step().
Suggested change:

    loss.backward()
    torch.nn.utils.clip_grad_norm_(policy.parameters(), optimizer_config.grad_clip_norm)
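Put together, one inner training step with the suggested clipping would read roughly as below. This is a sketch assuming `optimizer`, `lr_scheduler`, and `optimizer_config` were built earlier from the SmolVLA presets; it is not the PR's exact code:

```python
import torch

# One training step with gradient clipping applied between backward() and step().
loss, output_dict = policy.forward(batch)

loss.backward()
torch.nn.utils.clip_grad_norm_(policy.parameters(), optimizer_config.grad_clip_norm)

optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
```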
    output_directory = Path("outputs/train/my_smolvla")
    output_directory.mkdir(parents=True, exist_ok=True)

    device = torch.device("cuda")  # or "cuda" or "cpu"
Copilot AI · Dec 14, 2025
The comment says 'or "cuda" or "cpu"' but the device is already set to "cuda", making this redundant. This should either say 'or "mps" or "cpu"' to match the other examples, or the device should be set to "mps" to be consistent with act_training_example.py and diffusion_training_example.py which use torch.device("mps").
Suggested change:

    device = torch.device("mps")  # or "cuda" or "cpu"
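A more portable alternative (illustrative only, not what the reviewer proposes) is to pick the best available backend at runtime:

```python
import torch

# Prefer CUDA, fall back to Apple MPS, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
```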
    # Optional: Push to Hugging Face Hub
    # Uncomment and update with your Hugging Face username
    push_to_hub = False  # Set to True to push to Hub
    hub_repo_id = "YOUR_HF_USERNAME/my_smolvla_so101"  # Replace with your repo ID
Copilot AI · Dec 14, 2025
Inconsistent dataset naming. The dataset ID uses 'svla_so100_pickplace' but line 192's comment refers to 'my_smolvla_so101' and line 205 mentions 'SO101 robot'. The documentation at docs/source/smolvla.mdx:40 confirms the dataset is 'svla_so100_pickplace'. The comments should consistently use SO100 to match the dataset, or clarify if SO101 is intentionally different.
| print("Training complete! Next steps:") | ||
| print("1. Test the model with: examples/tutorial/smolvla/using_smolvla_example.py") | ||
| print(f"2. Update model_id in the script to: {output_directory}") | ||
| print("3. Deploy on your SO101 robot!") |
Copilot AI · Dec 14, 2025
Inconsistent reference to robot type. The comment mentions 'SO101 robot' but the dataset being used is 'svla_so100_pickplace' (line 18). This should be 'SO100 robot' to match the dataset, or clarified if SO101 is a different robot model.
| print("3. Deploy on your SO101 robot!") | |
| print("3. Deploy on your SO100 robot!") |
    @@ -0,0 +1,206 @@
    from pathlib import Path
Copilot AI · Dec 14, 2025
Missing module-level docstring. Other training examples in this repository (act_training_example.py, diffusion_training_example.py) include a docstring at the top that describes what the script demonstrates. Consider adding a similar docstring such as: """This script demonstrates how to train SmolVLA Policy on a real-world dataset."""
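With that suggestion applied, the top of the file might look as follows; the `torch` import is an assumption about what the script needs, since only `pathlib` is visible in the diff:

```python
"""This script demonstrates how to train SmolVLA Policy on a real-world dataset."""

from pathlib import Path

import torch
```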
    dataset,
    batch_size=batch_size,
    shuffle=True,
    pin_memory=device.type == "cuda",
Copilot AI · Dec 14, 2025
Inconsistent pin_memory condition. This script uses device.type == "cuda" but other training examples (act_training_example.py:64, diffusion_training_example.py:65) use device.type != "cpu". The latter is more inclusive as it also covers MPS devices. Consider changing to device.type != "cpu" for consistency.
Suggested change:

    pin_memory=device.type != "cpu",
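In context, the DataLoader construction would then read roughly like this (a sketch; arguments not visible in the diff, such as `num_workers`, are assumed for illustration):

```python
from torch.utils.data import DataLoader

# DataLoader with the suggested pin_memory condition (covers CUDA and MPS).
dataloader = DataLoader(
    dataset,
    batch_size=batch_size,
    shuffle=True,
    pin_memory=device.type != "cpu",
    num_workers=4,  # assumed value, not from the PR
)
```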
    rename_map = {
        "observation.images.top": "observation.images.camera1",
        "observation.images.wrist": "observation.images.camera2",
    }
Copilot AI · Dec 14, 2025
Hardcoded rename_map is specific to the svla_so100_pickplace dataset and will not work with other datasets that have different camera keys. Consider adding a comment explaining this mapping is dataset-specific and may need adjustment, or checking if the pretrained model's camera keys match the dataset's keys before applying the rename.
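One way to surface that assumption is a short sanity check before the rename. This is an illustrative sketch, not code from the PR, and it assumes the dataset exposes its feature keys as a dict (e.g. `dataset.features`); adjust to the actual lerobot API if it differs:

```python
# NOTE: this mapping is specific to svla_so100_pickplace; adjust it for your dataset.
rename_map = {
    "observation.images.top": "observation.images.camera1",
    "observation.images.wrist": "observation.images.camera2",
}

missing = [key for key in rename_map if key not in dataset.features]
if missing:
    raise KeyError(
        f"rename_map keys not found in dataset: {missing}. "
        "Update rename_map to match your dataset's camera keys."
    )
```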
    },
    )
    else:
        print("Initializing new SmolVLA model from scratch...")
Copilot AI · Dec 14, 2025
This statement is unreachable.
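The branch is presumably dead because the fine-tune/from-scratch choice is hard-coded. One way to keep both paths exercisable is to drive the choice from an environment variable or CLI flag, sketched below; the flag name is illustrative, not part of the PR:

```python
import os

# Illustrative: choose the training mode without editing the script.
train_from_scratch = os.environ.get("TRAIN_FROM_SCRATCH", "0") == "1"

if not train_from_scratch:
    print("Fine-tuning from the pretrained SmolVLA checkpoint...")
    # policy = SmolVLAPolicy.from_pretrained(...)  # as in the PR
else:
    print("Initializing new SmolVLA model from scratch...")
    # policy = SmolVLAPolicy(config)  # assumed from-scratch path
```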
    hub_repo_id = "YOUR_HF_USERNAME/my_smolvla_so101"  # Replace with your repo ID

    if push_to_hub:
        print(f"\nPushing model to Hugging Face Hub: {hub_repo_id}...")
Copilot AI · Dec 14, 2025
This statement is unreachable.
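Similarly, the Hub push is gated by the constant `push_to_hub = False`, so the block never runs as written. A small argparse flag would make it reachable without editing the file; the flag names below are illustrative, not part of the PR:

```python
import argparse

# Illustrative CLI flags so the Hub push can actually be triggered.
parser = argparse.ArgumentParser()
parser.add_argument("--push-to-hub", action="store_true")
parser.add_argument("--hub-repo-id", default="YOUR_HF_USERNAME/my_smolvla_so101")
args = parser.parse_args()

if args.push_to_hub:
    print(f"\nPushing model to Hugging Face Hub: {args.hub_repo_id}...")
    # policy.push_to_hub(args.hub_repo_id)  # assumed push call
```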
What this does
Explain what this PR does. Feel free to tag your PR with the appropriate label(s).
This PR adds an example script for training SmolVLA, similar to the existing using_smolvla_example.py script.
How it was tested
Explain/show how you tested your changes.
I ran the script on Google Colab after installing lerobot and verified the training job started successfully.
How to checkout & try? (for the reviewer)
Place the train_smolvla.py script in /content and run !python train_smolvla.py.