BuilderBench

Can AI models build a world which today's generative models can only dream of?


BuilderBench is a benchmark designed to facilitate research on open-ended exploration, embodied reasoning and reinforcement learning (RL). Features include:

  • A parallelizable, hardware-accelerated simulator built using MuJoCo and JAX. Training a PPO policy to pick and place a block takes less than 5 minutes on one GPU and twelve CPU threads.
  • A task suite of 42 tasks ($\times$ 4 variations each), where each task requires qualitatively different reasoning capabilities.
  • Single-file implementations of two self-supervised RL algorithms and four RL algorithms in JAX.

For more details, check out the project website and research paper.

Installation

We have tested the installation on Ubuntu 22.04 and Ubuntu 24.04 using Python 3.10.

From source

Clone the repository and enter the main folder.

The main dependencies of the BuilderBench environments are mujoco, jax, and optax. To install the BuilderBench environments:

pip install -e .

To use the reference implementations or develop new algorithms:

pip install -e ".[all]"

Environments and Tasks

The environment consists of a robot hand that can navigate in 3D space and interact with a set of cube-shaped blocks. A task corresponds to a physically stable target structure built from these cubes, and is specified by the positions of the blocks in the target structure. A central insight of BuilderBench is that despite this seemingly simple setup, tasks can be arbitrarily complex and long-horizon, and can require multiple steps of high-level reasoning. The BuilderBench task suite consists of over 40 such carefully curated tasks. Check out the project website for visualizations and the list of tasks. All tasks are defined in the create_task_data.py file.

Simulator

The step function of the environment is parallelized using multi-thread pooling, implemented by MuJoCo's rollout functionality in C++. The rest of the environment code is written in JAX in a JIT-friendly manner. The rollout function is used as a JAX callback, so the environments can be compiled end-to-end in JAX and enjoy the benefits of jit and vmap. For instance, a PPO policy can be trained in less than 5 minutes to successfully pick and place a cube.
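The multi-thread pooling pattern described above can be sketched in pure Python. This is only an illustration of the batching idea: the actual implementation dispatches to MuJoCo's C++ rollout module, and `toy_step` below is a stand-in for the physics, not BuilderBench's API.

```python
# Illustrative sketch of multi-thread pooled batched stepping. The real
# simulator uses MuJoCo's C++ `rollout`; `toy_step` is a placeholder.
from concurrent.futures import ThreadPoolExecutor

def toy_step(state, action):
    """Placeholder single-environment step: shift the state by the action."""
    return [s + a for s, a in zip(state, action)]

def batched_step(states, actions, num_threads=4):
    """Step a batch of environments in parallel across a thread pool."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(toy_step, states, actions))
```

In the actual simulator, this batched step is exposed to JAX as a callback, so the surrounding training loop can still be jit-compiled and vmapped.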

Advantages of using MuJoCo Rollout

  • Rollout uses MuJoCo's native simulation code written in C/C++. This circumvents issues faced by MuJoCo MJX when running scenes with many contacts, which is exactly the situation that arises when building with a large number of blocks.
  • MuJoCo Warp allows scaling MuJoCo GPU simulation to much larger scenes. The main advantage of using MuJoCo Warp would be to run the environment completely on the GPU and make the entire training loop simpler. We have combined BuilderBench with MuJoCo Warp in the warp branch. Currently, accurately simulating scenes and training using the Warp backend is 5 times slower than rollout, for two main reasons. First, Warp is still in a beta release and some features (for example, the no-slip solver) have not been implemented. Second, we have not yet been able to tune the XML parameters to ensure training is both fast and accurate. Reach out if you want to collaborate to make this happen. The Warp backend will become the default once it is equally fast and we are able to manually solve all tasks in the BuilderBench task suite using it.

Running experiments

(Figure: overview of the self-supervised protocol, in which agents explore without supervision during training and are evaluated on unseen tasks at test time.)

To evaluate open-ended exploration, embodied reasoning, and generalization, we design the self-supervised protocol. As shown in the figure, agents in this protocol must explore the environment in a self-supervised manner and learn policies that can solve unseen tasks at test time. We also provide a single-task supervised protocol for debugging, meant to provide additional feedback for researchers. In this protocol, agents are trained and tested on the same task.

Self-supervised protocol

Use the following command to run the MEGA algorithm in an environment with two cubes. The policy will be evaluated at fixed intervals on all tasks in the task suite that use two cubes.

cd impls
python play_ppo_mega.py --env_id=cube-2-play

Supervised protocol

Use the following command to run the PPO algorithm on the first task in an environment with one cube. The policy will be evaluated at fixed intervals on the same task.

cd impls
python ppo.py --env_id=cube-1-task1

Visualization

By default, training runs store checkpoints at regular intervals in an impls/checkpoint/ folder. To visualize how these checkpoints perform, we provide code in impls/video.py. This script iterates over all the training runs present in the given folder (impls/checkpoint/ by default) and records and saves a video for every checkpoint of every training run. The code uses PPO's checkpoints as an example, but other algorithms can be visualized similarly.
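The discovery loop described above can be sketched as follows. The directory layout assumed here (one subfolder per run, one entry per checkpoint) is an illustrative assumption, not necessarily the exact layout impls/video.py expects.

```python
# Hedged sketch of checkpoint discovery: walk every run folder under the
# checkpoint root and yield each checkpoint inside it. The assumed layout
# (checkpoint/<run_name>/<checkpoint>) is illustrative.
from pathlib import Path

def iter_checkpoints(root="impls/checkpoint"):
    """Yield (run_name, checkpoint_path) pairs for all runs under `root`."""
    root = Path(root)
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for ckpt in sorted(run_dir.iterdir()):
            yield run_dir.name, ckpt
```

A visualization script would then load a policy from each yielded checkpoint, roll it out in the environment, and save the frames as a video.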

Code Structure

The core structure of the codebase is as follows:

  • builderbench/
    • assets/ assets for defining MuJoCo models
    • tasks/ meta-data for all tasks
    • xmls/ XML files for defining MuJoCo models
    • constants.py predefined constants used for the environment
    • create_task_data.py task definition and task data creation
    • build_block.py supervised single-task protocol environment definition
    • build_block_play.py self-supervised multi-task protocol environment definition
    • env_utils.py environment utilities
  • impls/ reference algorithm implementations

Acknowledgements

  1. MuJoCo Playground for environment structuring.
  2. MuJoCo for the multithreading rollout functionality.
  3. MuJoCo Menagerie for the robot hand model.
  4. Brax for the reference proximal policy optimization (PPO) implementation.
  5. JaxGCRL for the reference contrastive RL implementation.

Citation

@misc{ghugare2025builderbench,
      title={BuilderBench -- A benchmark for generalist agents}, 
      author={Raj Ghugare and Catherine Ji and Kathryn Wantlin and Jin Schofield and Benjamin Eysenbach},
      year={2025},
      eprint={2510.06288},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2510.06288}, 
}
