This project will no longer be maintained by Intel.
Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
Intel no longer accepts patches to this project.
Codebase for Collaborative Evolutionary Reinforcement Learning, accepted for publication in the Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).
Setup Conda
- Install Anaconda3
- conda create -n $ENV_NAME$ python=3.6.1
- source activate $ENV_NAME$
Install PyTorch version 1.0
- Refer to for instructions
- conda install pytorch torchvision -c pytorch [GPU-version]
Install Numpy, Cython and Scipy
- pip install numpy==1.15.4
- pip install cython==0.29.2
- pip install scipy==1.1.0
Install Mujoco and OpenAI_Gym
- Download mjpro150 from
- Unzip mjpro150 and place it, together with mjkey.txt (license file), in ~/.mujoco/ (create the .mujoco directory in your home folder)
- pip install -U 'mujoco-py<1.50.2,>=1.50.1'
- pip install 'gym[all]'
Main Script runs everything
core/ Rollout worker
core/ Upper Confidence Bound implemented for learner selection by the resource-manager
core/ Portfolio of learners which can vary in their hyperparameters
core/ Learner agent encapsulating the algo and sum-statistics
core/ Cyclic Replay buffer
core/ Wrapper around the Mujoco env
core/ Actor/Critic model
core/ Implements Neuroevolution
core/ Implements the off_policy_gradient learner TD3
core/ Helper functions
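The learner selection by Upper Confidence Bound mentioned in the list above can be illustrated with a minimal sketch. It assumes the textbook UCB1 rule; the exact formula, the exploration constant, and the function names `ucb_scores` and `select_learner` are hypothetical and may differ from what the resource-manager in this codebase does.

```python
import math


def ucb_scores(values, visits, exploration=1.0):
    """Return a UCB1-style score per learner (hypothetical helper).

    values: running mean return of each learner
    visits: number of times each learner was allocated resources
    """
    total = sum(visits)
    scores = []
    for value, count in zip(values, visits):
        if count == 0:
            # Learners that were never tried get priority.
            scores.append(float("inf"))
        else:
            bonus = exploration * math.sqrt(2.0 * math.log(total) / count)
            scores.append(value + bonus)
    return scores


def select_learner(values, visits, exploration=1.0):
    """Pick the index of the learner with the highest UCB score."""
    scores = ucb_scores(values, visits, exploration)
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `select_learner([0.6, 0.5], [100, 1])` picks the second learner: its mean is lower, but it has been tried so rarely that the exploration bonus dominates.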
python -env HalfCheetah-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}
python -env Hopper-v2 -portfolio {10,14} -total_steps 1.5 -seed {2018,2022}
python -env Humanoid-v2 -portfolio {10,14} -total_steps 1 -seed {2018,2022}
python -env Walker2d-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}
python -env Swimmer-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}
python -env Hopper-v2 -portfolio {100,102} -total_steps 5 -seed {2018,2022}
where {} represents an inclusive discrete range: {10, 14} --> {10, 11, 12, 13, 14}
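The brace notation can be made concrete with a small sketch. It assumes the braces are shorthand used in this README only, with each run taking a single value from the range; the helper `expand_range` is hypothetical and not part of the codebase.

```python
import re


def expand_range(spec):
    """Expand '{lo,hi}' into the inclusive list [lo, ..., hi]."""
    match = re.fullmatch(r"\{\s*(-?\d+)\s*,\s*(-?\d+)\s*\}", spec)
    if match is None:
        raise ValueError(f"not a range spec: {spec!r}")
    lo, hi = int(match.group(1)), int(match.group(2))
    return list(range(lo, hi + 1))


print(expand_range("{10, 14}"))      # [10, 11, 12, 13, 14]
print(expand_range("{2018,2022}"))   # [2018, 2019, 2020, 2021, 2022]
```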
All roll-outs (evaluations of actors in the evolutionary population and the explorative roll-outs conducted by the learners) run in parallel. They are farmed out to different CPU cores and write asynchronously to the collective replay buffer. As a result, slight variations in results are observed even with the same seed.
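Why asynchronous writes break exact reproducibility can be seen in a toy sketch. It is an illustration only: it uses threads rather than separate CPU processes, a plain list in place of the replay buffer, and a random sleep in place of an environment step. The name `run_rollouts` is hypothetical.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_rollouts(num_workers=4, steps_per_worker=3):
    """Run toy roll-out workers that append to one shared buffer."""
    buffer = []              # stands in for the collective replay buffer
    lock = threading.Lock()  # keeps each append atomic

    def rollout(worker_id):
        for step in range(steps_per_worker):
            # Simulated environment step with variable latency.
            time.sleep(random.uniform(0.0, 0.005))
            with lock:
                buffer.append((worker_id, step))

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Consume the iterator so worker exceptions would surface here.
        list(pool.map(rollout, range(num_workers)))
    return buffer


first = run_rollouts()
second = run_rollouts()
# Both runs contain the same transitions, but usually in a different order.
print(sorted(first) == sorted(second))
```

The set of stored transitions is the same in every run, while their order in the buffer depends on scheduling. Anything that samples from the buffer by position therefore sees different data from run to run, regardless of the seed.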