
Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL

This repository contains the code for the paper
“Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL.” The paper investigates the exploration dynamics of SGCRL (Single-Goal Contrastive Reinforcement Learning) through a combination of controlled experiments and theoretical analysis. Refer to the project website for more information.


Overview

This codebase includes two main components:

1. Tabular SGCRL Implementation

  • Implements SGCRL in a tabular (non-neural) setting.
  • Enables studying the exploration behavior of SGCRL without neural network function approximation.
  • Useful for running controlled experiments that isolate algorithmic dynamics.
  • See tabular_maze.ipynb and tabular_hanoi.ipynb to run tabular SGCRL in a FourRooms maze and Towers of Hanoi environment.
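
For intuition, here is a minimal NumPy sketch of what a tabular contrastive (CPC-style) critic update can look like: the state-action and goal representations are lookup tables rather than network outputs, trained so each (state, action) embedding aligns with the future states it actually reaches. The names, shapes, and batch layout are illustrative assumptions and do not follow the notebooks' actual APIs.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d, lr = 25, 4, 8, 0.1

# One embedding row per (state, action) pair and one per candidate goal state.
phi = rng.normal(scale=0.1, size=(n_states * n_actions, d))  # state-action table
psi = rng.normal(scale=0.1, size=(n_states, d))              # goal-state table

def cpc_update(sa_idx, future_idx):
    """One gradient step on an InfoNCE/CPC loss over a batch: each
    (state, action) row is pulled toward the future state it reached and
    pushed away from the other futures in the batch (assumes unique indices)."""
    f = phi[sa_idx] @ psi[future_idx].T            # (B, B) critic logits
    p = np.exp(f - f.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)              # row-wise softmax
    grad = p - np.eye(len(sa_idx))                 # d(loss)/d(logits)
    g_phi = grad @ psi[future_idx]                 # chain rule into phi rows
    g_psi = grad.T @ phi[sa_idx]                   # chain rule into psi rows
    phi[sa_idx] -= lr * g_phi
    psi[future_idx] -= lr * g_psi

def greedy_action(s, goal):
    """Act greedily w.r.t. the learned critic f(s, a, g) = phi[s, a] . psi[g]."""
    return int(np.argmax(phi[s * n_actions:(s + 1) * n_actions] @ psi[goal]))

# Example batch: three (state, action) indices and the futures they reached.
cpc_update(np.array([3, 17, 42]), np.array([5, 9, 12]))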

2. Continuous Environment Implementation

  • Based on the original SGCRL repository.
  • Extends it with functionality for defining and enforcing safety regions by manipulating the contrastive representations; safety regions are parts of the environment that the agent must avoid during training and evaluation.
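
As a hypothetical illustration of this idea (not the repository's actual mechanism), in the tabular picture above a safety region could be enforced by editing the representations of all state-action pairs inside the region so that the critic f(s, a, g*) scores them far below everything else; a greedy policy then never steers into the region.

import numpy as np

def suppress_region(phi, psi, goal, unsafe_states, n_actions, margin=10.0):
    """Hypothetical sketch: force f(s, a, goal) = -margin * ||psi[goal]|| for
    every action in every unsafe state by replacing each row's component
    along the goal representation with a strongly negative one."""
    g = psi[goal] / (np.linalg.norm(psi[goal]) + 1e-8)  # unit goal direction
    for s in unsafe_states:
        rows = slice(s * n_actions, (s + 1) * n_actions)
        proj = phi[rows] @ g                            # components along g
        phi[rows] = phi[rows] - np.outer(proj + margin, g)
    return phi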

Set up conda environment

Set up conda:

  1. Load Anaconda (on clusters that use environment modules): module load anaconda3
  2. Clone the repository
  3. Create an Anaconda environment: conda create -n contrastive_rl python=3.9 -y
  4. Activate the environment: conda activate contrastive_rl

Install package dependencies:

  1. Set the library path: export LD_LIBRARY_PATH={path to conda}/.conda/envs/contrastive_rl/lib/
  2. Install the requirements: pip install -r requirements.txt --no-deps
  3. Download the mujoco binaries and place them in ~/.mujoco/ according to instructions in https://github.com/openai/mujoco-py. Run export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{path to mujoco}/.mujoco/mujoco210/bin
  4. Reinstall strict versions of the following packages:
pip install "dm-acme[jax,tf]"
pip install jax==0.4.10 jaxlib==0.4.10
pip install ml_dtypes==0.2.0
pip install dm-haiku==0.0.9
pip install gymnasium-robotics 
pip uninstall -y scipy; pip install scipy==1.12
pip install torch==2.1.2 scikit-learn pandas

Potential errors and fixes

Cythonizing error:

fatal error: GL/glew.h: No such file or directory 4 | #include <GL/glew.h>

Fix:

conda install -c conda-forge glew
conda install -c conda-forge mesalib
conda install -c menpo glfw3
pip install patchelf

Cythonizing error:

Cannot assign type 'void (const char *) except * nogil' to 'void (*)(const char *) noexcept nogil'

Fix: pip install "cython<3"

Running with GPU:

To enable GPU execution, run the following commands in a shell with GPU access. These pin a combination of CUDA/cuDNN and JAX versions that is simultaneously supported by JAX and the repository code. Note that this step may vary depending on the specifics of your computing environment.

module load cudatoolkit/11.3 cudnn/cuda-11.x/8.2.0
pip install optax==0.1.7
pip install --upgrade jax==0.4.7 jaxlib==0.4.7+cuda11.cudnn82 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/{path to cuda}/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia

To run code, use

python lp_contrastive.py

Safety Experiment

To run the safety experiment, execute the following command:

python -u lp_contrastive.py \
    --env point_FourRooms \
    --region_bounds=0,5:5,11

This runs the safety experiment, in which the agent learns to avoid the top-right corner of the FourRooms environment. During training, visitation data is saved in the folder:

./safety_region_visits_data/

Refer to the paper for detailed visualizations and analysis of the agent's behavior.
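
The --region_bounds string appears to encode the rectangle's coordinate ranges. Below is a minimal sketch of one plausible parsing, assuming the two colon-separated pairs are the rectangle's lower and upper corners; the actual parsing in lp_contrastive.py may differ.

def parse_region_bounds(spec="0,5:5,11"):
    """Parse 'x_min,y_min:x_max,y_max' corner points into two tuples."""
    (x_lo, y_lo), (x_hi, y_hi) = (tuple(map(float, part.split(",")))
                                  for part in spec.split(":"))
    return (x_lo, y_lo), (x_hi, y_hi)

def in_region(pos, bounds):
    """True if a 2-D position lies inside the rectangular safety region."""
    (x_lo, y_lo), (x_hi, y_hi) = bounds
    return x_lo <= pos[0] <= x_hi and y_lo <= pos[1] <= y_hi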

Useful flags

  • --env='{ENV_NAME}': Specifies the environment. Default environment is 'sawyer_bin'. Currently supported environments include 'sawyer_bin', 'sawyer_box', 'sawyer_peg', 'point_Spiral11x11'.
  • --alg='{ALG_NAME}': Specifies the algorithm. Default algorithm is 'contrastive_cpc'. Currently supported algorithms include 'contrastive_nce', 'contrastive_cpc', 'c_learning', 'nce+c_learning'.
  • --num_steps=12_000_000: Specifies the maximum number of actor steps.
  • --sample_goals: Turning on this flag makes the agent collect data conditioned on goals sampled uniformly by the environment. (This behavior corresponds to that of the original Contrastive RL algorithm (Eysenbach et al., 2022).)
  • --add_uid: Randomly generates a unique UID and saves checkpoints and logs inside a directory with that UID as its name.
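
Putting these together, a run that trains NCE-based SGCRL in the sawyer_peg environment for two million actor steps under a fresh UID would look like:

python -u lp_contrastive.py \
    --env sawyer_peg \
    --alg contrastive_nce \
    --num_steps=2_000_000 \
    --add_uid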
