
Commit 87b1484

Merge branch 'experimental' into mdpp_plots
2 parents: 558c6dd + 7694f98


44 files changed (+1337 / -1093 lines)

.gitmodules

Lines changed: 0 additions & 3 deletions
This file was deleted.

README.md

Lines changed: 37 additions & 31 deletions
@@ -1,31 +1,31 @@
11
# MDP Playground
2-
A python package to benchmark low-level dimensions of difficulties for RL agents.
2+
A python package to inject low-level dimensions of difficulties in RL environments. There are toy environments to design and debug RL agents. And complex environment wrappers for Atari and Mujoco to test robustness to these dimensions in complex environments.
33

44
## Getting started
5-
There are 3 parts to the package:
6-
1) **Environments**: The base Environment in [`mdp_playground/envs/rl_toy_env.py`](mdp_playground/envs/rl_toy_env.py) implements all the functionality, including discrete and continuous environments, and is parameterised by a `config` dict which contains all the information needed to instantiate the required MDP. Please see [`example.py`](example.py) for some simple examples of how to use the MDP environments in the package. For further details, please refer to the documentation in [`mdp_playground/envs/rl_toy_env.py`](mdp_playground/envs/rl_toy_env.py).
5+
There are 4 parts to the package:
6+
1) **Toy Environments**: The base toy Environment in [`mdp_playground/envs/rl_toy_env.py`](mdp_playground/envs/rl_toy_env.py) implements the toy environment functionality, including discrete and continuous environments, and is parameterised by a `config` dict which contains all the information needed to instantiate the required MDP. Please see [`example.py`](example.py) for some simple examples of how to use the MDP environments in the package. For further details, please refer to the documentation in [`mdp_playground/envs/rl_toy_env.py`](mdp_playground/envs/rl_toy_env.py).
77

8-
2) **Experiments**: Experiments are launched using [`run_experiments.py`](run_experiments.py). Config files for experiments are located inside the [`experiments`](experiments) directory. Please read the [instructions](#running-experiments) below for details.
8+
2) **Complex Environment Wrappers**: Similar to the toy environment, this is parameterised by a `config` dict which contains all the information needed to inject the dimensions into Atari or Mujoco environments. Please see [`example.py`](example.py) for some simple examples of how to use these. The Atari wrapper is in [`mdp_playground/envs/gym_env_wrapper.py`](mdp_playground/envs/gym_env_wrapper.py) and the Mujoco wrapper is in [`mdp_playground/envs/mujoco_env_wrapper.py`](mdp_playground/envs/mujoco_env_wrapper.py).
99

10-
3) **Analysis**: [`plot_experiments.ipynb`](plot_experiments.ipynb) contains code to plot the standard plots from the paper.
10+
3) **Experiments**: Experiments are launched using [`run_experiments.py`](run_experiments.py). Config files for experiments are located inside the [`experiments`](experiments) directory. Please read the [instructions](#running-experiments) below for details.
1111

12-
## Installation
13-
**IMPORTANT**
12+
4) **Analysis**: [`plot_experiments.ipynb`](plot_experiments.ipynb) contains code to plot the standard plots from the paper.
1413

15-
We recommend using `conda` environments to manage virtual `Python` environments to run the experiments. Unfortunately, you will have to maintain 2 environments - 1 for **discrete** experiments and 1 for **continuous** experiments from the paper. As mentioned in Appendix H in the paper, this is because of issues with Ray, the library that we used for our baseline agents.
14+
## Installation
15+
We recommend using `conda` environments to manage virtual `Python` environments to run the experiments. Unfortunately, you will have to maintain 2 environments - 1 for the "older" **discrete toy** experiments and 1 for the "newer" **continuous and complex** experiments from the paper. As mentioned in Appendix P in the paper, this is because of issues with Ray, the library that we used for our baseline agents.
1616

17-
Please follow the following commands to install for the discrete experiments:
17+
Please follow the following commands to install for the discrete toy experiments:
1818
```
19-
conda create -n py36_toy_rl_disc python=3.6
20-
conda activate py36_toy_rl_disc
19+
conda create -n py36_toy_rl_disc_toy python=3.6
20+
conda activate py36_toy_rl_disc_toy
2121
cd mdp-playground
2222
pip install -e .[extras_disc]
2323
```
2424

25-
Please follow the following commands to install for the continuous experiments:
25+
Please follow the following commands to install for the continuous and complex experiments:
2626
```
27-
conda create -n py36_toy_rl_cont python=3.6
28-
conda activate py36_toy_rl_cont
27+
conda create -n py36_toy_rl_cont_comp python=3.6
28+
conda activate py36_toy_rl_cont_comp
2929
cd mdp-playground
3030
pip install -e .[extras_cont]
3131
wget 'https://ray-wheels.s3-us-west-2.amazonaws.com/master/8d0c1b5e068853bf748f72b1e60ec99d240932c6/ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl'
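The README above describes the toy environments as parameterised by a `config` dict that injects dimensions of difficulty such as **delay**. As a rough, self-contained illustration of that pattern only (this is NOT the actual `mdp_playground` API — the real constructor and config keys are documented in `mdp_playground/envs/rl_toy_env.py` and `example.py`; the class and config keys below are hypothetical stand-ins), a minimal environment that injects a reward delay might look like:

```python
from collections import deque


class DelayedRewardToyEnv:
    """Hypothetical stand-in illustrating the config-dict pattern:
    rewards are held back for config["delay"] steps before being emitted."""

    def __init__(self, config):
        self.delay = config.get("delay", 0)
        # Pre-fill the buffer so the first `delay` emitted rewards are 0.
        self.pending = deque([0.0] * self.delay)
        self.state = 0

    def step(self, action):
        self.state += 1
        raw_reward = 1.0 if action == 1 else 0.0  # trivial underlying MDP
        self.pending.append(raw_reward)
        # The reward for this step surfaces `delay` steps late.
        return self.state, self.pending.popleft()


env = DelayedRewardToyEnv({"delay": 2})
rewards = [env.step(1)[1] for _ in range(4)]
print(rewards)  # → [0.0, 0.0, 1.0, 1.0]
```

The point of the dict-based design is that each dimension (delay, sequence length, noise, ...) becomes one config key, so experiment files can sweep over values without changing environment code.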
@@ -45,36 +45,42 @@ The `exp_name` is a prefix for the filenames of CSV files where stats for the ex
 Each of the command line arguments has defaults. Please refer to the documentation inside [`run_experiments.py`](run_experiments.py) for further details on the command line arguments. (Or run it with the `-h` flag to bring up help.)
 
 The config files for experiments from the [paper](https://arxiv.org/abs/1909.07750) are in the experiments directory.<br>
-The name of the file corresponding to an experiment is formed as: `<algorithm_name>_<meta_feature_names>.py`<br>
+The name of the file corresponding to an experiment is formed as: `<algorithm_name>_<dimension_names>.py`<br>
 Some sample `algorithm_name`s are: `dqn`, `rainbow`, `a3c`, `a3c_lstm`, `ddpg`, `td3` and `sac`<br>
-Some sample `meta_feature_name`s are: `seq_del` (for **delay** and **sequence length** varied together), `p_r_noises` (for **P** and **R noises** varied together),
+Some sample `dimension_name`s are: `seq_del` (for **delay** and **sequence length** varied together), `p_r_noises` (for **P** and **R noises** varied together),
 `target_radius` (for varying **target radius**) and `time_unit` (for varying **time unit**)<br>
-For example, for algorithm **DQN** when varying meta-features **delay** and **sequence length**, the corresponding experiment file is [`dqn_seq_del.py`](experiments/dqn_seq_del.py)
+For example, for algorithm **DQN** when varying dimensions **delay** and **sequence length**, the corresponding experiment file is [`dqn_seq_del.py`](experiments/dqn_seq_del.py)
 
 ## Running experiments from the main paper
-For completeness, we list here the commands for the experiments from the main paper:
+We list here the commands for the experiments from the main paper:
 ```
-# Discrete environments: (Figures 1 and 2)
-# We varied delay and sequence lengths together
-conda activate py36_toy_rl_disc
-python run_experiments.py -c experiments/dqn_seq_del.py -e dqn_seq_del
-python run_experiments.py -c experiments/rainbow_seq_del.py -e rainbow_seq_del
-python run_experiments.py -c experiments/a3c_seq_del.py -e a3c_seq_del
-python run_experiments.py -c experiments/a3c_lstm_seq_del.py -e a3c_lstm_seq_del
-
-# Representation learning: (Figure 3)
+# Discrete toy environments:
+# Image representation experiments:
+conda activate py36_toy_rl_disc_toy
 python run_experiments.py -c experiments/dqn_image_representations.py -e dqn_image_representations
 python run_experiments.py -c experiments/rainbow_image_representations.py -e rainbow_image_representations
 python run_experiments.py -c experiments/a3c_image_representations.py -e a3c_image_representations
 python run_experiments.py -c experiments/dqn_image_representations_sh_quant.py -e dqn_image_representations_sh_quant
 
-# Continuous environments: (Figure 4)
-conda activate py36_toy_rl_cont
-python run_experiments.py -c experiments/ddpg_move_to_a_point_target_radius.py -e ddpg_move_to_a_point_target_radius
-python run_experiments.py -c experiments/ddpg_move_to_a_point_action_max.py -e ddpg_move_to_a_point_action_max
+# Continuous toy environments:
+conda activate py36_toy_rl_cont_comp
 python run_experiments.py -c experiments/ddpg_move_to_a_point_time_unit.py -e ddpg_move_to_a_point_time_unit
 python run_experiments.py -c experiments/ddpg_move_to_a_point_irr_dims.py -e ddpg_move_to_a_point_irr_dims
+# Varying the action range and time unit together for transition_dynamics_order = 2
+python run_experiments.py -c experiments/ddpg_move_to_a_point_p_order_2.py -e ddpg_move_to_a_point_p_order_2
+
+# Complex environments:
+conda activate py36_toy_rl_cont_comp
+python run_experiments.py -c experiments/dqn_qbert_del.py -e dqn_qbert_del
+python run_experiments.py -c experiments/ddpg_halfcheetah_time_unit.py -e ddpg_halfcheetah_time_unit
+
+# For the spider plots, experiments for all the agents and dimensions will need to be run from the experiments directory, i.e., for discrete: dqn_p_r_noises.py, a3c_p_r_noises, ..., dqn_seq_del, ..., dqn_sparsity, ..., dqn_image_representations, ...
+# for continuous: ddpg_move_to_a_point_p_noise, td3_move_to_a_point_p_noise, ..., ddpg_move_to_a_point_r_noise, ..., ddpg_move_to_a_point_irr_dims, ..., ddpg_move_to_a_point_action_loss_weight, ..., ddpg_move_to_a_point_action_max, ..., ddpg_move_to_a_point_target_radius, ..., ddpg_move_to_a_point_time_unit
+# and then follow the instructions in plot_experiments.ipynb
+
+# For the bsuite debugging experiment, please run the bsuite sonnet dqn agent on our toy environment while varying reward density. Commit https://github.com/deepmind/bsuite/commit/5116216b62ce0005100a6036fb5397e358652530 should work fine.
 ```
+
 The CSV stats files will be saved to the current directory and can be analysed in [`plot_experiments.ipynb`](plot_experiments.ipynb).
 
 ## Plotting
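The diff says the CSV stats files are analysed in `plot_experiments.ipynb`, but the CSV schema itself is not shown in this commit. Purely as a hypothetical sketch of that kind of analysis (the column names `delay`, `seed`, and `episode_reward_mean` below are made up, not the actual output format of `run_experiments.py`), grouping rows by a varied dimension and averaging reward across seeds might look like:

```python
import csv
import io
from collections import defaultdict

# Made-up stats rows; the real column names are defined by run_experiments.py.
raw = """delay,seed,episode_reward_mean
0,0,180.0
0,1,200.0
4,0,120.0
4,1,100.0
"""

# Collect reward per setting of the varied dimension across seeds.
by_delay = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    by_delay[int(row["delay"])].append(float(row["episode_reward_mean"]))

# Mean over seeds per dimension value, ready to plot against the dimension.
means = {d: sum(v) / len(v) for d, v in sorted(by_delay.items())}
print(means)  # → {0: 190.0, 4: 110.0}
```

Aggregating across seeds before plotting is what makes curves over a dimension (and the spider plots mentioned above) comparable between agents.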

0 commit comments