Commit 1a25b35

Fix README to work.
1 parent b0059f0 commit 1a25b35

12 files changed (+132 −180 lines)

README.md (+48 −39)
@@ -27,69 +27,39 @@ There are 4 parts to the package:
 
 2) **Complex Environment Wrappers**: Similar to the toy environment, this is parameterised by a `config` dict which contains all the information needed to inject the dimensions into Atari or Mujoco environments. Please see [`example.py`](example.py) for some simple examples of how to use these. The Atari wrapper is in [`mdp_playground/envs/gym_env_wrapper.py`](mdp_playground/envs/gym_env_wrapper.py) and the Mujoco wrapper is in [`mdp_playground/envs/mujoco_env_wrapper.py`](mdp_playground/envs/mujoco_env_wrapper.py).
 
-3) **Experiments**: Experiments are launched using [`run_experiments.py`](run_experiments.py). Config files for experiments are located inside the [`experiments`](experiments) directory. Please read the [instructions](#running-experiments) below for details.
+3) **Experiments**: Experiments are launched using [`run_experiments.py`](run_experiments.py). Config files for experiments are located inside the [`experiments`](experiments) directory. Please read the [instructions](#running-experiments) below for details on how to launch experiments.
 
 4) **Analysis**: [`plot_experiments.ipynb`](plot_experiments.ipynb) contains code to plot the standard plots from the paper.
 
-## Installation
-
-### Production use
-We recommend using `conda` to manage environments. After setup of the environment, you can install MDP Playground in two ways:
-#### Manual
-To install MDP Playground manually, clone the repository and run:
-```bash
-pip install -e .[extras]
-```
-This might be the preferred way if you want easy access to the included experiments.
 
-#### From PyPI
-MDP Playground is also on PyPI. Just run:
-```bash
-pip install mdp_playground[extras]
-```
+## Running experiments from the main paper
+For reproducing experiments from the main paper, please continue reading.
 
+For general instructions, please see [here](#installation).
 
-### Reproducing results from the paper
-We recommend using `conda` environments to manage virtual `Python` environments to run the experiments. Unfortunately, you will have to maintain 2 environments - 1 for the "older" **discrete toy** experiments and 1 for the "newer" **continuous and complex** experiments from the paper. As mentioned in Appendix P in the paper, this is because of issues with Ray, the library that we used for our baseline agents.
+### Installation for running experiments from the main paper
+We recommend using `conda` environments to manage virtual `Python` environments to run the experiments. Unfortunately, you will have to maintain 2 environments - 1 for the "older" **discrete toy** experiments and 1 for the "newer" **continuous and complex** experiments from the paper. As mentioned in Appendix section **Tuned Hyperparameters** in the paper, this is because of issues with Ray, the library that we used for our baseline agents.
 
 Please follow the following commands to install for the discrete toy experiments:
 ```bash
 conda create -n py36_toy_rl_disc_toy python=3.6
 conda activate py36_toy_rl_disc_toy
 cd mdp-playground
+pip install -r requirements.txt
 pip install -e .[extras_disc]
 ```
 
-Please follow the following commands to install for the continuous and complex experiments:
+Please follow the following commands to install for the continuous and complex experiments. **IMPORTANT**: In case, you do not have MuJoCo, please ignore any mujoco-py related installation errors below:
 ```bash
 conda create -n py36_toy_rl_cont_comp python=3.6
 conda activate py36_toy_rl_cont_comp
 cd mdp-playground
+pip install -r requirements.txt
 pip install -e .[extras_cont]
 wget 'https://ray-wheels.s3-us-west-2.amazonaws.com/master/8d0c1b5e068853bf748f72b1e60ec99d240932c6/ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl'
 pip install ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl[rllib,debug]
 ```
 
-## Running experiments
-For reproducing experiments from the main paper, please see [below](#running-experiments-from-the-main-paper).
-
-For general instructions, please continue reading.
-
-You can run experiments using:
-```
-run-mdpp-experiments -c <config_file> -e <exp_name> -n <config_num>
-```
-The `exp_name` is a prefix for the filenames of CSV files where stats for the experiments are recorded. The CSV stats files will be saved to the current directory.<br>
-Each of the command line arguments has defaults. Please refer to the documentation inside [`run_experiments.py`](run_experiments.py) for further details on the command line arguments. (Or run it with the `-h` flag to bring up help.)
-
-The config files for experiments from the [paper](https://arxiv.org/abs/1909.07750) are in the experiments directory.<br>
-The name of the file corresponding to an experiment is formed as: `<algorithm_name>_<dimension_names>.py`<br>
-Some sample `algorithm_name`s are: `dqn`, `rainbow`, `a3c`, `a3c_lstm`, `ddpg`, `td3` and `sac`<br>
-Some sample `dimension_name`s are: `seq_del` (for **delay** and **sequence length** varied together), `p_r_noises` (for **P** and **R noises** varied together),
-`target_radius` (for varying **target radius**) and `time_unit` (for varying **time unit**)<br>
-For example, for algorithm **DQN** when varying dimensions **delay** and **sequence length**, the corresponding experiment file is [`dqn_seq_del.py`](experiments/dqn_seq_del.py)
-
-## Running experiments from the main paper
 We list here the commands for the experiments from the main paper:
 ```bash
 # Discrete toy environments:
@@ -108,6 +78,8 @@ python run_experiments.py -c experiments/ddpg_move_to_a_point_irr_dims.py -e ddp
 python run_experiments.py -c experiments/ddpg_move_to_a_point_p_order_2.py -e ddpg_move_to_a_point_p_order_2
 
 # Complex environments:
+# The commands below run all configs serially.
+# In case, you want to parallelise on a cluster, please provide the CLI argument -n <config_number> at the end of the given commands. Please refer to the documentation for run_experiments.py for this.
 conda activate py36_toy_rl_cont_comp
 python run_experiments.py -c experiments/dqn_qbert_del.py -e dqn_qbert_del
 python run_experiments.py -c experiments/ddpg_halfcheetah_time_unit.py -e ddpg_halfcheetah_time_unit
@@ -121,6 +93,43 @@ python run_experiments.py -c experiments/ddpg_halfcheetah_time_unit.py -e ddpg_h
 
 The CSV stats files will be saved to the current directory and can be analysed in [`plot_experiments.ipynb`](plot_experiments.ipynb).
 
+
+## Installation
+For reproducing experiments from the main paper, please see [here](#running-experiments-from-the-main-paper).
+
+### Production use
+We recommend using `conda` to manage environments. After setup of the environment, you can install MDP Playground in two ways:
+#### Manual
+To install MDP Playground manually, clone the repository and run:
+```bash
+pip install -e .[extras]
+```
+This might be the preferred way if you want easy access to the included experiments.
+
+#### From PyPI
+MDP Playground is also on PyPI. Just run:
+```bash
+pip install mdp_playground[extras]
+```
+
+
+## Running experiments
+You can run experiments using:
+```
+run-mdpp-experiments -c <config_file> -e <exp_name> -n <config_num>
+```
+The `exp_name` is a prefix for the filenames of CSV files where stats for the experiments are recorded. The CSV stats files will be saved to the current directory.<br>
+Each of the command line arguments has defaults. Please refer to the documentation inside [`run_experiments.py`](run_experiments.py) for further details on the command line arguments. (Or run it with the `-h` flag to bring up help.)
+
+The config files for experiments from the [paper](https://arxiv.org/abs/1909.07750) are in the experiments directory.<br>
+The name of the file corresponding to an experiment is formed as: `<algorithm_name>_<dimension_names>.py`<br>
+Some sample `algorithm_name`s are: `dqn`, `rainbow`, `a3c`, `a3c_lstm`, `ddpg`, `td3` and `sac`<br>
+Some sample `dimension_name`s are: `seq_del` (for **delay** and **sequence length** varied together), `p_r_noises` (for **P** and **R noises** varied together),
+`target_radius` (for varying **target radius**) and `time_unit` (for varying **time unit**)<br>
+For example, for algorithm **DQN** when varying dimensions **delay** and **sequence length**, the corresponding experiment file is [`dqn_seq_del.py`](experiments/dqn_seq_del.py)
+
+The CSV stats files will be saved to the current directory and can be analysed in [`plot_experiments.ipynb`](plot_experiments.ipynb).
+
 ## Plotting
 To plot results from experiments, run `jupyter-notebook` and open [`plot_experiments.ipynb`](plot_experiments.ipynb) in Jupyter. There are instructions within each of the cells on how to generate and save plots.
 
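The experiment-file naming convention stated in the README (`<algorithm_name>_<dimension_names>.py`) and the launch commands listed above follow a simple pattern. The tiny helper below is hypothetical (not part of the package) and only illustrates how the filename and the corresponding `run_experiments.py` invocation are composed:

```python
# Hypothetical helper (not in MDP Playground) illustrating the naming
# convention "<algorithm_name>_<dimension_names>.py" from the README.
def experiment_file(algorithm, dimension_names):
    """Compose the config filename for an algorithm/dimension combination."""
    return f"{algorithm}_{dimension_names}.py"


def launch_command(algorithm, dimension_names):
    """Compose the launch command shown in the README, using the same
    "<algorithm>_<dimensions>" string as the -e experiment-name prefix."""
    exp_name = f"{algorithm}_{dimension_names}"
    return (
        f"python run_experiments.py -c experiments/{experiment_file(algorithm, dimension_names)}"
        f" -e {exp_name}"
    )


# DQN varying delay and sequence length together ("seq_del"):
print(launch_command("dqn", "seq_del"))
# python run_experiments.py -c experiments/dqn_seq_del.py -e dqn_seq_del
```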
experiments/ddpg_halfcheetah_action_max.py (+1)

@@ -4,6 +4,7 @@
 from ray import tune
 from collections import OrderedDict
 num_seeds = 5
+timesteps_total = 3000000
 
 
 var_env_configs = OrderedDict(

experiments/ddpg_halfcheetah_time_unit.py (+1)

@@ -4,6 +4,7 @@
 from ray import tune
 from collections import OrderedDict
 num_seeds = 5
+timesteps_total = 3000000
 
 
 var_env_configs = OrderedDict(

experiments/sac_halfcheetah_action_max.py (+1)

@@ -4,6 +4,7 @@
 from ray import tune
 from collections import OrderedDict
 num_seeds = 5
+timesteps_total = 3000000
 
 
 var_env_configs = OrderedDict(

experiments/sac_halfcheetah_time_unit.py (+1)

@@ -4,6 +4,7 @@
 from ray import tune
 from collections import OrderedDict
 num_seeds = 5
+timesteps_total = 3000000
 
 
 var_env_configs = OrderedDict(

experiments/sac_halfcheetah_time_unit_config_processor.py (+1)

@@ -4,6 +4,7 @@
 from collections import OrderedDict
 from mdp_playground.config_processor import *
 num_seeds = 5
+timesteps_total = 3000000
 
 
 var_env_configs = OrderedDict(

experiments/td3_halfcheetah_action_max.py (+1)

@@ -4,6 +4,7 @@
 from ray import tune
 from collections import OrderedDict
 num_seeds = 5
+timesteps_total = 3000000
 
 
 var_env_configs = OrderedDict(

experiments/td3_halfcheetah_time_unit.py (+1)

@@ -4,6 +4,7 @@
 from ray import tune
 from collections import OrderedDict
 num_seeds = 5
+timesteps_total = 3000000
 
 
 var_env_configs = OrderedDict(
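All seven experiment config files above gain the same `timesteps_total = 3000000` line. Schematically, the top of such a config module now looks like the sketch below, restricted to the lines visible in the diffs (the real files also import `ray.tune` and fill `var_env_configs` with the dimensions to vary):

```python
# Sketch of the top of an experiment config module after this commit.
# Only lines visible in the diffs are reproduced; the OrderedDict body is
# left empty here, whereas the real files list the varied dimensions.
from collections import OrderedDict

num_seeds = 5
timesteps_total = 3000000  # line added by this commit: total training timesteps per run

var_env_configs = OrderedDict()
```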

mdp_playground/config_processor/config_processor.py (+69 −54)

@@ -86,8 +86,57 @@ def process_configs(
         *variable_configs, overwrite=False
     )
 
+    varying_configs = []
+    separate_var_configs = []
+    # ###IMP Currently num_configs has to be equal for all 3 cases below:
+    # grid (i.e. var), random and sobol #TODO Not sure how to solve this #config
+    # setup problem. Could take Cartesian product of all 3 but that may lead to
+    # too many configs and Cartesian product of dicts is a pain.
+    if "var_configs" in dir(config):
+        separate_var_configs.append(
+            get_list_of_varying_configs(config.var_configs, mode="grid")
+        )
+    if "sobol_configs" in dir(config):
+        separate_var_configs.append(
+            get_list_of_varying_configs(
+                config.sobol_configs, mode="sobol", num_configs=config.num_configs
+            )
+        )
+    if "random_configs" in dir(config):
+        separate_var_configs.append(
+            get_list_of_varying_configs(
+                config.random_configs, mode="random", num_configs=config.num_configs
+            )
+        )
+    # print("VARYING_CONFIGS:", varying_configs)
+
+    num_configs_ = max(
+        [len(separate_var_configs[i]) for i in range(len(separate_var_configs))]
+    )
+    for i in range(num_configs_):
+        to_combine = [
+            separate_var_configs[j][i] for j in range(len(separate_var_configs))
+        ]
+        # overwrite = False because the keys in different modes of
+        # config generation need to be disjoint
+        varying_configs.append(deepmerge_multiple_dicts(*to_combine, overwrite=False))
+
+    # #hack ####TODO Remove extra pre-processing done here and again below:
+    pre_final_configs = combined_processing(
+        config.env_config,
+        config.agent_config,
+        config.model_config,
+        config.eval_config,
+        varying_configs=copy.deepcopy(varying_configs),
+        framework=framework,
+        algorithm=config.algorithm,
+    )
+
+
     if "timesteps_total" in dir(config):
         hacky_timesteps_total = config.timesteps_total  # hack
+    else:
+        hacky_timesteps_total = pre_final_configs[-1]["timesteps_total"]
 
     config_algorithm = config.algorithm  # hack
     # sys.exit(0)
@@ -137,40 +186,6 @@ def process_configs(
             + ". Available options are: ray and stable_baselines."
         )
 
-    varying_configs = []
-    separate_var_configs = []
-    # ###IMP Currently num_configs has to be equal for all 3 cases below:
-    # grid (i.e. var), random and sobol #TODO Not sure how to solve this #config
-    # setup problem. Could take Cartesian product of all 3 but that may lead to
-    # too many configs and Cartesian product of dicts is a pain.
-    if "var_configs" in dir(config):
-        separate_var_configs.append(
-            get_list_of_varying_configs(config.var_configs, mode="grid")
-        )
-    if "sobol_configs" in dir(config):
-        separate_var_configs.append(
-            get_list_of_varying_configs(
-                config.sobol_configs, mode="sobol", num_configs=config.num_configs
-            )
-        )
-    if "random_configs" in dir(config):
-        separate_var_configs.append(
-            get_list_of_varying_configs(
-                config.random_configs, mode="random", num_configs=config.num_configs
-            )
-        )
-    # print("VARYING_CONFIGS:", varying_configs)
-
-    num_configs_ = max(
-        [len(separate_var_configs[i]) for i in range(len(separate_var_configs))]
-    )
-    for i in range(num_configs_):
-        to_combine = [
-            separate_var_configs[j][i] for j in range(len(separate_var_configs))
-        ]
-        # overwrite = False because the keys in different modes of
-        # config generation need to be disjoint
-        varying_configs.append(deepmerge_multiple_dicts(*to_combine, overwrite=False))
 
     # varying_configs is a list of dict of dicts with a specific structure.
     final_configs = combined_processing(
@@ -876,28 +891,28 @@ def combined_processing(*static_configs, varying_configs, framework="ray", algor
                     "fcnet_activation": "relu",
                 }
 
-                # TODO Find a better way to enforce these?? Especially problematic for TD3
-                # because then more values for target_noise_clip are witten to CSVs than
-                # actually used during HPO but for normal (non-HPO) runs this needs to be
-                # not done.
-                if (algorithm == "DDPG"):
-                    if key == "critic_lr":
-                        final_configs[i]["actor_lr"] = value
-                    if key == "critic_hiddens":
-                        final_configs[i]["actor_hiddens"] = value
-                if algorithm == "TD3":
-                    if key == "target_noise_clip_relative":
-                        final_configs[i]["target_noise_clip"] = (
-                            final_configs[i]["target_noise_clip_relative"]
-                            * final_configs[i]["target_noise"]
-                        )
-                        del final_configs[i][
-                            "target_noise_clip_relative"
-                        ]  # hack have to delete it otherwise Ray will crash for unknown config param.
+            # TODO Find a better way to enforce these?? Especially problematic for TD3
+            # because then more values for target_noise_clip are witten to CSVs than
+            # actually used during HPO but for normal (non-HPO) runs this needs to be
+            # not done.
+            if (algorithm == "DDPG"):
+                if key == "critic_lr":
+                    final_configs[i]["actor_lr"] = value
+                if key == "critic_hiddens":
+                    final_configs[i]["actor_hiddens"] = value
+            if algorithm == "TD3":
+                if key == "target_noise_clip_relative":
+                    final_configs[i]["target_noise_clip"] = (
+                        final_configs[i]["target_noise_clip_relative"]
+                        * final_configs[i]["target_noise"]
+                    )
+                    del final_configs[i][
+                        "target_noise_clip_relative"
+                    ]  # hack have to delete it otherwise Ray will crash for unknown config param.
 
-            elif key == "model":
+            if key == "model":
                 for key_2 in final_configs[i][key]:
-                    if key_2 == "use_lstm":
+                    if key_2 == "use_lstm" and final_configs[i][key][key_2]:
                         final_configs[i][key]["max_seq_len"] = (
                             final_configs[i]["env_config"]["delay"]
                             + final_configs[i]["env_config"]["sequence_length"]

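The block moved to the top of `process_configs` merges the per-mode lists of varying configs (grid / sobol / random) index-wise, relying on the keys produced by the different modes being disjoint; the new `else` branch then lets `timesteps_total` fall back to a value taken from the pre-processed configs when the config module does not define it. The following is a minimal, self-contained sketch of just that merge step — `deepmerge_multiple_dicts` here is an illustrative stand-in for the package's own helper, and the config dicts are invented for the example:

```python
# Sketch of the index-wise merge of per-mode varying-config lists, mirroring
# the block moved earlier in process_configs by this commit. The helper below
# is an illustrative stand-in, not the package's actual deepmerge.
def deepmerge_multiple_dicts(*dicts, overwrite=False):
    """Merge dicts left to right; with overwrite=False, duplicate keys raise,
    enforcing that keys from different config-generation modes are disjoint."""
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if not overwrite and k in merged:
                raise ValueError(f"duplicate key across config modes: {k}")
            merged[k] = v
    return merged


def merge_varying_configs(separate_var_configs):
    """Combine the i-th config of every mode into one dict, for each index i.
    As the source comments note, all modes are assumed to yield equally many
    configs."""
    num_configs_ = max(len(lst) for lst in separate_var_configs)
    varying_configs = []
    for i in range(num_configs_):
        to_combine = [lst[i] for lst in separate_var_configs]
        varying_configs.append(deepmerge_multiple_dicts(*to_combine, overwrite=False))
    return varying_configs


# Two modes (e.g. grid and random), two configs each, disjoint keys:
grid = [{"delay": 0}, {"delay": 4}]
random_ = [{"reward_noise": 0.1}, {"reward_noise": 0.5}]
merged = merge_varying_configs([grid, random_])
print(merged)  # [{'delay': 0, 'reward_noise': 0.1}, {'delay': 4, 'reward_noise': 0.5}]
```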