Commit 3f52485: modify readme

1 parent 5e9ba41

1 file changed: +11 -6


examples/reinforcement_learning/README.md (+11 -6)
@@ -46,6 +46,8 @@ For each tutorial, open a terminal and run:
 
 The tutorial algorithms follow the same basic structure, as shown in file: [`./tutorial_format.py`](https://github.com/tensorlayer/tensorlayer/blob/reinforcement-learning/examples/reinforcement_learning/tutorial_format.py)
 
+The pretrained models for each algorithm are stored [here](https://github.com/tensorlayer/pretrained-models). You can download the models and load the weights into the policies for testing.
+
 ## Table of Contents:
 ### value-based
 | Algorithms | Action Space | Tutorial Env | Papers |
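The line added in the hunk above points readers to the pretrained-models repository so policies can be evaluated without retraining. As a rough, non-authoritative sketch of that workflow, the snippet below downloads a weight file and restores it into an already-built policy network; the URL, the file name `ddpg_actor.hdf5`, and the `tl.models.Model.load_weights` call are assumptions for illustration, not the tutorials' exact save/load helpers.

```python
# Minimal sketch, not the repository's exact API: fetch a pretrained weight
# file and load it into an already-built policy network for test episodes.
# The URL, the file name, and the `load_weights` method are assumptions.
import os
import urllib.request

import tensorlayer as tl  # assumed TensorLayer 2.x

PRETRAINED_URL = "https://github.com/tensorlayer/pretrained-models/raw/master/ddpg_actor.hdf5"  # hypothetical path
LOCAL_FILE = "ddpg_actor.hdf5"


def load_pretrained_policy(policy: tl.models.Model,
                           url: str = PRETRAINED_URL,
                           path: str = LOCAL_FILE) -> tl.models.Model:
    """Download the weight file if it is missing, then restore it into `policy`."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    # If a tutorial saved its weights with npz helpers instead of HDF5,
    # swap this call for the matching loader used by that script.
    policy.load_weights(path)
    return policy
```

Each tutorial script builds its own networks, so only the download-and-restore step is sketched here.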
@@ -123,18 +125,19 @@ The tutorial algorithms follow the same basic structure, as shown in file: [`./t
 
 ```
 We implement Double DQN, Dueling DQN and Noisy DQN here.
-
+
 -The max operator in standard DQN uses the same values both to select and to evaluate an action:
-
+
 Q(s_t, a_t) = R_{t+1} + gamma * max_{a} Q_{target}(s_{t+1}, a).
-
+
 -Double DQN proposes the following evaluation to address the overestimation problem of the max operator:
-
+
 Q(s_t, a_t) = R_{t+1} + gamma * Q_{target}(s_{t+1}, argmax_{a} Q(s_{t+1}, a)).
-
+
 -Dueling DQN uses a dueling architecture where the value of the state and the advantage of each action are estimated separately.
-
+
 -Noisy DQN proposes to explore by adding parameter noise.
+```
 
 
 ```
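The hunk above contrasts the standard DQN target with the Double DQN target and summarizes the dueling and noisy variants. As a minimal illustration of those formulas (not the tutorials' actual code), the NumPy sketch below computes both targets, the dueling aggregation, and a heavily simplified parameter-noise perturbation; the array names and shapes are assumptions.

```python
# Minimal NumPy sketch of the targets described above; illustrative only,
# not the tutorial implementations. Assumed shapes:
#   q_online_next, q_target_next: (batch, n_actions) Q-values for s_{t+1}
#   reward, done: (batch,) arrays, with done in {0, 1}; gamma: discount factor.
import numpy as np


def dqn_target(reward, done, q_target_next, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates the action,
    # Q(s_t, a_t) = R_{t+1} + gamma * max_a Q_target(s_{t+1}, a).
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, which reduces the overestimation bias of the max operator.
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return reward + gamma * (1.0 - done) * evaluated


def dueling_aggregate(value, advantage):
    # Dueling DQN: combine a state value V(s) with per-action advantages A(s, a);
    # subtracting the mean advantage keeps the decomposition identifiable.
    return value[:, None] + advantage - advantage.mean(axis=1, keepdims=True)


def add_parameter_noise(weights, sigma=0.1, rng=None):
    # Noisy-DQN-flavoured exploration, heavily simplified: perturb the network
    # parameters with Gaussian noise (the real NoisyNet learns the noise scale).
    rng = np.random.default_rng() if rng is None else rng
    return [w + sigma * rng.standard_normal(w.shape) for w in weights]
```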
@@ -339,3 +342,5 @@ Our env wrapper: `./tutorial_wrappers.py`
 - @Tokarev-TT-33 Tianyang Yu @initial-h Hongming Zhang : PG, DDPG, PPO, DPPO, TRPO
 - @Officium Yanhua Huang: C51, DQN_variants, prioritized_replay, wrappers.
 
+
+```
