Commit 3f52485: modify readme

1 parent 5e9ba41

1 file changed: +11 -6


examples/reinforcement_learning/README.md (+11 -6)
@@ -46,6 +46,8 @@ For each tutorial, open a terminal and run:
 
 The tutorial algorithms follow the same basic structure, as shown in file: [`./tutorial_format.py`](https://github.com/tensorlayer/tensorlayer/blob/reinforcement-learning/examples/reinforcement_learning/tutorial_format.py)
 
+The pretrained models for each algorithm are stored [here](https://github.com/tensorlayer/pretrained-models). You can download the models and load the weights into the policies for testing.
+
 ## Table of Contents:
 ### value-based
 | Algorithms | Action Space | Tutorial Env | Papers |
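The line added in the hunk above points readers to the pretrained-models repository so policies can be evaluated without retraining. As a rough, non-authoritative sketch of that workflow, the snippet below downloads a weight file and restores it into an already-built policy network; the URL, the file name `ddpg_actor.hdf5`, and the `tl.models.Model.load_weights` call are assumptions for illustration, not the tutorials' exact save/load helpers.

```python
# Minimal sketch, not the repository's exact API: fetch a pretrained weight
# file and load it into an already-built policy network for test episodes.
# The URL, the file name, and the `load_weights` method are assumptions.
import os
import urllib.request

import tensorlayer as tl  # assumed TensorLayer 2.x

PRETRAINED_URL = "https://github.com/tensorlayer/pretrained-models/raw/master/ddpg_actor.hdf5"  # hypothetical path
LOCAL_FILE = "ddpg_actor.hdf5"


def load_pretrained_policy(policy: tl.models.Model,
                           url: str = PRETRAINED_URL,
                           path: str = LOCAL_FILE) -> tl.models.Model:
    """Download the weight file if it is missing, then restore it into `policy`."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    # If a tutorial saved its weights with npz helpers instead of HDF5,
    # swap this call for the matching loader used by that script.
    policy.load_weights(path)
    return policy
```

Each tutorial script builds its own networks, so only the download-and-restore step is sketched here.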
@@ -123,18 +125,19 @@ The tutorial algorithms follow the same basic structure, as shown in file: [`./t
 
 ```
 We implement Double DQN, Dueling DQN and Noisy DQN here.
-
+
 -The max operator in standard DQN uses the same values both to select and to evaluate an action:
-
+
 Q(s_t, a_t) = R_{t+1} + gamma * max_{a} Q_{target}(s_{t+1}, a).
-
+
 -Double DQN proposes the following evaluation to address the overestimation problem of the max operator:
-
+
 Q(s_t, a_t) = R_{t+1} + gamma * Q_{target}(s_{t+1}, argmax_{a} Q(s_{t+1}, a)).
-
+
 -Dueling DQN uses a dueling architecture where the value of the state and the advantage of each action are estimated separately.
-
+
 -Noisy DQN proposes to explore by adding parameter noise.
+```
 
 
 ```
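The hunk above contrasts the standard DQN target with the Double DQN target and summarizes the dueling and noisy variants. As a minimal illustration of those formulas (not the tutorials' actual code), the NumPy sketch below computes both targets, the dueling aggregation, and a heavily simplified parameter-noise perturbation; the array names and shapes are assumptions.

```python
# Minimal NumPy sketch of the targets described above; illustrative only,
# not the tutorial implementations. Assumed shapes:
#   q_online_next, q_target_next: (batch, n_actions) Q-values for s_{t+1}
#   reward, done: (batch,) arrays, with done in {0, 1}; gamma: discount factor.
import numpy as np


def dqn_target(reward, done, q_target_next, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates the action,
    # Q(s_t, a_t) = R_{t+1} + gamma * max_a Q_target(s_{t+1}, a).
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, which reduces the overestimation bias of the max operator.
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return reward + gamma * (1.0 - done) * evaluated


def dueling_aggregate(value, advantage):
    # Dueling DQN: combine a state value V(s) with per-action advantages A(s, a);
    # subtracting the mean advantage keeps the decomposition identifiable.
    return value[:, None] + advantage - advantage.mean(axis=1, keepdims=True)


def add_parameter_noise(weights, sigma=0.1, rng=None):
    # Noisy-DQN-flavoured exploration, heavily simplified: perturb the network
    # parameters with Gaussian noise (the real NoisyNet learns the noise scale).
    rng = np.random.default_rng() if rng is None else rng
    return [w + sigma * rng.standard_normal(w.shape) for w in weights]
```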
@@ -339,3 +342,5 @@ Our env wrapper: `./tutorial_wrappers.py`
 - @Tokarev-TT-33 Tianyang Yu @initial-h Hongming Zhang : PG, DDPG, PPO, DPPO, TRPO
 - @Officium Yanhua Huang: C51, DQN_variants, prioritized_replay, wrappers.
 
+
+```
