Merge pull request dennybritz#102 from sstarzycki/patch-1

dennybritz · web-flow · commit 762f34c98f6c · 2017-07-21T08:20:55.000+02:00
Update description of env.P[s][a]
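
Why the wording matters, as a rough sketch: in a stochastic environment one (state, action) pair can lead to several outcomes, so env.P[s][a] has to be iterated over rather than unpacked as a single tuple. The table `P` and the helper `expected_action_value` below are hypothetical and written only for illustration; they are not taken from the notebook or its environments.

```python
# Hypothetical transition table with the same shape as env.P:
# P[state][action] -> list of (prob, next_state, reward, done) tuples.
P = {
    0: {
        0: [(0.8, 1, -1.0, False), (0.2, 0, -1.0, False)],  # two possible outcomes
        1: [(1.0, 0, -1.0, False)],                          # single outcome
    },
    1: {
        0: [(1.0, 1, 0.0, True)],
        1: [(1.0, 1, 0.0, True)],
    },
}


def expected_action_value(P, V, state, action, discount_factor=1.0):
    """One-step lookahead that sums over every transition tuple for (state, action)."""
    total = 0.0
    for prob, next_state, reward, done in P[state][action]:
        # Weight each outcome's reward plus discounted next-state value by its probability.
        total += prob * (reward + discount_factor * V[next_state])
    return total


V = [0.0, 10.0]  # assumed value estimates, for illustration only
q = expected_action_value(P, V, state=0, action=0, discount_factor=0.9)
```

Unpacking `P[0][0]` directly as one (prob, next_state, reward, done) tuple would fail here because the list holds two entries, which is the case the updated docstring describes.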
diff --git a/DP/Policy Evaluation.ipynb b/DP/Policy Evaluation.ipynb
@@ -41,7 +41,7 @@
     "    Args:\n",
     "        policy: [S, A] shaped matrix representing the policy.\n",
     "        env: OpenAI env. env.P represents the transition probabilities of the environment.\n",
-    "            env.P[s][a] is a (prob, next_state, reward, done) tuple.\n",
+    "            env.P[s][a] is a list of transition tuples (prob, next_state, reward, done).\n",
     "        theta: We stop evaluation once our value function change is less than theta for all states.\n",
     "        discount_factor: gamma discount factor.\n",
     "    \n",