Commit 1abaae4

Q-Learning docstring improvements.
1 parent b7b4d3d commit 1abaae4

2 files changed: +4 −4 lines changed


TD/Q-Learning Solution.ipynb (+2 −2)

@@ -50,7 +50,7 @@
     " Args:\n",
     " Q: A dictionary that maps from state -> action-values.\n",
     " Each value is a numpy array of length nA (see below)\n",
-    " epsilon: The probability to select a random action . float between 0 and 1.\n",
+    " epsilon: The probability to select a random action. Float between 0 and 1.\n",
     " nA: Number of actions in the environment.\n",
     " \n",
     " Returns:\n",
@@ -82,7 +82,7 @@
     " num_episodes: Number of episodes to run for.\n",
     " discount_factor: Gamma discount factor.\n",
     " alpha: TD learning rate.\n",
-    " epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
+    " epsilon: Chance to sample a random action. Float between 0 and 1.\n",
     " \n",
     " Returns:\n",
     " A tuple (Q, episode_lengths).\n",
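The first corrected docstring describes an epsilon-greedy policy factory. A minimal sketch consistent with that docstring (the function name `make_epsilon_greedy_policy` and the returned-probabilities convention are assumptions here, since the diff shows only the docstring text) could look like:

```python
import numpy as np

def make_epsilon_greedy_policy(Q, epsilon, nA):
    """
    Creates an epsilon-greedy policy from a Q-function.

    Args:
        Q: A dictionary that maps from state -> action-values.
            Each value is a numpy array of length nA (see below)
        epsilon: The probability to select a random action. Float between 0 and 1.
        nA: Number of actions in the environment.

    Returns:
        A function that takes an observation and returns a numpy array
        of length nA with the probability of each action.
    """
    def policy_fn(observation):
        # Spread the exploration mass (epsilon) uniformly over all actions.
        A = np.ones(nA, dtype=float) * epsilon / nA
        # Put the remaining (1 - epsilon) mass on the current greedy action.
        best_action = int(np.argmax(Q[observation]))
        A[best_action] += 1.0 - epsilon
        return A
    return policy_fn
```

With `epsilon = 0.1` and `nA = 2`, the greedy action gets probability 0.95 and the other action 0.05, which is the split the docstring's "probability to select a random action" refers to.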

TD/Q-Learning.ipynb (+2 −2)

@@ -49,7 +49,7 @@
     " Args:\n",
     " Q: A dictionary that maps from state -> action-values.\n",
     " Each value is a numpy array of length nA (see below)\n",
-    " epsilon: The probability to select a random action . float between 0 and 1.\n",
+    " epsilon: The probability to select a random action. Float between 0 and 1.\n",
     " nA: Number of actions in the environment.\n",
     " \n",
     " Returns:\n",
@@ -81,7 +81,7 @@
     " num_episodes: Number of episodes to run for.\n",
     " discount_factor: Gamma discount factor.\n",
     " alpha: TD learning rate.\n",
-    " epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
+    " epsilon: Chance to sample a random action. Float between 0 and 1.\n",
     " \n",
     " Returns:\n",
     " A tuple (Q, episode_lengths).\n",
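The second docstring belongs to the Q-learning routine itself: off-policy TD control that updates toward the greedy (max) target while behaving epsilon-greedily. A self-contained sketch matching that signature follows; the minimal environment interface it assumes (`reset() -> state`, `step(action) -> (state, reward, done)`, and an `nA` attribute) is an illustration-only simplification, since real Gym-style environments return additional values from `step()`:

```python
import itertools
from collections import defaultdict
import numpy as np

def q_learning(env, num_episodes, discount_factor=1.0, alpha=0.5, epsilon=0.1):
    """
    Q-Learning (off-policy TD control) sketch.

    Args:
        env: Environment with reset()/step() as described above (assumed interface).
        num_episodes: Number of episodes to run for.
        discount_factor: Gamma discount factor.
        alpha: TD learning rate.
        epsilon: Chance to sample a random action. Float between 0 and 1.

    Returns:
        A tuple (Q, episode_lengths).
    """
    Q = defaultdict(lambda: np.zeros(env.nA))
    episode_lengths = np.zeros(num_episodes)

    for i_episode in range(num_episodes):
        state = env.reset()
        for t in itertools.count():
            # Epsilon-greedy behaviour policy.
            if np.random.rand() < epsilon:
                action = np.random.randint(env.nA)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # TD update toward the greedy target -- the off-policy part:
            # the max is over actions, regardless of what the policy would do next.
            td_target = reward + discount_factor * np.max(Q[next_state])
            Q[state][action] += alpha * (td_target - Q[state][action])
            state = next_state
            if done:
                episode_lengths[i_episode] = t + 1
                break

    return Q, episode_lengths
```

The `alpha` and `epsilon` parameters are exactly the two whose docstrings this commit cleans up: `alpha` scales how far each TD error moves the estimate, and `epsilon` is the chance to sample a random action on each step.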

0 commit comments
