Commit 2c3b5ab

Update Policy_gradient_breakout.py
1 parent 4e35946 commit 2c3b5ab

1 file changed: +4 -4 lines changed

Diff for: Policy_gradient_breakout.py

@@ -30,8 +30,8 @@
 than correct output is [0, 0, 1] where third column corresponds to action 3.
 4. Save the data: state, the 'correct' output and the reward you've got.
 
-This is what you do after each frame. Now do this for a while, a few
-(or one, however you like) games to collect enough data for training.
+This is what you do after each frame. Now do this for a while (a few
+or one game, however you like) to collect enough data for one training iteration.
 
 After you've collected enough data, do one iteration of training:
 1. Construct gradient loss vector. This step is the core of gradient decent method.
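
The per-frame bookkeeping described in this docstring (pick an action, record the one-hot 'correct' output, keep the reward) could be sketched roughly as follows. This is a minimal illustration, not the file's actual code: `one_hot_target` and `record_step` are hypothetical helper names, and the network's output is assumed to arrive as a list of three probabilities.

```python
import random

actions = [0, 1, 2]  # indices; per the docstring, the third column corresponds to action 3


def one_hot_target(action_index, n_actions=len(actions)):
    """Build the 'correct' output for the action that was taken (hypothetical helper)."""
    target = [0] * n_actions
    target[action_index] = 1
    return target


def record_step(memory, state, probs, reward, rng=random):
    """Sample an action index from probs and store (state, target, reward)."""
    action_index = rng.choices(actions, weights=probs, k=1)[0]
    memory.append((state, one_hot_target(action_index), reward))
    return action_index
```

Calling `record_step` once per frame fills `memory` with the triples that a later training iteration would consume.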
@@ -52,7 +52,7 @@
 """
 
 # functions
-def prep_observation(observation, zeros_and_ones):
+def prep_observation(observation, zeros_and_ones=False):
     obs_2d = observation[:, :, 0] # from RGB to R
     obs_2d_cut = obs_2d[93:193, 8:152] # Specific to Breakout: whole space 33:193, 8:152 ; not including bricks 93:193, 8:152
     obs_2d_cut_ds = obs_2d_cut[::2, ::2] # downsample by 2: a b c d e f d -> to -> a c e d
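
To see what the new default changes for callers, here is a hedged, self-contained sketch of the preprocessing. The first three steps mirror the lines shown in the diff; the binarization branch and the final flattening are assumptions, since the rest of the function is not visible here. The name `prep_observation_sketch` is hypothetical, and the input is assumed to be a 210x160x3 Atari frame.

```python
import numpy as np


def prep_observation_sketch(observation, zeros_and_ones=False):
    obs_2d = observation[:, :, 0]         # keep only the R channel
    obs_2d_cut = obs_2d[93:193, 8:152]    # crop to the area below the bricks
    obs_2d_cut_ds = obs_2d_cut[::2, ::2]  # keep every second row and column
    if zeros_and_ones:
        # Assumption: map every non-zero pixel to 1
        obs_2d_cut_ds = (obs_2d_cut_ds != 0).astype(np.uint8)
    # Assumption: flatten to a vector for the network input
    return obs_2d_cut_ds.reshape(-1)


frame = np.zeros((210, 160, 3), dtype=np.uint8)
frame[95, 10, 0] = 200  # one lit pixel that survives the crop and downsample

raw = prep_observation_sketch(frame)           # second argument can now be omitted
binary = prep_observation_sketch(frame, True)  # explicit opt-in to 0/1 pixels
```

With the crop and stride above, the result has 50 * 72 = 3600 elements, which matches the `input_size` value set elsewhere in the file.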
@@ -103,7 +103,7 @@ def plot_pixels(observation): # this one is used to plot how input to nn looks l
 reward_shift = 10 # to account for lagging reward (frames)
 reward_discount = 0.99
 obs_discount = 0.8 # discount last frame to account for velocity of objects (used in running_frame)
-training_batch_size = 5 # number of games to perform one optimization step
+training_batch_size = 5 # number of games to play before performing one optimization step
 neurons = 32 # single hidden layer with that many neurons
 input_size = 3600 # hand written number indicating number of pixels fed into nn
 actions = [0, 1, 2] # true actions are 1, 2, 3 this is used for indexing
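
For readers unfamiliar with a setting like `reward_discount`, the sketch below shows the standard way a discount factor turns per-frame rewards into discounted returns. How the file itself applies `reward_discount` and `reward_shift` is not shown in this diff, so `discounted_returns` is a hypothetical helper and the whole block is an assumption about typical usage.

```python
reward_discount = 0.99  # same value as in the settings above


def discounted_returns(rewards, discount=reward_discount):
    """Return G_t = r_t + discount * G_{t+1} for every frame t (hypothetical helper)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each frame can reuse the return of the frame after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns
```

With a discount of 0.5 and rewards `[0, 0, 1]`, the helper yields `[0.25, 0.5, 1.0]`: frames closer to the reward receive more credit.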
