[RL-baseline] Model v5, experiment #1 #43
Action set #0 was chosen for this experiment:
```python
[0.0, 0.0, 0.0],   # no action
[0.0, 0.8, 0.0],   # throttle
[0.0, 0.3, 0.0],   # throttle
[0.0, 0.0, 0.6],   # brake
[0.0, 0.0, 0.2],   # brake
[-0.9, 0.0, 0.0],  # left
[-0.5, 0.0, 0.0],  # left
[-0.2, 0.0, 0.0],  # left
[0.9, 0.0, 0.0],   # right
[0.5, 0.0, 0.0],   # right
[0.2, 0.0, 0.0],   # right
```
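The three-component vectors look like Gym's CarRacing `[steering, gas, brake]` action space. A minimal sketch of how a discrete policy output could be mapped back to a continuous action, assuming `CarRacing-v0` and the old Gym step API (the environment name and the `ACTIONS` wiring are assumptions, not taken from this repo):

```python
import gym
import numpy as np

# Action set #0 from above; each row is [steering, gas, brake].
ACTIONS = np.array([
    [0.0, 0.0, 0.0],   # no action
    [0.0, 0.8, 0.0],   # throttle
    [0.0, 0.3, 0.0],   # throttle
    [0.0, 0.0, 0.6],   # brake
    [0.0, 0.0, 0.2],   # brake
    [-0.9, 0.0, 0.0],  # left
    [-0.5, 0.0, 0.0],  # left
    [-0.2, 0.0, 0.0],  # left
    [0.9, 0.0, 0.0],   # right
    [0.5, 0.0, 0.0],   # right
    [0.2, 0.0, 0.0],   # right
])

env = gym.make("CarRacing-v0")  # assumed environment
obs = env.reset()
action_index = 1  # e.g. sampled from the policy's softmax over the 11 discrete actions
obs, reward, done, info = env.step(ACTIONS[action_index])
```

Discretizing the continuous action space this way lets a standard categorical policy head drive the environment, at the cost of fixing the available steering/throttle/brake intensities up front.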
Loss, entropy, and running reward were all very low between the 10k and 15k episode marks, but the model managed to recover, ending with a maximum running reward of 528, which is also the final running reward at the end of the 20k episodes. The network would likely keep improving if trained for more episodes.
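For context, the "running reward" in actor-critic training loops is typically an exponential moving average of per-episode returns; a minimal sketch, where the 0.05 smoothing factor and variable names are assumptions rather than values from this repo:

```python
# Hypothetical per-episode returns; in training this would be appended each episode.
episode_rewards = [100.0, 250.0, 528.0]

running_reward = 0.0
for episode_reward in episode_rewards:
    # Exponential moving average: new episodes contribute 5%, history 95%.
    running_reward = 0.05 * episode_reward + 0.95 * running_reward
```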
TensorBoard captures below:
Sample video below. As in all previous samples, the model hasn't learned to brake before a sharp turn, but in this run the car drifts back onto the road by sheer luck and manages to recover.
https://user-images.githubusercontent.com/1465235/113404852-2128a500-93a9-11eb-8881-05101c3cd0e0.mp4