|
120 | 120 | "!pip install pyglet"
|
121 | 121 | ]
|
122 | 122 | },
|
| 123 | + {
| 124 | + "cell_type": "code",
| 125 | + "execution_count": null,
| 126 | + "metadata": {
| 127 | + "id": "UX0aSKBCYmj2"
| 128 | + },
| 129 | + "outputs": [],
| 130 | + "source": [
| 131 | + "import os\n",
| 132 | + "# Keep using keras-2 (tf-keras) rather than keras-3 (keras).\n",
| 133 | + "os.environ['TF_USE_LEGACY_KERAS'] = '1'"
| 134 | + ]
| 135 | + },
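One note on the new cell: `TF_USE_LEGACY_KERAS` is read when TensorFlow/Keras is imported, so it generally has to be set before those imports run, which is why the cell sits ahead of the main imports. A minimal sketch of the intended ordering (the trailing `import tensorflow` line is an assumption for illustration, not part of this diff):

```python
import os

# Must run before any TensorFlow / TF-Agents import so that tf.keras
# resolves to legacy Keras 2 (tf-keras) instead of Keras 3.
os.environ['TF_USE_LEGACY_KERAS'] = '1'

import tensorflow as tf  # now backed by tf-keras
```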
123 | 136 | {
|
124 | 137 | "cell_type": "code",
|
125 | 138 | "execution_count": null,
|
|
222 | 235 | "\n",
|
223 | 236 | "In Reinforcement Learning (RL), an environment represents the task or problem to be solved. Standard environments can be created in TF-Agents using `tf_agents.environments` suites. TF-Agents has suites for loading environments from sources such as the OpenAI Gym, Atari, and DM Control.\n",
|
224 | 237 | "\n",
|
225 | | - "Load the CartPole environment from the OpenAI Gym suite. "
| 238 | + "Load the CartPole environment from the OpenAI Gym suite."
226 | 239 | ]
|
227 | 240 | },
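For reference while reading the diff, a minimal sketch of that load step (the environment name `'CartPole-v0'` is assumed here; it comes from code cells outside this hunk):

```python
from tf_agents.environments import suite_gym

# Load the CartPole task from the OpenAI Gym suite as a PyEnvironment.
env = suite_gym.load('CartPole-v0')

# reset() returns the first TimeStep (step_type, reward, discount, observation).
time_step = env.reset()
print(time_step)
```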
|
228 | 241 | {
|
|
323 | 336 | "source": [
|
324 | 337 | "In the Cartpole environment:\n",
|
325 | 338 | "\n",
|
326 | | - "- `observation` is an array of 4 floats: \n",
| 339 | + "- `observation` is an array of 4 floats:\n",
327 | 340 | " - the position and velocity of the cart\n",
|
328 | | - " - the angular position and velocity of the pole \n",
| 341 | + " - the angular position and velocity of the pole\n",
329 | 342 | "- `reward` is a scalar float value\n",
|
330 | 343 | "- `action` is a scalar integer with only two possible values:\n",
|
331 | 344 | " - `0` — \"move left\"\n",
|
|
357 | 370 | "id": "4JSc9GviWUBK"
|
358 | 371 | },
|
359 | 372 | "source": [
|
360 | | - "Usually two environments are instantiated: one for training and one for evaluation. "
| 373 | + "Usually two environments are instantiated: one for training and one for evaluation."
361 | 374 | ]
|
362 | 375 | },
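A rough sketch of that pattern, assuming the `suite_gym` loader and the usual `TFPyEnvironment` wrapper (neither appears in this hunk):

```python
from tf_agents.environments import suite_gym, tf_py_environment

# Two independent instances: one gathers training experience,
# the other is reserved for periodic evaluation.
train_py_env = suite_gym.load('CartPole-v0')
eval_py_env = suite_gym.load('CartPole-v0')

# Wrap them so the agent interacts with batched TensorFlow tensors.
train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)
```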
|
363 | 376 | {
|
|
500 | 513 | "- The desired outcome is keeping the pole balanced upright over the cart.\n",
|
501 | 514 | "- The policy returns an action (left or right) for each `time_step` observation.\n",
|
502 | 515 | "\n",
|
503 | | - "Agents contain two policies: \n",
| 516 | + "Agents contain two policies:\n",
504 | 517 | "\n",
|
505 | 518 | "- `agent.policy` — The main policy that is used for evaluation and deployment.\n",
|
506 | 519 | "- `agent.collect_policy` — A second policy that is used for data collection.\n"
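A hedged sketch of how the two policies are queried; `agent` and `time_step` are assumed to exist from earlier tutorial cells that are not part of this diff:

```python
# Greedy policy used for evaluation and deployment.
eval_action_step = agent.policy.action(time_step)

# Exploration policy (epsilon-greedy for DQN) used while collecting data.
collect_action_step = agent.collect_policy.action(time_step)

# Each call returns a PolicyStep whose .action field holds the chosen action.
print(eval_action_step.action, collect_action_step.action)
```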
|
|
834 | 847 | "source": [
|
835 | 848 | "# For the curious:\n",
|
836 | 849 | "# Uncomment to see what the dataset iterator is feeding to the agent.\n",
|
837 | | - "# Compare this representation of replay data \n",
| 850 | + "# Compare this representation of replay data\n",
838 | 851 | "# to the collection of individual trajectories shown earlier.\n",
|
839 | 852 | "\n",
|
840 | 853 | "# iterator.next()"
|
|
967 | 980 | "id": "9pGfGxSH32gn"
|
968 | 981 | },
|
969 | 982 | "source": [
|
970 | | - "Charts are nice. But more exciting is seeing an agent actually performing a task in an environment. \n",
| 983 | + "Charts are nice. But more exciting is seeing an agent actually performing a task in an environment.\n",
971 | 984 | "\n",
|
972 | 985 | "First, create a function to embed videos in the notebook."
|
973 | 986 | ]
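One common shape for such a helper, sketched with an assumed name `embed_mp4` (the notebook's actual function is outside this hunk):

```python
import base64
from IPython import display

def embed_mp4(filename):
  """Embed an mp4 file in the notebook as an inline HTML5 video tag."""
  video = open(filename, 'rb').read()
  b64 = base64.b64encode(video).decode()
  tag = (
      '<video width="640" height="480" controls>'
      f'<source src="data:video/mp4;base64,{b64}" type="video/mp4">'
      '</video>')
  return display.HTML(tag)
```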
|
|
1048 | 1061 | ],
|
1049 | 1062 | "metadata": {
|
1050 | 1063 | "colab": {
|
1051 | | - "collapsed_sections": [],
1052 | 1064 | "name": "DQN Tutorial.ipynb",
|
1053 | 1065 | "private_outputs": true,
|
1054 | 1066 | "provenance": [],
|
|