From 93f0fd8698181164f2dfc67ed490aec4c02be867 Mon Sep 17 00:00:00 2001 From: Frank Dellaert Date: Mon, 23 Dec 2024 11:20:49 -0500 Subject: [PATCH 1/2] Fix indexing --- S30_vacuum_intro.ipynb | 9 +++++++-- S37_vacuum_summary.ipynb | 8 ++++---- S41_logistics_state.ipynb | 4 +++- S51_diffdrive_state.ipynb | 8 ++++---- S62_driving_actions.ipynb | 4 +++- S63_driving_sensing.ipynb | 12 ++++++------ S70_drone_intro.ipynb | 4 ++-- S75_drone_planning.ipynb | 32 ++++++++++++++++++++++++-------- 8 files changed, 53 insertions(+), 28 deletions(-) diff --git a/S30_vacuum_intro.ipynb b/S30_vacuum_intro.ipynb index a0114ab2..2e4ae1c6 100644 --- a/S30_vacuum_intro.ipynb +++ b/S30_vacuum_intro.ipynb @@ -37,8 +37,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} factor graph", - "```", + "```{index} factor graph\n", + "```\n", "Hence, in this chapter, we will learn about probabilistic outcomes of actions.\n", "For our vacuum cleaning robot, states correspond to rooms in the house, and trajectories correspond to the robot moving from room to room.\n", "We will model uncertain actions with conditional probability distributions, just like we did with sensor measurements in the previous chapter.\n", @@ -63,6 +63,11 @@ "\n", "Finally, we will introduce the notion of reinforcement learning, where we will estimate the parameters of an MDP using data that is obtained during the robot's normal operation." ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] } ], "metadata": { diff --git a/S37_vacuum_summary.ipynb b/S37_vacuum_summary.ipynb index 1e260945..e0fb6586 100644 --- a/S37_vacuum_summary.ipynb +++ b/S37_vacuum_summary.ipynb @@ -45,8 +45,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} factor graphs, factors", - "```", + "```{index} factor graphs, factors\n", + "```\n", "## Reasoning\n", "\n", "Bayes nets are great for *modeling*, and in Section 3.4 we introduced Hidden Markov Models that allow us to reason about a sequence of hidden states, observed via noisy measurements. Hidden Markov models have been around for a long time and transformed areas such as speech recognition. They are exactly what we need for robot localization over time, as well. Beyond the simple vacuum cleaning robot example, they can be generalized to nearly any robot/environment combo that we can model using discrete states transitioning over time. In our example we use just a single discrete sensor, but the HMM is able to accommodate multiple sensors, even continuous ones. \n", @@ -68,8 +68,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} elimination algorithm", - "```", + "```{index} elimination algorithm\n", + "```\n", "## Background and History\n", "\n", "Markov chains date back to -you guessed it- [Andrey Markov](https://en.wikipedia.org/wiki/Andrey_Markov) who used them to study, among other things, the statistics of language. 
In fact, attesting to the importance and generality of the concept, any finite-context large language model can be viewed as a Markov chain - admittedly with a rather vast state space.\n", diff --git a/S41_logistics_state.ipynb b/S41_logistics_state.ipynb index 25ca50b2..2b5ad358 100644 --- a/S41_logistics_state.ipynb +++ b/S41_logistics_state.ipynb @@ -155,7 +155,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} covariance matrix, multivariate Gaussian density```The energy analogy can be extended to the multivariate case.\n", + "```{index} covariance matrix, multivariate Gaussian density\n", + "```\n", + "The energy analogy can be extended to the multivariate case.\n", "In the 1D case, the mean and variance are scalars.\n", "For the $n$-dimensional case when $x\\in\\mathbb{R}^n$, the mean is a vector, $\\mu\\in\\mathbb{R}^n$,\n", "and the concept of variance is extended to define a \n", diff --git a/S51_diffdrive_state.ipynb b/S51_diffdrive_state.ipynb index 08165afd..b0e0b8e3 100644 --- a/S51_diffdrive_state.ipynb +++ b/S51_diffdrive_state.ipynb @@ -110,8 +110,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} configuration, configuration space", - "```", + "```{index} configuration, configuration space\n", + "```\n", "In many robotics applications, if we are interested only in geometric aspects of the problem (e.g., if we are not concerned with dynamics, or with forces that are required to effect motion), we use the term *configuration space* instead of the term *state space*. \n", "A **configuration**, denoted by $q$, is a complete specificiation of the location of every point on a robotic system (assuming that a model of the robot\n", "is available). The **configuration space**, denoted by ${\\cal Q}$, is the set of all configurations.\n", @@ -128,8 +128,8 @@ "As an example, consider the problem of determining the x-y position of the wheel centers for our DDR.\n", "If the wheelbase (i.e., the distance between the two wheel centers) is denoted by $L$,\n", "and the robot is in configuration $q=(x,y.\\theta)$,\n", - "then\n", - "the x-y coordinates of the left and right wheel centers are given by\n", + "then the x-y coordinates of the left and right wheel centers are given by\n", + "\n", "\\begin{equation}\n", "\\left[ \\begin{array}{c} x_{\\mathrm{left}} \\\\ y_{\\mathrm{left}} \\end{array}\\right]\n", "=\n", diff --git a/S62_driving_actions.ipynb b/S62_driving_actions.ipynb index 2525c817..5f37d537 100644 --- a/S62_driving_actions.ipynb +++ b/S62_driving_actions.ipynb @@ -103,7 +103,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} Ackermann steering```This kind of steering for the front wheels is called **Ackermann steering**, as illustrated in the figure above.\n", + "```{index} Ackermann steering\n", + "```\n", + "This kind of steering for the front wheels is called **Ackermann steering**, as illustrated in the figure above.\n", "The physical mechanism required to implement Ackermann steering is slightly complex,\n", "but happily we can model the system by using a single *virtual wheel* placed\n", "at the midpoint between the two front wheels, rolling in a direction perpendicular to the line from\n", diff --git a/S63_driving_sensing.ipynb b/S63_driving_sensing.ipynb index 20097adc..9f740ab4 100644 --- a/S63_driving_sensing.ipynb +++ b/S63_driving_sensing.ipynb @@ -86,8 +86,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} Time of Flight, ToF, direct ToF, Indirect ToF, 2D LIDAR, 3D LIDAR", - "```", 
+ "```{index} Time of Flight, ToF, direct ToF, Indirect ToF, 2D LIDAR, 3D LIDAR\n", + "```\n", "## LIDAR\n", "\n", "LIDAR (LIght raDAR) is a technology that measures distance to an object by using laser light and the **Time of Flight** or **ToF** principle. There are several variants in use, and the simplest to explain is the **direct ToF** sensor, which sends out a short pulse and measures the elapsed time $\\Delta t$ for the light to bounce off an object and return to a detector collocated with the laser pulse emitter. If the object is situated at a distance $d$ from the emitter-detector pair, we have\n", @@ -114,8 +114,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} ray direction", - "```", + "```{index} ray direction\n", + "```\n", "## Ray Intersection\n", "\n", "> Intersecting rays is as easy as computing a dot product.\n", @@ -512,8 +512,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} point cloud map", - "```", + "```{index} point cloud map\n", + "```\n", "## Creating 3D Maps\n", "\n", "> Point clouds can be used to represent the 3D world.\n", diff --git a/S70_drone_intro.ipynb b/S70_drone_intro.ipynb index 57fb9682..63a794cb 100644 --- a/S70_drone_intro.ipynb +++ b/S70_drone_intro.ipynb @@ -21,8 +21,8 @@ "id": "nAvx4-UCNzt2" }, "source": [ - "```{index} Unmanned aerial vehicles, micro aerial vehicles, 3D reconstruction, trajectory optimization", - "```", + "```{index} Unmanned aerial vehicles, micro aerial vehicles, 3D reconstruction, trajectory optimization\n", + "```\n", "**Unmanned aerial vehicles** (UAVs) take autonomy into the next dimension: into three dimensions, to be exact. Whereas autonomous vehicles are bound to earth, UAVs take to the air. Hence, their perception and planning problems are fundamentally harder in a geometric sense. On the other hand, our airspace is currently much sparser than our roads are, and thus dynamic obstacle avoidance is less of an issue.\n", "\n", "In this chapter we will concentrate on a very popular class of UAVs: quadrotors. These are craft equipped with four actuated rotors that allow for very simple control algorithms, yet can achieve very agile flight. Because quadrotors are cheap to manufacture and there is a large market for camera drones, most quadrotor vehicles have a rather small form factor and are designed to be easily portable. Sometimes this segment of the UAV market is designated as **micro aerial vehicles** or MAVs, a term we will use throughout this chapter.\n", diff --git a/S75_drone_planning.ipynb b/S75_drone_planning.ipynb index 5bcfafec..0d48a3ca 100644 --- a/S75_drone_planning.ipynb +++ b/S75_drone_planning.ipynb @@ -89,7 +89,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} trajectory optimization```In the previous section we saw how use factor graphs for visual SLAM and structure from motion. These perception algorithms are typically run after the robot has gathered some visual information, and provide information about what happened in the past. But how can we plan for the future? \n", + "```{index} trajectory optimization\n", + "```\n", + "In the previous section we saw how use factor graphs for visual SLAM and structure from motion. These perception algorithms are typically run after the robot has gathered some visual information, and provide information about what happened in the past. But how can we plan for the future? \n", "\n", "We already saw that RRTs are a useful tool for planning in a continuous, potentially high dimensional state space. 
However, RRTs are not concerned with optimality. They aim for feasible paths, where sometimes feasibility means \"collision-free\" and sometimes it includes honoring the system dynamics. But if we want to achieve optimal trajectories in terms of time to goal, best use of energy, or minimum distance, we need to turn to other methods.\n", "\n", @@ -118,7 +120,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} path, trajectory```## Optimizing for Position\n", + "```{index} path, trajectory\n", + "```\n", + "## Optimizing for Position\n", "\n", "> Position is all we need for the first step.\n", "\n", @@ -178,7 +182,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} occupancy map, cost map```## Occupancy and Cost Maps\n", + "```{index} occupancy map, cost map\n", + "```\n", + "## Occupancy and Cost Maps\n", "\n", "> We can use maps to encode costs to minimize.\n", "\n", @@ -823,7 +829,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} vectored thrust```## A Virtual Vectored Thrust\n", + "```{index} vectored thrust\n", + "```\n", + "## A Virtual Vectored Thrust\n", "\n", "> What we want, in theory...\n", "\n", @@ -920,7 +928,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} feedback control```## Combining Open Loop and Feedback Control\n", + "```{index} feedback control\n", + "```\n", + "## Combining Open Loop and Feedback Control\n", "\n", "> What we want, in practice!\n", "\n", @@ -980,7 +990,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} controller gain```We can set up a small simulation to see how this controller behaves in practice, and in particular how the controller behaves for different values of $K_x$ and $K_v$. A factor like this is called a **controller gain**, and choosing the gains optimally is a standard problem in control theory.\n", + "```{index} controller gain\n", + "```\n", + "We can set up a small simulation to see how this controller behaves in practice, and in particular how the controller behaves for different values of $K_x$ and $K_v$. A factor like this is called a **controller gain**, and choosing the gains optimally is a standard problem in control theory.\n", "\n", "Below we use the same simulation strategy as in Section 7.2, and in particular use the `Drone` class that was defined there. In the simulation below we do not worry about the rotation yet:" ] @@ -1090,7 +1102,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} proportional```The mathematical equivalent of FPV for a control *algorithm* is to rotate everything into the body frame. Taking the desired thrust vector $T^n$ and multiplying it with the transpose of the attitude $R^n_b$ (which is the inverse rotation, recall Sections 6.1 and 7.1) yields the desired thrust vector $T^b$ in the body frame:\n", + "```{index} proportional\n", + "```\n", + "The mathematical equivalent of FPV for a control *algorithm* is to rotate everything into the body frame. Taking the desired thrust vector $T^n$ and multiplying it with the transpose of the attitude $R^n_b$ (which is the inverse rotation, recall Sections 6.1 and 7.1) yields the desired thrust vector $T^b$ in the body frame:\n", "\\begin{equation}\n", "T^b = (R^n_b)^T T^n.\n", "\\end{equation}\n", @@ -1270,7 +1284,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} cascaded controller```Note that in the code we now have an outer and an inner loop. 
The outer loop is for the \"slow\" translational dynamics, whereas the inner loop simulates the \"fast\" attitude dynamics. Such a **cascaded controller** is a typical design choice for drone applications." + "```{index} cascaded controller\n", + "```\n", + "Note that in the code we now have an outer and an inner loop. The outer loop is for the \"slow\" translational dynamics, whereas the inner loop simulates the \"fast\" attitude dynamics. Such a **cascaded controller** is a typical design choice for drone applications." ] }, { From 62e0b559f63aeb651aa426d0466d74e7be8fe361 Mon Sep 17 00:00:00 2001 From: Frank Dellaert Date: Mon, 23 Dec 2024 11:20:57 -0500 Subject: [PATCH 2/2] Make some edits --- S66_driving_DRL.ipynb | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/S66_driving_DRL.ipynb b/S66_driving_DRL.ipynb index 904d58e9..4a0d7f19 100644 --- a/S66_driving_DRL.ipynb +++ b/S66_driving_DRL.ipynb @@ -85,9 +85,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "```{index} lateral control, longitudinal control\n", + "```{index} lateral control, longitudinal control, lane switching\n", "```\n", - "A simple example in the autonomous driving domain is *lane switching*. Suppose we are driving along at 3-lane highway, and we can see some ways ahead, and some ways behind us. We are driving at a speed that is comfortable to us, but other cars have different ideas about the optimal speed to drive at. Hence, sometimes we would like to change lanes, and we could learn a policy to do this for us. As discussed in Section 6.5, this is **lateral control**. A more sophisticated example would also allow us to adapt our speed to the traffic pattern, but by relying on a smart cruise control system we could safely ignore the **longitudinal control** problem." + "A simple example in the autonomous driving domain is *lane switching*. Suppose we are driving along at 3-lane highway, and we can see some ways ahead, and -using the rear-view mirror- some ways behind us. We are driving at a speed that is comfortable to us, but other cars have different ideas about their optimal driving speed. Hence, sometimes we would like to change lanes, and we could learn a policy to do this for us. As discussed in Section 6.5, this is **lateral control**. A more sophisticated example would also allow us to adapt our speed to the traffic pattern, but by relying on a smart cruise control system we could safely ignore the **longitudinal control** problem." ] }, { @@ -121,11 +121,15 @@ "\\begin{equation}\n", "\\pi^*(x) = \\arg \\max_a Q^*(x,a)\n", "\\end{equation}\n", - "where $Q^*(x,a)$ denote the Q-values for the *optimal* policy. In Q-learning, we start with some random Q-values and then iteratively improve the estimate for the optimal Q-values by alpha-blending between old and new estimates:\n", + "where $Q^*(x,a)$ denote the Q-values for the *optimal* policy. In Q-learning, we start with some random Q-values and then iteratively improve an estimate $\\hat{Q}(x,a)$ for the optimal Q-values by alpha-blending between old and new estimates:\n", "\\begin{equation}\n", - "\\hat{Q}(x,a) \\leftarrow (1-\\alpha) \\hat{Q}(x,a) + \\alpha~\\text{target}(x,a,x')\n", + "\\hat{Q}(x,a) \\leftarrow (1-\\alpha) \\hat{Q}(x,a) + \\alpha~\\text{target}(x,a,x').\n", "\\end{equation}\n", - "where $\\text{target}(x,a,x') \\doteq R(x,a,x') + \\gamma \\max_{a'} \\hat{Q}(x',a')$ is the \"target\" value that we think is an improvement on the previous value $\\hat{Q}(x,a)$. 
Indeed: the target $\\text{target}(x,a,x')$ uses the current estimate of the Q-values for future states, but improves on this by using the *known* reward $R(x,a,x')$ for the current action in the current state." + "Above, the \"target value\"\n", + "\\begin{equation}\n", + "\\text{target}(x,a,x') \\doteq R(x,a,x') + \\gamma \\max_{a'} \\hat{Q}(x',a')\n", + "\\end{equation}\n", + "is a value that we think is an improvement on the previous value $\\hat{Q}(x,a)$. Indeed: $\\text{target}(x,a,x')$ uses the *current* estimate of the Q-values for future states, but improves on this by using the *known* rewards $R(x,a,x')$ for the current action $a$ in the current state $x$." ] }, { @@ -134,11 +138,13 @@ "source": [ "```{index} execution phase, experience replay\n", "```\n", - "In the **deep Q-network** or DQN method we use a *supervised learning* approach to Q-learning, by training a neural network, parameterized by $\\theta$, to approximate the optimal Q-values:\n", + "In the **deep Q-network** or DQN method we use a *supervised learning* approach to Q-learning. We train a neural network, parameterized by $\\theta$, to approximate the optimal Q-values:\n", "\\begin{equation}\n", - "Q^*(x,a) \\approx Q(x,a; \\theta)\n", + "Q^*(x,a) \\approx \\hat{Q}(x,a; \\theta)\n", "\\end{equation}\n", - "It might be worthwhile to re-visit Section 5.6, where we introduced neural networks and how to train them using stochastic gradient descent (SGD). In the context of RL, the DQN method uses two additional ideas that are crucial in making the training converge to something sensible in difficult problems. The first is splitting the training into *execution* and *experience replay* phases:\n", + "It might be worthwhile at this point to re-visit Section 5.6, where we introduced neural networks and how to train them using stochastic gradient descent (SGD).\n", + "\n", + "In the context of RL, the DQN method uses two additional ideas that are crucial in making the training converge to something sensible in difficult problems. The first is splitting the training into *execution* and *experience replay* phases:\n", "\n", "- during the **execution phase**, the policy is executed (possibly with some degree of randomness) and the experiences $(x,a,r,x')$, with $r$ the reward, are stored in a dataset $D$;\n", "- during **experience replay**, we *randomly sample* from these experiences to create mini-batches of data, which are in turn used to perform SGD on the parameters $\\theta$.\n", @@ -154,7 +160,6 @@ "\\end{equation}\n", "\n", "With this basic scheme, a team from DeepMind was able to achieve human or super-human performance on about 50 Atari 2600 games in 2015 {cite:p}`Mnih15nature_dqn`.\n", - "\n", "DQN is a so-called **off-policy** method, in that each execution phase uses the best policy we computed so far, but we can still replay earlier experiences gathered with \"lesser\" policies. Nothing in the experience replay phase references the policy: every experience leads to a valid Q-value backup and a valid supervised learning signal." ] },