Merge pull request #97 from gtbook/fix_indexing
Fix indexing
dellaert authored Dec 23, 2024
2 parents 37105bd + 62e0b55 commit 375d850
Showing 9 changed files with 67 additions and 37 deletions.
9 changes: 7 additions & 2 deletions S30_vacuum_intro.ipynb
@@ -37,8 +37,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} factor graph",
"```",
"```{index} factor graph\n",
"```\n",
"Hence, in this chapter, we will learn about probabilistic outcomes of actions.\n",
"For our vacuum cleaning robot, states correspond to rooms in the house, and trajectories correspond to the robot moving from room to room.\n",
"We will model uncertain actions with conditional probability distributions, just like we did with sensor measurements in the previous chapter.\n",
@@ -63,6 +63,11 @@
"\n",
"Finally, we will introduce the notion of reinforcement learning, where we will estimate the parameters of an MDP using data that is obtained during the robot's normal operation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
8 changes: 4 additions & 4 deletions S37_vacuum_summary.ipynb
@@ -45,8 +45,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} factor graphs, factors",
"```",
"```{index} factor graphs, factors\n",
"```\n",
"## Reasoning\n",
"\n",
"Bayes nets are great for *modeling*, and in Section 3.4 we introduced Hidden Markov Models that allow us to reason about a sequence of hidden states, observed via noisy measurements. Hidden Markov models have been around for a long time and transformed areas such as speech recognition. They are exactly what we need for robot localization over time, as well. Beyond the simple vacuum cleaning robot example, they can be generalized to nearly any robot/environment combo that we can model using discrete states transitioning over time. In our example we use just a single discrete sensor, but the HMM is able to accommodate multiple sensors, even continuous ones. \n",
@@ -68,8 +68,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} elimination algorithm",
"```",
"```{index} elimination algorithm\n",
"```\n",
"## Background and History\n",
"\n",
"Markov chains date back to -you guessed it- [Andrey Markov](https://en.wikipedia.org/wiki/Andrey_Markov) who used them to study, among other things, the statistics of language. In fact, attesting to the importance and generality of the concept, any finite-context large language model can be viewed as a Markov chain - admittedly with a rather vast state space.\n",
4 changes: 3 additions & 1 deletion S41_logistics_state.ipynb
@@ -155,7 +155,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} covariance matrix, multivariate Gaussian density```The energy analogy can be extended to the multivariate case.\n",
"```{index} covariance matrix, multivariate Gaussian density\n",
"```\n",
"The energy analogy can be extended to the multivariate case.\n",
"In the 1D case, the mean and variance are scalars.\n",
"For the $n$-dimensional case when $x\\in\\mathbb{R}^n$, the mean is a vector, $\\mu\\in\\mathbb{R}^n$,\n",
"and the concept of variance is extended to define a \n",
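To make the multivariate "energy" concrete, here is a minimal numpy sketch of the quadratic form that generalizes the 1D case; the example mean and covariance values are made up for illustration only.

```python
import numpy as np

def gaussian_energy(x, mu, Sigma):
    """Negative log of the multivariate Gaussian density, up to a constant:
    E(x) = 0.5 * (x - mu)^T Sigma^{-1} (x - mu)."""
    e = x - mu
    return 0.5 * e @ np.linalg.solve(Sigma, e)

# Example with made-up numbers: a 2D Gaussian.
mu = np.array([1.0, 2.0])
Sigma = np.array([[0.5, 0.1],
                  [0.1, 0.3]])
print(gaussian_energy(np.array([1.2, 1.8]), mu, Sigma))
```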
8 changes: 4 additions & 4 deletions S51_diffdrive_state.ipynb
@@ -110,8 +110,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} configuration, configuration space",
"```",
"```{index} configuration, configuration space\n",
"```\n",
"In many robotics applications, if we are interested only in geometric aspects of the problem (e.g., if we are not concerned with dynamics, or with forces that are required to effect motion), we use the term *configuration space* instead of the term *state space*. \n",
"A **configuration**, denoted by $q$, is a complete specificiation of the location of every point on a robotic system (assuming that a model of the robot\n",
"is available). The **configuration space**, denoted by ${\\cal Q}$, is the set of all configurations.\n",
@@ -128,8 +128,8 @@
"As an example, consider the problem of determining the x-y position of the wheel centers for our DDR.\n",
"If the wheelbase (i.e., the distance between the two wheel centers) is denoted by $L$,\n",
"and the robot is in configuration $q=(x,y.\\theta)$,\n",
"then\n",
"the x-y coordinates of the left and right wheel centers are given by\n",
"then the x-y coordinates of the left and right wheel centers are given by\n",
"\n",
"\\begin{equation}\n",
"\\left[ \\begin{array}{c} x_{\\mathrm{left}} \\\\ y_{\\mathrm{left}} \\end{array}\\right]\n",
"=\n",
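The wheel-center computation from the configuration $q=(x,y,\theta)$ can be sketched in a few lines; since the full equation is collapsed above, the sign convention chosen below for the left and right wheels is an assumption.

```python
import numpy as np

def wheel_centers(q, L):
    """Given a DDR configuration q = (x, y, theta) and wheelbase L, return the
    x-y coordinates of the left and right wheel centers. Assumes (x, y) is the
    midpoint between the wheels and the left wheel lies at +L/2 along the
    direction perpendicular to the heading."""
    x, y, theta = q
    perp = np.array([-np.sin(theta), np.cos(theta)])  # unit vector 90 degrees left of heading
    center = np.array([x, y])
    return center + 0.5 * L * perp, center - 0.5 * L * perp

left, right = wheel_centers((2.0, 3.0, np.pi / 4), L=0.5)
```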
4 changes: 3 additions & 1 deletion S62_driving_actions.ipynb
@@ -103,7 +103,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} Ackermann steering```This kind of steering for the front wheels is called **Ackermann steering**, as illustrated in the figure above.\n",
"```{index} Ackermann steering\n",
"```\n",
"This kind of steering for the front wheels is called **Ackermann steering**, as illustrated in the figure above.\n",
"The physical mechanism required to implement Ackermann steering is slightly complex,\n",
"but happily we can model the system by using a single *virtual wheel* placed\n",
"at the midpoint between the two front wheels, rolling in a direction perpendicular to the line from\n",
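The single-virtual-wheel idea leads to the familiar bicycle-model kinematics; the sketch below is a generic illustration under that assumption (steering angle `phi`, wheelbase `L`), not the notebook's own code.

```python
import numpy as np

def bicycle_step(x, y, theta, v, phi, L, dt):
    """One Euler step of bicycle-model kinematics with a single virtual front
    wheel: forward speed v, steering angle phi, distance L between axles."""
    x_new = x + v * np.cos(theta) * dt
    y_new = y + v * np.sin(theta) * dt
    theta_new = theta + (v / L) * np.tan(phi) * dt
    return x_new, y_new, theta_new

state = (0.0, 0.0, 0.0)
for _ in range(100):  # drive 1 second at 10 m/s with a small steering angle
    state = bicycle_step(*state, v=10.0, phi=0.05, L=2.5, dt=0.01)
```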
12 changes: 6 additions & 6 deletions S63_driving_sensing.ipynb
@@ -86,8 +86,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} Time of Flight, ToF, direct ToF, Indirect ToF, 2D LIDAR, 3D LIDAR",
"```",
"```{index} Time of Flight, ToF, direct ToF, Indirect ToF, 2D LIDAR, 3D LIDAR\n",
"```\n",
"## LIDAR\n",
"\n",
"LIDAR (LIght raDAR) is a technology that measures distance to an object by using laser light and the **Time of Flight** or **ToF** principle. There are several variants in use, and the simplest to explain is the **direct ToF** sensor, which sends out a short pulse and measures the elapsed time $\\Delta t$ for the light to bounce off an object and return to a detector collocated with the laser pulse emitter. If the object is situated at a distance $d$ from the emitter-detector pair, we have\n",
@@ -114,8 +114,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} ray direction",
"```",
"```{index} ray direction\n",
"```\n",
"## Ray Intersection\n",
"\n",
"> Intersecting rays is as easy as computing a dot product.\n",
@@ -512,8 +512,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} point cloud map",
"```",
"```{index} point cloud map\n",
"```\n",
"## Creating 3D Maps\n",
"\n",
"> Point clouds can be used to represent the 3D world.\n",
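One simple way to build such a point cloud is to back-project range measurements along their unit ray directions; a minimal sketch under that assumption, not the notebook's implementation:

```python
import numpy as np

def point_cloud(ranges, directions, sensor_position):
    """Turn N range measurements and their N x 3 unit ray directions into an
    N x 3 point cloud, expressed in the same frame as sensor_position."""
    return sensor_position + ranges[:, None] * directions

dirs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
cloud = point_cloud(np.array([4.0, 2.5]), dirs, np.zeros(3))
```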
23 changes: 14 additions & 9 deletions S66_driving_DRL.ipynb
@@ -85,9 +85,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} lateral control, longitudinal control\n",
"```{index} lateral control, longitudinal control, lane switching\n",
"```\n",
"A simple example in the autonomous driving domain is *lane switching*. Suppose we are driving along at 3-lane highway, and we can see some ways ahead, and some ways behind us. We are driving at a speed that is comfortable to us, but other cars have different ideas about the optimal speed to drive at. Hence, sometimes we would like to change lanes, and we could learn a policy to do this for us. As discussed in Section 6.5, this is **lateral control**. A more sophisticated example would also allow us to adapt our speed to the traffic pattern, but by relying on a smart cruise control system we could safely ignore the **longitudinal control** problem."
"A simple example in the autonomous driving domain is *lane switching*. Suppose we are driving along at 3-lane highway, and we can see some ways ahead, and -using the rear-view mirror- some ways behind us. We are driving at a speed that is comfortable to us, but other cars have different ideas about their optimal driving speed. Hence, sometimes we would like to change lanes, and we could learn a policy to do this for us. As discussed in Section 6.5, this is **lateral control**. A more sophisticated example would also allow us to adapt our speed to the traffic pattern, but by relying on a smart cruise control system we could safely ignore the **longitudinal control** problem."
]
},
{
@@ -121,11 +121,15 @@
"\\begin{equation}\n",
"\\pi^*(x) = \\arg \\max_a Q^*(x,a)\n",
"\\end{equation}\n",
"where $Q^*(x,a)$ denote the Q-values for the *optimal* policy. In Q-learning, we start with some random Q-values and then iteratively improve the estimate for the optimal Q-values by alpha-blending between old and new estimates:\n",
"where $Q^*(x,a)$ denote the Q-values for the *optimal* policy. In Q-learning, we start with some random Q-values and then iteratively improve an estimate $\\hat{Q}(x,a)$ for the optimal Q-values by alpha-blending between old and new estimates:\n",
"\\begin{equation}\n",
"\\hat{Q}(x,a) \\leftarrow (1-\\alpha) \\hat{Q}(x,a) + \\alpha~\\text{target}(x,a,x')\n",
"\\hat{Q}(x,a) \\leftarrow (1-\\alpha) \\hat{Q}(x,a) + \\alpha~\\text{target}(x,a,x').\n",
"\\end{equation}\n",
"where $\\text{target}(x,a,x') \\doteq R(x,a,x') + \\gamma \\max_{a'} \\hat{Q}(x',a')$ is the \"target\" value that we think is an improvement on the previous value $\\hat{Q}(x,a)$. Indeed: the target $\\text{target}(x,a,x')$ uses the current estimate of the Q-values for future states, but improves on this by using the *known* reward $R(x,a,x')$ for the current action in the current state."
"Above, the \"target value\"\n",
"\\begin{equation}\n",
"\\text{target}(x,a,x') \\doteq R(x,a,x') + \\gamma \\max_{a'} \\hat{Q}(x',a')\n",
"\\end{equation}\n",
"is a value that we think is an improvement on the previous value $\\hat{Q}(x,a)$. Indeed: $\\text{target}(x,a,x')$ uses the *current* estimate of the Q-values for future states, but improves on this by using the *known* rewards $R(x,a,x')$ for the current action $a$ in the current state $x$."
]
},
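The alpha-blended backup above translates almost line for line into code; the tiny tabular sketch below is illustrative only, with made-up states, actions, and rewards.

```python
import numpy as np

def q_learning_update(Q, x, a, r, x_next, alpha=0.1, gamma=0.9):
    """One Q-learning backup: blend the old estimate with the target
    target = r + gamma * max_a' Q(x', a')."""
    target = r + gamma * np.max(Q[x_next])
    Q[x, a] = (1 - alpha) * Q[x, a] + alpha * target
    return Q

# Hypothetical tiny problem: 5 states, 3 actions, random initial Q-values.
rng = np.random.default_rng(42)
Q = rng.normal(size=(5, 3))
Q = q_learning_update(Q, x=0, a=2, r=1.0, x_next=3)
```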
{
@@ -134,11 +138,13 @@
"source": [
"```{index} execution phase, experience replay\n",
"```\n",
"In the **deep Q-network** or DQN method we use a *supervised learning* approach to Q-learning, by training a neural network, parameterized by $\\theta$, to approximate the optimal Q-values:\n",
"In the **deep Q-network** or DQN method we use a *supervised learning* approach to Q-learning. We train a neural network, parameterized by $\\theta$, to approximate the optimal Q-values:\n",
"\\begin{equation}\n",
"Q^*(x,a) \\approx Q(x,a; \\theta)\n",
"Q^*(x,a) \\approx \\hat{Q}(x,a; \\theta)\n",
"\\end{equation}\n",
"It might be worthwhile to re-visit Section 5.6, where we introduced neural networks and how to train them using stochastic gradient descent (SGD). In the context of RL, the DQN method uses two additional ideas that are crucial in making the training converge to something sensible in difficult problems. The first is splitting the training into *execution* and *experience replay* phases:\n",
"It might be worthwhile at this point to re-visit Section 5.6, where we introduced neural networks and how to train them using stochastic gradient descent (SGD).\n",
"\n",
"In the context of RL, the DQN method uses two additional ideas that are crucial in making the training converge to something sensible in difficult problems. The first is splitting the training into *execution* and *experience replay* phases:\n",
"\n",
"- during the **execution phase**, the policy is executed (possibly with some degree of randomness) and the experiences $(x,a,r,x')$, with $r$ the reward, are stored in a dataset $D$;\n",
"- during **experience replay**, we *randomly sample* from these experiences to create mini-batches of data, which are in turn used to perform SGD on the parameters $\\theta$.\n",
@@ -154,7 +160,6 @@
"\\end{equation}\n",
"\n",
"With this basic scheme, a team from DeepMind was able to achieve human or super-human performance on about 50 Atari 2600 games in 2015 {cite:p}`Mnih15nature_dqn`.\n",
"\n",
"DQN is a so-called **off-policy** method, in that each execution phase uses the best policy we computed so far, but we can still replay earlier experiences gathered with \"lesser\" policies. Nothing in the experience replay phase references the policy: every experience leads to a valid Q-value backup and a valid supervised learning signal."
]
},
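A schematic sketch of the execution / experience-replay split described above; the `env`, `policy`, and `sgd_step` objects are hypothetical placeholders, not the notebook's DQN implementation.

```python
import random
from collections import deque

replay_buffer = deque(maxlen=100_000)  # the dataset D of experiences

def execution_phase(env, policy, num_steps):
    """Run the current policy and store experiences (x, a, r, x') in D."""
    x = env.reset()
    for _ in range(num_steps):
        a = policy(x)                      # possibly with some randomness
        x_next, r = env.step(a)
        replay_buffer.append((x, a, r, x_next))
        x = x_next

def experience_replay(sgd_step, batch_size=32, num_batches=100):
    """Randomly sample mini-batches from D and take SGD steps on theta."""
    for _ in range(num_batches):
        batch = random.sample(replay_buffer, batch_size)
        sgd_step(batch)
```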
4 changes: 2 additions & 2 deletions S70_drone_intro.ipynb
@@ -21,8 +21,8 @@
"id": "nAvx4-UCNzt2"
},
"source": [
"```{index} Unmanned aerial vehicles, micro aerial vehicles, 3D reconstruction, trajectory optimization",
"```",
"```{index} Unmanned aerial vehicles, micro aerial vehicles, 3D reconstruction, trajectory optimization\n",
"```\n",
"**Unmanned aerial vehicles** (UAVs) take autonomy into the next dimension: into three dimensions, to be exact. Whereas autonomous vehicles are bound to earth, UAVs take to the air. Hence, their perception and planning problems are fundamentally harder in a geometric sense. On the other hand, our airspace is currently much sparser than our roads are, and thus dynamic obstacle avoidance is less of an issue.\n",
"\n",
"In this chapter we will concentrate on a very popular class of UAVs: quadrotors. These are craft equipped with four actuated rotors that allow for very simple control algorithms, yet can achieve very agile flight. Because quadrotors are cheap to manufacture and there is a large market for camera drones, most quadrotor vehicles have a rather small form factor and are designed to be easily portable. Sometimes this segment of the UAV market is designated as **micro aerial vehicles** or MAVs, a term we will use throughout this chapter.\n",
32 changes: 24 additions & 8 deletions S75_drone_planning.ipynb
@@ -89,7 +89,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} trajectory optimization```In the previous section we saw how use factor graphs for visual SLAM and structure from motion. These perception algorithms are typically run after the robot has gathered some visual information, and provide information about what happened in the past. But how can we plan for the future? \n",
"```{index} trajectory optimization\n",
"```\n",
"In the previous section we saw how use factor graphs for visual SLAM and structure from motion. These perception algorithms are typically run after the robot has gathered some visual information, and provide information about what happened in the past. But how can we plan for the future? \n",
"\n",
"We already saw that RRTs are a useful tool for planning in a continuous, potentially high dimensional state space. However, RRTs are not concerned with optimality. They aim for feasible paths, where sometimes feasibility means \"collision-free\" and sometimes it includes honoring the system dynamics. But if we want to achieve optimal trajectories in terms of time to goal, best use of energy, or minimum distance, we need to turn to other methods.\n",
"\n",
@@ -118,7 +120,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} path, trajectory```## Optimizing for Position\n",
"```{index} path, trajectory\n",
"```\n",
"## Optimizing for Position\n",
"\n",
"> Position is all we need for the first step.\n",
"\n",
@@ -178,7 +182,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} occupancy map, cost map```## Occupancy and Cost Maps\n",
"```{index} occupancy map, cost map\n",
"```\n",
"## Occupancy and Cost Maps\n",
"\n",
"> We can use maps to encode costs to minimize.\n",
"\n",
@@ -823,7 +829,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} vectored thrust```## A Virtual Vectored Thrust\n",
"```{index} vectored thrust\n",
"```\n",
"## A Virtual Vectored Thrust\n",
"\n",
"> What we want, in theory...\n",
"\n",
@@ -920,7 +928,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} feedback control```## Combining Open Loop and Feedback Control\n",
"```{index} feedback control\n",
"```\n",
"## Combining Open Loop and Feedback Control\n",
"\n",
"> What we want, in practice!\n",
"\n",
@@ -980,7 +990,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} controller gain```We can set up a small simulation to see how this controller behaves in practice, and in particular how the controller behaves for different values of $K_x$ and $K_v$. A factor like this is called a **controller gain**, and choosing the gains optimally is a standard problem in control theory.\n",
"```{index} controller gain\n",
"```\n",
"We can set up a small simulation to see how this controller behaves in practice, and in particular how the controller behaves for different values of $K_x$ and $K_v$. A factor like this is called a **controller gain**, and choosing the gains optimally is a standard problem in control theory.\n",
"\n",
"Below we use the same simulation strategy as in Section 7.2, and in particular use the `Drone` class that was defined there. In the simulation below we do not worry about the rotation yet:"
]
@@ -1090,7 +1102,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} proportional```The mathematical equivalent of FPV for a control *algorithm* is to rotate everything into the body frame. Taking the desired thrust vector $T^n$ and multiplying it with the transpose of the attitude $R^n_b$ (which is the inverse rotation, recall Sections 6.1 and 7.1) yields the desired thrust vector $T^b$ in the body frame:\n",
"```{index} proportional\n",
"```\n",
"The mathematical equivalent of FPV for a control *algorithm* is to rotate everything into the body frame. Taking the desired thrust vector $T^n$ and multiplying it with the transpose of the attitude $R^n_b$ (which is the inverse rotation, recall Sections 6.1 and 7.1) yields the desired thrust vector $T^b$ in the body frame:\n",
"\\begin{equation}\n",
"T^b = (R^n_b)^T T^n.\n",
"\\end{equation}\n",
@@ -1270,7 +1284,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} cascaded controller```Note that in the code we now have an outer and an inner loop. The outer loop is for the \"slow\" translational dynamics, whereas the inner loop simulates the \"fast\" attitude dynamics. Such a **cascaded controller** is a typical design choice for drone applications."
"```{index} cascaded controller\n",
"```\n",
"Note that in the code we now have an outer and an inner loop. The outer loop is for the \"slow\" translational dynamics, whereas the inner loop simulates the \"fast\" attitude dynamics. Such a **cascaded controller** is a typical design choice for drone applications."
]
},
{
