diff --git a/slides/03_time-series-analysis.qmd b/slides/03_time-series-analysis.qmd
index 0e0ba42..282f826 100644
--- a/slides/03_time-series-analysis.qmd
+++ b/slides/03_time-series-analysis.qmd
@@ -9,9 +9,9 @@ title: "3. Time-Series Analysis"
- Mention time series
:::
-## Package Libraries
-
```{r}
+#| echo: false
+#| eval: false
#| message: false
#| warning: false
@@ -21,34 +21,12 @@ library("DiagrammeR")
## Stepping Forward
-```{r}
-#| echo: false
-#| eval: true
-DiagrammeR::mermaid("
-graph LR
-
-TS[time series]
-ARIMA[autoregressive models]
-TIS[time invariant systems]
-CT[control theory]
-NSSM[neural state space models]
-
-TS --> ARIMA
-
-ARIMA --> diffusion
-ARIMA --> LLMs
-
-diffusion --> SDEs
-diffusion --> NSSM
-TIS --> NSSM
-CT --> NSSM
-")
-```
+![time series knowledge](images/ch_3_time_series_flowchart.png)
## Mermaid code
```{r}
-#| echo: true
+#| echo: false
#| eval: false
DiagrammeR::mermaid("
graph LR
diff --git a/slides/04_online-learning-and-regret-minimization.qmd b/slides/04_online-learning-and-regret-minimization.qmd
index efe0149..c8a6dc1 100644
--- a/slides/04_online-learning-and-regret-minimization.qmd
+++ b/slides/04_online-learning-and-regret-minimization.qmd
@@ -6,26 +6,59 @@ title: "4. Online Learning and Regret Minimization"
# Learning objectives
::: nonincremental
-- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+- Introduce terminology about optimization
:::
::: notes
-- You can add notes on each slide with blocks like this!
-- Load a deck in the browser and type "s" to see these notes.
+I hope that you like math
:::
-# SLIDE SECTION
+## Online Convex Optimization
+
+![OCO](images/online_convex_optimization.png)
+
+## Regret
+
+$$\text{Regret}_{T}(A) = \text{sup}_{\{f_{1},\dots,f_{T}\}}\left[\sum_{t=1}^{T}f_{t}(x_{t}^{A}) - \text{min}_{x \in \mathcal{K}}\sum_{t=1}^{T}f_{t}(x)\right]$$
+
+* $x_{t}^{A}$: actions played by algorithm $A$ in the decision set $\mathcal{K}$
+* $f_{t}$: convex loss functions chosen by the adversary
+* $T$: number of game iterations
+
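+## Regret: R Sketch
+
+As a rough, not-from-the-book sketch of measuring regret empirically, the chunk below runs online gradient descent on made-up quadratic losses $f_{t}(x) = (x - z_{t})^{2}$ over $\mathcal{K} = [-1, 1]$ and compares the player's cumulative loss to the best fixed decision in hindsight (the loss sequence and step sizes are illustrative assumptions).
+
+```{r}
+#| echo: true
+#| eval: false
+# online gradient descent on f_t(x) = (x - z_t)^2 over the decision set [-1, 1]
+set.seed(42)
+T_iters <- 100
+z <- runif(T_iters, min = -1, max = 1)   # adversary's (here random) targets
+x <- numeric(T_iters)                    # played points, starting at 0
+for (t in 1:(T_iters - 1)) {
+  grad <- 2 * (x[t] - z[t])              # gradient of f_t at the played point
+  step <- 1 / sqrt(t)                    # decaying step size
+  x[t + 1] <- min(max(x[t] - step * grad, -1), 1)  # project back onto [-1, 1]
+}
+player_loss <- sum((x - z)^2)
+best_fixed  <- min(sapply(seq(-1, 1, by = 0.01),
+                          function(u) sum((u - z)^2)))
+player_loss - best_fixed                 # empirical regret over T rounds
+```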
+## Applications
+
+* spam filtering
+* path finding
+* portfolio selection
+* recommendation systems
+
+## Experts and Adversaries
+
+**Theorem 1.2** Let $\epsilon\in(0,0.5)$. Suppose there are $N$ experts and the best one makes $L$ mistakes. Then:
+
+* $\exists$ an efficient *deterministic* algorithm making $< 2(1+\epsilon)L + \frac{2\log N}{\epsilon}$ mistakes
+* $\exists$ an efficient *randomized* algorithm making $\leq (1+\epsilon)L + \frac{\log N}{\epsilon}$ mistakes in expectation
+
+## Weighted Majority Algorithm
+
+* predict according to the *weighted majority* of experts
+
+$$a_{t} = \begin{cases} A, & W_{t}(A) \geq W_{t}(B) \\ B, & \text{otherwise}\end{cases}$$
+
+* *update* weights
+
+$$W_{t+1}(i) = \begin{cases}W_{t}(i), & \text{if expert i was correct} \\ W_{t}(i)(1-\epsilon), & \text{if expert i was wrong}\end{cases}$$
+
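+## Weighted Majority: R Sketch
+
+A rough sketch of the prediction and update rules above, assuming expert predictions arrive as a 0/1 matrix with one row per round (the data layout and `eps` default are my own assumptions, not from the book):
+
+```{r}
+#| echo: true
+#| eval: false
+weighted_majority <- function(predictions, outcomes, eps = 0.1) {
+  # predictions: T x N matrix of expert predictions (0/1)
+  # outcomes:    length-T vector of true outcomes (0/1)
+  N <- ncol(predictions)
+  w <- rep(1, N)                                 # start with equal weights
+  mistakes <- 0
+  for (t in seq_along(outcomes)) {
+    vote_1 <- sum(w[predictions[t, ] == 1])
+    vote_0 <- sum(w[predictions[t, ] == 0])
+    a_t <- ifelse(vote_1 >= vote_0, 1, 0)        # predict with the weighted majority
+    mistakes <- mistakes + (a_t != outcomes[t])
+    wrong <- predictions[t, ] != outcomes[t]
+    w[wrong] <- w[wrong] * (1 - eps)             # penalize experts that were wrong
+  }
+  mistakes
+}
+```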
+## Hedging
+
+$$W_{t+1}(i) = W_{t}(i)e^{-\epsilon \ell_{t}(i)}$$
+
+* $\epsilon$: learning rate
+* $\ell_{t}(i)$: loss by expert $i$ at iteration $t$
+
+
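+## Hedge: R Sketch
+
+The multiplicative update above is essentially one line of R; here $\ell_{t}$ is assumed to arrive as a numeric vector of per-expert losses (a storage choice of mine, not the book's):
+
+```{r}
+#| echo: true
+#| eval: false
+# one Hedge step: exponentially down-weight experts by their observed losses
+hedge_update <- function(w, losses, eps = 0.1) {
+  w * exp(-eps * losses)
+}
+
+# play by sampling an expert in proportion to the normalized weights
+sample_expert <- function(w) sample(length(w), size = 1, prob = w / sum(w))
+```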
-## SLIDE
-- DENOTE MAJOR SECTIONS WITH `# TITLE` (eg `# Installation`)
-- ADD INDIVIDUAL SLIDES WITH `##` (eg `## rustup on Linux/macOS`)
-- KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.
-## SLIDE
-# SLIDE SECTION
-## SLIDE
-## SLIDE
diff --git a/slides/05_reinforcement-learning.qmd b/slides/05_reinforcement-learning.qmd
index abd76fd..9ed6b49 100644
--- a/slides/05_reinforcement-learning.qmd
+++ b/slides/05_reinforcement-learning.qmd
@@ -6,26 +6,166 @@ title: "5. Reinforcement Learning"
# Learning objectives
::: nonincremental
-- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+- Give an overview of reinforcement learning
:::
::: notes
-- You can add notes on each slide with blocks like this!
-- Load a deck in the browser and type "s" to see these notes.
+should talk about this after chapter 6 (markov models)
:::
-# SLIDE SECTION
+## Textbook
-## SLIDE
+:::: {.columns}
-- DENOTE MAJOR SECTIONS WITH `# TITLE` (eg `# Installation`)
-- ADD INDIVIDUAL SLIDES WITH `##` (eg `## rustup on Linux/macOS`)
-- KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.
+::: {.column width="45%"}
+![Sutton and Barto](images/Sutton_Barto.png)
+:::
+
+::: {.column width="10%"}
+
+:::
+
+::: {.column width="45%"}
+![Mutual Information](images/Mutual_Information.png)
+:::
+
+::::
+
+## Markov Decision Process
+
+![MDP](images/Markov_decision_process.png)
+
+## Objective
+
+* policy: $\pi(a|s)$
+* return: $G_{t} = \sum_{k=t+1}^{T} \gamma^{k-t-1}R_{k}$
+* *maximize expected return* over all policies
+
+$$\text{max}_{\pi} \text{E}_{\pi}[G_{t}]$$
+
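+## Return: Worked Example
+
+A tiny, made-up reward sequence to make the discounting in $G_{t}$ concrete (the rewards and $\gamma$ are arbitrary choices):
+
+```{r}
+#| echo: true
+#| eval: false
+# discounted return for rewards R_{t+1}, ..., R_T = 1, 0, 0, 2, 5
+rewards <- c(1, 0, 0, 2, 5)
+gamma <- 0.9
+G <- sum(gamma^(seq_along(rewards) - 1) * rewards)
+G  # 1 + 0.9^3 * 2 + 0.9^4 * 5 = 5.7385
+```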
+## Coupled Equations
+
+* state value function
+
+$$v_{\pi}(s) = \text{E}_{\pi}[G_{t}|S_{t} = s]$$
+
+* action value function
+
+$$q_{\pi}(s,a) = \text{E}_{\pi}[G_{t}|S_{t} = s, A_{t} = a]$$
+
+## Bellman Equations
+
+> connect all state values
+
+$$\begin{array}{rcl}
+ v_{\pi}(s^{i}) & = & \text{E}_{\pi}[G_{t}|s^{i}] \\
+ ~ & = & \sum_{\{a\}} \pi(a|s^{i}) \cdot q_{\pi}(s^{i},a) \\
+ ~ & = & \sum_{\{a\}} \pi(a|s^{i}) \cdot \text{E}_{\pi}[G_{t}|s^{i}, a] \\
+\end{array}$$
+
+## Bellman Optimality Equations
+
+For any optimal $\pi_{*}$, $\forall s \in S$, $\forall a \in A$
+
+$$\begin{array}{rcl}
+ v_{*}(s) & = & \text{max}_{a} q_{*}(s,a) \\
+ q_{*}(s,a) & = & \sum_{s',r} p(s',r|s,a)\left[r + \gamma v_{*}(s')\right] \\
+\end{array}$$
+
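+## Value Iteration: R Sketch
+
+When the dynamics $p(s',r|s,a)$ *are* known, the optimality equations can be solved by repeatedly applying the max-backup. Below is a minimal value-iteration sketch on a made-up 2-state, 2-action MDP (the transition and reward arrays are invented for illustration); the next slides drop this assumption.
+
+```{r}
+#| echo: true
+#| eval: false
+# value iteration on a tiny MDP: P[s, a, s'] = transition prob, rew[s, a] = expected reward
+P <- array(c(0.9, 0.2, 0.5, 0.1,    # P[, , s' = 1]
+             0.1, 0.8, 0.5, 0.9),   # P[, , s' = 2]
+           dim = c(2, 2, 2))
+rew <- matrix(c(1, 0, 0, 2), nrow = 2)
+gamma <- 0.9
+v <- c(0, 0)
+for (iter in 1:1000) {
+  q <- sapply(1:2, function(a) rew[, a] + gamma * P[, a, ] %*% v)  # q_*(s, a) backup
+  v_new <- apply(q, 1, max)                                        # v_*(s) = max_a q_*(s, a)
+  if (max(abs(v_new - v)) < 1e-8) break
+  v <- v_new
+}
+v
+```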
+## Monte Carlo Methods
+
+We do not know $p(s',r|s,a)$
+
+* generate samples: $S_{0}, A_{0}, R_{1}, S_{1}, A_{1}, R_{2}, ...$
+* obtain averages $\approx$ expected values
+* *generalized policy iteration* to obtain
+
+$$\pi \approx \pi_{*}$$
+
+## Monte Carlo Evaluation
+
+* approximate $v_{\pi}(s)$ by averaging observed returns
+
+$$\text{E}_{\pi}[G_{t}|S_{t} = s] \approx \frac{1}{C(s)}\sum_{m=1}^{M}\sum_{\tau=0}^{T_{m}-1} I(s_{\tau}^{m} = s)g_{\tau}^{m}$$
+
+* $C(s)$: number of visits to state $s$
+* **step size** $\alpha$ for update rule
+
+$$V(s_{t}^{m}) \leftarrow V(s_{t}^{m}) + \alpha\left(g_{t}^{m} - V(s_{t}^{m})\right)$$
+
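+## MC Evaluation: R Sketch
+
+A sketch of the constant-$\alpha$ update above, assuming each completed episode has already been reduced to per-visit (state, return) pairs (that preprocessing is my simplification):
+
+```{r}
+#| echo: true
+#| eval: false
+# every-visit Monte Carlo evaluation with a constant step size alpha
+mc_update <- function(V, episode_states, episode_returns, alpha = 0.05) {
+  for (k in seq_along(episode_states)) {
+    s <- episode_states[k]
+    g <- episode_returns[k]
+    V[s] <- V[s] + alpha * (g - V[s])   # move V(s) toward the observed return
+  }
+  V
+}
+```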
+## Exploration-Exploitation Trade-Off
+
+* to discover optimal policies
+
+> we must explore all state-action pairs
+
+* to get high returns
+
+> we must exploit known high-value pairs
+
+## Example: Blackjack
+
+![Monte Carlo methods solving the blackjack game](images/MC_blackjack.png)
+
+image credit: [Mutual Information](https://www.youtube.com/watch?v=bpUszPiWM7o&)
+
+> 10 million games played
+
+## Temporal Difference Learning
+
+* **Markov Reward Process**: A Markov decision process, but w/o actions
+
+* MC requires an episode to complete before updating
+
+> but what if an episode is long?
+
+## n-step TD
+
+Replace $g_{t}^{m}$ with
+
+$$g_{t:t+n}^{m} = r_{t+1}^{m} + \gamma r_{t+2}^{m} + \cdots + \gamma^{n-1} r_{t+n}^{m} + \gamma^{n}V(s_{t+n}^{m})$$
+
+> updates are applied during the episode with an $n$-step delay
+
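+## TD(0): R Sketch
+
+For $n = 1$ (TD(0), not written out explicitly on the slide) the update is small enough to sketch directly; the tabular `V` and the defaults for $\alpha$ and $\gamma$ are my assumptions:
+
+```{r}
+#| echo: true
+#| eval: false
+# TD(0): bootstrap from the next state's current estimate instead of the full return
+td0_update <- function(V, s, r_next, s_next, alpha = 0.1, gamma = 0.9) {
+  target <- r_next + gamma * V[s_next]
+  V[s] <- V[s] + alpha * (target - V[s])
+  V
+}
+```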
+## Advantages
+
+Compared to MC, under **batch training** TD(0)
+
+* converges to values $V(s)$ that do not depend on the step size $\alpha$
+* finds the maximum-likelihood estimate of the MRP (MC instead minimizes MSE on the observed returns)
+
+## Q-Learning
+
+$$r_{t+1}^{m} + \gamma \text{max}_{a} Q(s_{t+1}^{m},a)$$
+
+updates $Q$ after each $(s, a, r, s')$ transition (or with an $n$-step delay in the $n$-step variant)
+
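+## Q-Learning: R Sketch
+
+A minimal tabular Q-learning step, assuming `Q` is a matrix indexed by state and action (the representation and the $\alpha$, $\gamma$ defaults are assumptions for illustration):
+
+```{r}
+#| echo: true
+#| eval: false
+# one Q-learning update from a single (s, a, r, s') transition
+q_update <- function(Q, s, a, r, s_next, alpha = 0.1, gamma = 0.9) {
+  target <- r + gamma * max(Q[s_next, ])       # bootstrap with the greedy next action
+  Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
+  Q
+}
+```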
+## Toward Continuity
+
+* previous methods assumed tabular (discrete) and finite state spaces
+* without "infinite data", can we still generalize?
+* **function approximation**: supervised learning + reinforcement learning
+
+## Parameter Space
+
+$$v_{\pi}(s) \approx \hat{v}(s,w), \quad w \in \mathbb{R}^{d}$$
+
+* caution: updating $w$ changes $\hat{v}(s,w)$ for many states at once
+
+> not just the "visited states"
+
+## Value Error
+
+$$\text{VE}(w) = \sum_{s \in S} \mu(s)\left[v_{\pi}(s) - \hat{v}(s,w)\right]^{2}$$
+
+* $\mu$: distribution of states
+* solve with **stochastic gradient descent**
+
+$$w \leftarrow w + \alpha\left[U_{t} - \hat{v}(S_{t},w)\right] \nabla \hat{v}(S_{t},w)$$
-## SLIDE
+## Target Selection
-# SLIDE SECTION
+To find the target $U_{t}$:
-## SLIDE
+* the objective may have multiple local minima
+* bootstrapped targets depend on $w$, so value estimates may be biased
+* employ **Semi-Gradient Temporal Difference**
-## SLIDE
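+## Semi-Gradient TD(0): R Sketch
+
+A sketch of semi-gradient TD(0) with a linear approximation $\hat{v}(s,w) = w^{\top}x(s)$; the feature map `x` is an assumed input, and the target is treated as fixed when taking the gradient (which is what makes it "semi"-gradient):
+
+```{r}
+#| echo: true
+#| eval: false
+# semi-gradient TD(0) with linear function approximation
+# x: function mapping a state to a numeric feature vector of length d
+sg_td0_update <- function(w, x, s, r_next, s_next, alpha = 0.01, gamma = 0.9) {
+  v_hat      <- sum(w * x(s))
+  v_hat_next <- sum(w * x(s_next))
+  U <- r_next + gamma * v_hat_next      # bootstrapped target, treated as a constant
+  w + alpha * (U - v_hat) * x(s)        # gradient of v_hat w.r.t. w is just x(s)
+}
+```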
diff --git a/slides/06_markov-models.qmd b/slides/06_markov-models.qmd
index d299e2d..d1714fb 100644
--- a/slides/06_markov-models.qmd
+++ b/slides/06_markov-models.qmd
@@ -6,26 +6,45 @@ title: "6. Markov Models"
# Learning objectives
::: nonincremental
-- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+- Discuss the Markov Property
+- Introduce MCMC
:::
::: notes
-- You can add notes on each slide with blocks like this!
-- Load a deck in the browser and type "s" to see these notes.
+should talk about this before chapter 5 (reinforcement learning)
:::
-# SLIDE SECTION
+## Tabular State Space
+
+![fairy tale generator](images/state_machine_fairy_tale.png)
+
+image credit: Aja Hammerly
+
+## Trajectories
+
+> once, upon, a, time, a, bird, and, a, mouse
+
+> a, sausage, entered, into, a, partnership, and, set
+
+> bird, a, and, set, up, house, together
+
+## Markov Property
+
+Given the present state, the future of a stochastic process is independent of its past
+
+$$P(X_{t+1} = x|X_{t}, X_{t-1}, ..., X_{t-k}) = P(X_{t+1} = x|X_{t})$$
+
+* memoryless property
+
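+## Markov Property: R Sketch
+
+A toy version of the fairy-tale generator above: estimate word-to-word transitions from a tiny corpus and sample the next word using only the current word (the corpus is just the opening of the trajectory examples):
+
+```{r}
+#| echo: true
+#| eval: false
+# first-order Markov chain over words, estimated from a toy corpus
+corpus <- c("once", "upon", "a", "time", "a", "bird", "and", "a", "mouse")
+pairs  <- data.frame(from = head(corpus, -1), to = tail(corpus, -1))
+next_word <- function(current) {
+  candidates <- pairs$to[pairs$from == current]
+  sample(candidates, size = 1)    # depends only on the current word (memoryless)
+}
+next_word("a")
+```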
+## Metropolis-Hastings
+
+![Metropolis-Hastings Algorithm](images/Metropolis_Hastings.png)
+
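+## Metropolis-Hastings: R Sketch
+
+Since the figure is only a diagram, here is a minimal random-walk Metropolis sketch; the unnormalized target density and proposal scale are arbitrary choices for illustration:
+
+```{r}
+#| echo: true
+#| eval: false
+# random-walk Metropolis sampling from an unnormalized target density
+target <- function(x) exp(-x^2 / 2) * (1 + sin(3 * x)^2)   # made-up, unnormalized
+n_samples <- 10000
+x <- numeric(n_samples)                                    # chain starts at 0
+for (t in 2:n_samples) {
+  proposal <- x[t - 1] + rnorm(1, sd = 1)                  # symmetric proposal
+  accept_prob <- min(1, target(proposal) / target(x[t - 1]))
+  x[t] <- if (runif(1) < accept_prob) proposal else x[t - 1]
+}
+hist(x, breaks = 50)   # histogram approximates the normalized target
+```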
+## Markov Chain Monte Carlo
+
+![MCMC to posterior dist](images/MCMC_posterior.png)
-## SLIDE
-- DENOTE MAJOR SECTIONS WITH `# TITLE` (eg `# Installation`)
-- ADD INDIVIDUAL SLIDES WITH `##` (eg `## rustup on Linux/macOS`)
-- KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.
-## SLIDE
-# SLIDE SECTION
-## SLIDE
-## SLIDE
diff --git a/slides/images/MCMC_posterior.png b/slides/images/MCMC_posterior.png
new file mode 100644
index 0000000..b2a7ae5
Binary files /dev/null and b/slides/images/MCMC_posterior.png differ
diff --git a/slides/images/MC_blackjack.png b/slides/images/MC_blackjack.png
new file mode 100644
index 0000000..6c07b4a
Binary files /dev/null and b/slides/images/MC_blackjack.png differ
diff --git a/slides/images/Markov_decision_process.png b/slides/images/Markov_decision_process.png
new file mode 100644
index 0000000..60a6164
Binary files /dev/null and b/slides/images/Markov_decision_process.png differ
diff --git a/slides/images/Metropolis_Hastings.png b/slides/images/Metropolis_Hastings.png
new file mode 100644
index 0000000..165eae1
Binary files /dev/null and b/slides/images/Metropolis_Hastings.png differ
diff --git a/slides/images/Mutual_Information.png b/slides/images/Mutual_Information.png
new file mode 100644
index 0000000..d825a41
Binary files /dev/null and b/slides/images/Mutual_Information.png differ
diff --git a/slides/images/Sutton_Barto.png b/slides/images/Sutton_Barto.png
new file mode 100644
index 0000000..b0cd65b
Binary files /dev/null and b/slides/images/Sutton_Barto.png differ
diff --git a/slides/images/ch_3_time_series_flowchart.png b/slides/images/ch_3_time_series_flowchart.png
new file mode 100644
index 0000000..f270aa1
Binary files /dev/null and b/slides/images/ch_3_time_series_flowchart.png differ
diff --git a/slides/images/online_convex_optimization.png b/slides/images/online_convex_optimization.png
new file mode 100644
index 0000000..649e24d
Binary files /dev/null and b/slides/images/online_convex_optimization.png differ
diff --git a/slides/images/state_machine_fairy_tale.png b/slides/images/state_machine_fairy_tale.png
new file mode 100644
index 0000000..1c4d803
Binary files /dev/null and b/slides/images/state_machine_fairy_tale.png differ