Commit acb2f6a

cohort 1, chapters 4-6 (#4)

dsollberger authored Feb 8, 2025
1 parent e0c62b2 commit acb2f6a

Showing 13 changed files with 232 additions and 62 deletions.
30 changes: 4 additions & 26 deletions slides/03_time-series-analysis.qmd
@@ -9,9 +9,9 @@ title: "3. Time-Series Analysis"
- Mention time series
:::

## Package Libraries

```{r}
#| echo: false
#| eval: false
#| message: false
#| warning: false
@@ -21,34 +21,12 @@ library("DiagrammeR")

## Stepping Forward

```{r}
#| echo: false
#| eval: true
DiagrammeR::mermaid("
graph LR
TS[time<br />series]
ARIMA[autoregressive<br />models]
TIS[time<br />invariant<br />systems]
CT[control<br />theory]
NSSM[neural<br />state space<br />models]
TS --> ARIMA
ARIMA --> diffusion
ARIMA --> LLMs
diffusion --> SDEs
diffusion --> NSSM
TIS --> NSSM
CT --> NSSM
")
```
![time series knowledge](images/ch_3_time_series_flowchart.png)

## Mermaid code

```{r}
#| echo: false
#| eval: false
DiagrammeR::mermaid("
graph LR
  ...
")
```
57 changes: 45 additions & 12 deletions slides/04_online-learning-and-regret-minimization.qmd
@@ -6,26 +6,59 @@ title: "4. Online Learning and Regret Minimization"
# Learning objectives

::: nonincremental
- Introduce terminology about optimization
:::

::: notes
I hope that you like math
:::

## Online Convex Optimization

![OCO](images/online_convex_optimization.png)

## Regret

$$\text{Regret}_{T}(A) = \sup_{\{f_{t}\}}\left[\sum_{t=1}^{T}f_{t}(x_{t}^{A}) - \min_{x}\sum_{t=1}^{T}f_{t}(x)\right]$$

* $x_{t}^{A}$: actions played by algorithm $A$ in the decision set
* $T$: number of game iterations

## Applications

* spam filtering
* path finding
* portfolio selection
* recommendation systems

## Experts and Adversaries

**Theorem 1.2** Let $\epsilon\in(0,0.5)$. Suppose that the best expert makes $L$ mistakes. Then:

* $\exists$ an efficient *deterministic* algorithm that makes $< 2(1+\epsilon)L + \frac{2\log N}{\epsilon}$ mistakes
* $\exists$ an efficient *randomized* algorithm that makes $\leq (1+\epsilon)L + \frac{\log N}{\epsilon}$ mistakes

## Weighted Majority Algorithm

* predict according to the weighted *majority* vote of the experts

$$a_{t} = \begin{cases} A, & W_{t}(A) \geq W_{t}(B) \\ B, & \text{otherwise}\end{cases}$$

* *update* weights

$$W_{t+1}(i) = \begin{cases}W_{t}(i), & \text{if expert } i \text{ was correct} \\ W_{t}(i)(1-\epsilon), & \text{if expert } i \text{ was wrong}\end{cases}$$
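
A minimal R sketch of the full weighted-majority loop, assuming a matrix of expert predictions and a vector of true outcomes (both invented here):

```{r}
#| echo: true
#| eval: false
# sketch: weighted majority over N experts predicting "A" or "B"
weighted_majority <- function(predictions, outcomes, epsilon = 0.1) {
  N <- ncol(predictions)              # number of experts
  w <- rep(1, N)                      # initial weights
  mistakes <- 0
  for (t in seq_len(nrow(predictions))) {
    # predict with the weighted majority vote
    vote_A <- sum(w[predictions[t, ] == "A"])
    vote_B <- sum(w[predictions[t, ] == "B"])
    a_t <- if (vote_A >= vote_B) "A" else "B"
    mistakes <- mistakes + (a_t != outcomes[t])
    # downweight every expert that was wrong
    wrong <- predictions[t, ] != outcomes[t]
    w[wrong] <- w[wrong] * (1 - epsilon)
  }
  mistakes
}
```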

## Hedging

$$W_{t+1}(i) = W_{t}(i)e^{-\epsilon \ell_{t}(i)}$$

* $\epsilon$: learning rate
* $\ell_{t}(i)$: loss by expert $i$ at iteration $t$
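
A short R sketch of the Hedge update on synthetic losses, also tracking regret against the best fixed expert (the loss matrix and $\epsilon$ are illustrative):

```{r}
#| echo: true
#| eval: false
set.seed(1)
n_rounds <- 100; N <- 5; epsilon <- 0.1
loss <- matrix(runif(n_rounds * N), n_rounds, N)  # synthetic losses in [0, 1]
w <- rep(1, N)
alg_loss <- 0
for (t in 1:n_rounds) {
  p <- w / sum(w)                     # play the normalized weights
  alg_loss <- alg_loss + sum(p * loss[t, ])
  w <- w * exp(-epsilon * loss[t, ])  # exponential-weights update
}
regret <- alg_loss - min(colSums(loss))  # vs. best fixed expert
```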



164 changes: 152 additions & 12 deletions slides/05_reinforcement-learning.qmd
@@ -6,26 +6,166 @@ title: "5. Reinforcement Learning"
# Learning objectives

::: nonincremental
- Give an overview of reinforcement learning
:::

::: notes
should talk about this after chapter 6 (markov models)
:::

## Textbook

:::: {.columns}

::: {.column width="45%"}
![Sutton and Barto](images/Sutton_Barto.png)
:::

::: {.column width="10%"}

:::

::: {.column width="45%"}
![Mutual Information](images/Mutual_Information.png)
:::

::::

## Markov Decision Process

![MDP](images/Markov_decision_process.png)

## Objective

* policy: $\pi(a|s)$
* return: $G_{t} = \sum_{k=t+1}^{T} \gamma^{k-t-1}R_{k}$
* *maximize expected return* over all policies

$$\max_{\pi} \text{E}_{\pi}[G_{t}]$$

## Coupled Equations

* state value function

$$v_{\pi}(s) = \text{E}_{\pi}[G_{t}|S_{t} = s]$$

* action value function

$$q_{\pi}(s,a) = \text{E}_{\pi}[G_{t}|S_{t} = s, A_{t} = a]$$

## Bellman Equations

> connect all state values

$$\begin{array}{rcl}
v_{\pi}(s^{i}) & = & \text{E}_{\pi}[G_{t}|s^{i}] \\
~ & = & \sum_{\{a\}} \pi(a|s^{i}) \cdot q_{\pi}(s^{i},a) \\
~ & = & \sum_{\{a\}} \pi(a|s^{i}) \cdot \text{E}_{\pi}[G_{t}|s^{i}, a] \\
\end{array}$$

## Bellman Optimality Equations

For any optimal $\pi_{*}$, $\forall s \in S$, $\forall a \in A$

$$\begin{array}{rcl}
v_{*}(s) & = & \max_{a} q_{*}(s,a) \\
q_{*}(s,a) & = & \sum_{s',r} p(s',r|s,a)[r + \gamma v_{*}(s')] \\
\end{array}$$
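
As a sanity check, a toy R sketch of value iteration on an invented 2-state, 2-action MDP, iterating these equations to convergence (it uses expected rewards $R(s,a)$ rather than the full $p(s',r|s,a)$; all numbers are made up):

```{r}
#| echo: true
#| eval: false
gamma <- 0.9
# one transition matrix per action (rows: s, cols: s'); values are invented
P <- list(a1 = matrix(c(0.8, 0.2, 0.3, 0.7), 2, byrow = TRUE),
          a2 = matrix(c(0.1, 0.9, 0.6, 0.4), 2, byrow = TRUE))
R <- matrix(c(1, 0, 0, 2), 2)         # R[s, a]: invented expected rewards
v <- c(0, 0)
for (iter in 1:100) {
  # q*(s, a) = R(s, a) + gamma * sum_{s'} P(s'|s, a) v(s')
  q <- sapply(seq_along(P), function(a) R[, a] + gamma * P[[a]] %*% v)
  v_new <- apply(q, 1, max)           # v*(s) = max_a q*(s, a)
  if (max(abs(v_new - v)) < 1e-8) break
  v <- v_new
}
```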

## Monte Carlo Methods

We do not know $p(s',r|s,a)$

* generate samples: $S_{0}, A_{0}, R_{1}, S_{1}, A_{1}, R_{2}, ...$
* obtain averages $\approx$ expected values
* *generalized policy iteration* to obtain

$$\pi \approx \pi_{*}$$

## Monte Carlo Evaluation

* approx $v_{\pi}(s)$

$$\text{E}_{\pi}[G_{t}|S_{t} = s] \approx \frac{1}{C(s)}\sum_{m=1}^{M}\sum_{\tau=0}^{T_{m}-1} I(s_{\tau}^{m} = s)g_{\tau}^{m}$$

* **step size** $\alpha$ for the update rule

$$V(s_{t}^{m}) \leftarrow V(s_{t}^{m}) + \alpha\left(g_{t}^{m} - V(s_{t}^{m})\right)$$
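
A hedged R sketch of this update rule over one hypothetical episode (the states, returns, and $\alpha$ are invented):

```{r}
#| echo: true
#| eval: false
# hypothetical episode: states visited and the return observed from each
states  <- c(1, 3, 2, 3)
returns <- c(5.0, 4.2, 3.1, 1.0)
V <- rep(0, 3)                        # tabular value estimates
alpha <- 0.1                          # step size
for (i in seq_along(states)) {
  s <- states[i]
  V[s] <- V[s] + alpha * (returns[i] - V[s])   # V <- V + alpha * (g - V)
}
```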

## Exploration-Exploitation Trade-Off

* to discover optimal policies

> we must explore all state-action pairs

* to get high returns

> we must exploit known high-value pairs

## Example: Blackjack

![Monte Carlo solution of the blackjack game](images/MC_blackjack.png)

image credit: [Mutual Information](https://www.youtube.com/watch?v=bpUszPiWM7o&)

> 10 million games played

## Temporal Difference Learning

* **Markov Reward Process**: a Markov decision process, but without actions

* MC requires an episode to complete before updating

> but what if an episode is long?

## n-step TD

Replace $g_{t}^{m}$ with

$$g_{t:t+n}^{m} = r_{t+1}^{m} + \gamma r_{t+2}^{m} + \cdots + \gamma^{n-1} r_{t+n}^{m} + \gamma^{n}V(s_{t+n}^{m})$$

> updates are applied during the episode with an $n$-step delay
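
A small R helper, following these definitions, that computes $g_{t:t+n}^{m}$ from stored rewards and the current value estimate (all inputs are placeholders):

```{r}
#| echo: true
#| eval: false
# n-step return: discounted rewards plus a bootstrapped tail value
n_step_return <- function(rewards, V_tail, gamma, n) {
  # rewards: r_{t+1}, ..., r_{t+n};  V_tail: V(s_{t+n})
  sum(gamma^(0:(n - 1)) * rewards[1:n]) + gamma^n * V_tail
}
n_step_return(rewards = c(1, 0, 2), V_tail = 0.5, gamma = 0.9, n = 3)
```
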
## Advantages

Compared to MC, TD offers

* batch training
* value estimates $V(s)$ that do not depend on the step size $\alpha$
* the maximum likelihood estimate of the MRP (instead of minimum MSE)

## Q-Learning

$$r_{t+1}^{m} + \gamma \max_{a} Q(s_{t+1}^{m},a)$$

updates $Q$ after each *sarsa* tuple (or, in the $n$-step variant, after an $n$-step delay)
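
A one-step tabular sketch in R of this update, plugging the target above into the usual step-size rule (table size, transition, and parameters are placeholders):

```{r}
#| echo: true
#| eval: false
# one Q-learning step on a tabular Q (rows: states, cols: actions)
Q <- matrix(0, nrow = 4, ncol = 2)    # hypothetical 4-state, 2-action table
alpha <- 0.1; gamma <- 0.9
s <- 1; a <- 2; r <- 1; s_next <- 3   # placeholder transition
target <- r + gamma * max(Q[s_next, ])
Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
```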

## Toward Continuity

* previous methods assumed tabular (discrete) and finite state spaces
* without "infinite data", can we still generalize?
* **function approximation**: supervised learning + reinforcement learning

## Parameter Space

$$v_{\pi}(s) \approx \hat{v}(s,w), \quad w \in \mathbb{R}^{d}$$

* caution: updating $w$ changes the values of many states $s$

> not just the "visited" states

## Value Error

$$\text{VE}(w) = \sum_{s \in S} \mu(s)\left[v_{\pi}(s) - \hat{v}(s,w)\right]^{2}$$

* $\mu$: distribution of states
* solve with **stochastic gradient descent**

$$w \leftarrow w + \alpha\left[U_{t} - \hat{v}(S_{t},w)\right] \nabla \hat{v}(S_{t},w)$$
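
With linear features, $\hat{v}(s,w) = w^{\top}x(s)$ and $\nabla \hat{v}(s,w) = x(s)$; a minimal R sketch of one such stochastic-gradient step (the feature vector, target, and step size are invented):

```{r}
#| echo: true
#| eval: false
# one SGD step for linear value approximation: v_hat(s, w) = sum(w * x)
x <- c(1, 0.5, -0.2)                  # invented feature vector for S_t
w <- rep(0, 3)
alpha <- 0.05
U_t <- 1.7                            # target, e.g. a sampled return
v_hat <- sum(w * x)
w <- w + alpha * (U_t - v_hat) * x    # gradient of v_hat w.r.t. w is x
```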

## Target Selection

To find the target $U_{t}$:

* may have multiple local minima
* estimates for state values may be biased
* employ **Semi-Gradient Temporal Difference**

43 changes: 31 additions & 12 deletions slides/06_markov-models.qmd
@@ -6,26 +6,45 @@ title: "6. Markov Models"
# Learning objectives

::: nonincremental
- Discuss the Markov Property
- Introduce MCMC
:::

::: notes
should talk about this before chapter 5 (reinforcement learning)
:::

## Tabular State Space

![fairy tale generator](images/state_machine_fairy_tale.png)

image credit: Aja Hammerly

## Trajectories

> once, upon, a, time, a, bird, and, a, mouse
> a, sausage, entered, into, a, partnership, and, set
> bird, a, and, set, up, house, together
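
Trajectories like these can be sampled by repeatedly drawing the next word given only the current one; a toy R sketch with an invented transition list (not Hammerly's actual state machine):

```{r}
#| echo: true
#| eval: false
# toy fairy-tale generator: sample the next word given only the current word
set.seed(42)
transitions <- list(once = "upon", upon = "a",
                    a = c("time", "bird", "mouse", "sausage"),
                    time = "a", bird = "and", and = "a",
                    mouse = "a", sausage = "entered", entered = "into",
                    into = "a")
word <- "once"; story <- word
for (i in 1:10) {
  word <- sample(transitions[[word]], 1)
  story <- c(story, word)
}
paste(story, collapse = ", ")
```
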
## Markov Property

Given the present state, the future of a stochastic process is independent of its past:

$$P(X_{t+1} = x|X_{t}, X_{t-1}, ..., X_{t-k}) = P(X_{t+1} = x|X_{t})$$

* the memoryless property

## Metropolis-Hastings

![Metropolis-Hastings Algorithm](images/Metropolis_Hastings.png)
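
A compact R sketch of random-walk Metropolis-Hastings, targeting a standard normal purely for illustration (target density and proposal scale are chosen for the example):

```{r}
#| echo: true
#| eval: false
# random-walk Metropolis-Hastings for an unnormalized target density
set.seed(7)
target <- function(x) exp(-x^2 / 2)   # proportional to N(0, 1)
n <- 5000
x <- numeric(n)
for (t in 2:n) {
  proposal <- x[t - 1] + rnorm(1, sd = 0.5)   # symmetric proposal
  accept_prob <- min(1, target(proposal) / target(x[t - 1]))
  x[t] <- if (runif(1) < accept_prob) proposal else x[t - 1]
}
hist(x, breaks = 50, main = "MH samples vs. N(0, 1)")
```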

## Markov Chain Monte Carlo

![MCMC to posterior dist](images/MCMC_posterior.png)

Binary file added slides/images/MCMC_posterior.png
Binary file added slides/images/MC_blackjack.png
Binary file added slides/images/Markov_decision_process.png
Binary file added slides/images/Metropolis_Hastings.png
Binary file added slides/images/Mutual_Information.png
Binary file added slides/images/Sutton_Barto.png
Binary file added slides/images/ch_3_time_series_flowchart.png
Binary file added slides/images/online_convex_optimization.png
Binary file added slides/images/state_machine_fairy_tale.png
