Commit acb2f6a

cohort 1, chapters 4-6 (#4)

dsollberger authored Feb 8, 2025
1 parent e0c62b2 commit acb2f6a

Showing 13 changed files with 232 additions and 62 deletions.
30 changes: 4 additions & 26 deletions slides/03_time-series-analysis.qmd
@@ -9,9 +9,9 @@ title: "3. Time-Series Analysis"
- Mention time series
:::

## Package Libraries

```{r}
#| echo: false
#| eval: false
#| message: false
#| warning: false
@@ -21,34 +21,12 @@ library("DiagrammeR")

## Stepping Forward

```{r}
#| echo: false
#| eval: true
DiagrammeR::mermaid("
graph LR
TS[time<br />series]
ARIMA[autoregressive<br />models]
TIS[time<br />invariant<br />systems]
CT[control<br />theory]
NSSM[neural<br />state space<br />models]
TS --> ARIMA
ARIMA --> diffusion
ARIMA --> LLMs
diffusion --> SDEs
diffusion --> NSSM
TIS --> NSSM
CT --> NSSM
")
```
![time series knowledge](images/ch_3_time_series_flowchart.png)

## Mermaid code

```{r}
#| echo: false
#| eval: false
DiagrammeR::mermaid("
graph LR
  ...
")
```
57 changes: 45 additions & 12 deletions slides/04_online-learning-and-regret-minimization.qmd
@@ -6,26 +6,59 @@ title: "4. Online Learning and Regret Minimization"
# Learning objectives

::: nonincremental
- Introduce terminology about optimization
:::

::: notes
I hope that you like math
:::

## Online Convex Optimization

![OCO](images/online_convex_optimization.png)

## Regret

$$\text{Regret}_{T}(A) = \sup_{\{f_{t}\}}\left[\sum_{t=1}^{T}f_{t}(x_{t}^{A}) - \min_{x}\sum_{t=1}^{T}f_{t}(x)\right]$$

* $x_{t}^{A}$: actions played by algorithm $A$ in the decision set
* $T$: number of game iterations

## Applications

* spam filtering
* path finding
* portfolio selection
* recommendation systems

## Experts and Adversaries

**Theorem 1.2** Let $\epsilon\in(0,0.5)$. Suppose that the best expert makes $L$ mistakes. Then:

* $\exists$ an efficient *deterministic* algorithm that makes $< 2(1+\epsilon)L + \frac{2\log N}{\epsilon}$ mistakes
* $\exists$ an efficient *randomized* algorithm that makes $\leq (1+\epsilon)L + \frac{\log N}{\epsilon}$ mistakes

## Weighted Majority Algorithm

* predict according to the weighted *majority* vote of the experts

$$a_{t} = \begin{cases} A, & W_{t}(A) \geq W_{t}(B) \\ B, & \text{otherwise}\end{cases}$$

* *update* weights

$$W_{t+1}(i) = \begin{cases}W_{t}(i), & \text{if expert } i \text{ was correct} \\ W_{t}(i)(1-\epsilon), & \text{if expert } i \text{ was wrong}\end{cases}$$
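
A minimal R sketch of the full weighted-majority loop, assuming a matrix of expert predictions and a vector of true outcomes (both invented here):

```{r}
#| echo: true
#| eval: false
# sketch: weighted majority over N experts predicting "A" or "B"
weighted_majority <- function(predictions, outcomes, epsilon = 0.1) {
  N <- ncol(predictions)              # number of experts
  w <- rep(1, N)                      # initial weights
  mistakes <- 0
  for (t in seq_len(nrow(predictions))) {
    # predict with the weighted majority vote
    vote_A <- sum(w[predictions[t, ] == "A"])
    vote_B <- sum(w[predictions[t, ] == "B"])
    a_t <- if (vote_A >= vote_B) "A" else "B"
    mistakes <- mistakes + (a_t != outcomes[t])
    # downweight every expert that was wrong
    wrong <- predictions[t, ] != outcomes[t]
    w[wrong] <- w[wrong] * (1 - epsilon)
  }
  mistakes
}
```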

## Hedging

$$W_{t+1}(i) = W_{t}(i)e^{-\epsilon \ell_{t}(i)}$$

* $\epsilon$: learning rate
* $\ell_{t}(i)$: loss by expert $i$ at iteration $t$
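
A short R sketch of the Hedge update on synthetic losses, also tracking regret against the best fixed expert (the loss matrix and $\epsilon$ are illustrative):

```{r}
#| echo: true
#| eval: false
set.seed(1)
n_rounds <- 100; N <- 5; epsilon <- 0.1
loss <- matrix(runif(n_rounds * N), n_rounds, N)  # synthetic losses in [0, 1]
w <- rep(1, N)
alg_loss <- 0
for (t in 1:n_rounds) {
  p <- w / sum(w)                     # play the normalized weights
  alg_loss <- alg_loss + sum(p * loss[t, ])
  w <- w * exp(-epsilon * loss[t, ])  # exponential-weights update
}
regret <- alg_loss - min(colSums(loss))  # vs. best fixed expert
```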



164 changes: 152 additions & 12 deletions slides/05_reinforcement-learning.qmd
@@ -6,26 +6,166 @@ title: "5. Reinforcement Learning"
# Learning objectives

::: nonincremental
- Give an overview of reinforcement learning
:::

::: notes
should talk about this after chapter 6 (markov models)
:::

## Textbook

:::: {.columns}

::: {.column width="45%"}
![Sutton and Barto](images/Sutton_Barto.png)
:::

::: {.column width="10%"}

:::

::: {.column width="45%"}
![Mutual Information](images/Mutual_Information.png)
:::

::::

## Markov Decision Process

![MDP](images/Markov_decision_process.png)

## Objective

* policy: $\pi(a|s)$
* return: $G_{t} = \sum_{k=t+1}^{T} \gamma^{k-t-1}R_{k}$
* *maximize expected return* over all policies

$$\max_{\pi} \text{E}_{\pi}[G_{t}]$$

## Coupled Equations

* state value function

$$v_{\pi}(s) = \text{E}_{\pi}[G_{t}|S_{t} = s]$$

* action value function

$$q_{\pi}(s,a) = \text{E}_{\pi}[G_{t}|S_{t} = s, A_{t} = a]$$

## Bellman Equations

> connect all state values

$$\begin{array}{rcl}
v_{\pi}(s^{i}) & = & \text{E}_{\pi}[G_{t}|s^{i}] \\
~ & = & \sum_{\{a\}} \pi(a|s^{i}) \cdot q_{\pi}(s^{i},a) \\
~ & = & \sum_{\{a\}} \pi(a|s^{i}) \cdot \text{E}_{\pi}[G_{t}|s^{i}, a] \\
\end{array}$$

## Bellman Optimality Equations

For any optimal $\pi_{*}$, $\forall s \in S$, $\forall a \in A$

$$\begin{array}{rcl}
v_{*}(s) & = & \max_{a} q_{*}(s,a) \\
q_{*}(s,a) & = & \sum_{s',r} p(s',r|s,a)[r + \gamma v_{*}(s')] \\
\end{array}$$
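
As a sanity check, a toy R sketch of value iteration on an invented 2-state, 2-action MDP, iterating these equations to convergence (it uses expected rewards $R(s,a)$ rather than the full $p(s',r|s,a)$; all numbers are made up):

```{r}
#| echo: true
#| eval: false
gamma <- 0.9
# one transition matrix per action (rows: s, cols: s'); values are invented
P <- list(a1 = matrix(c(0.8, 0.2, 0.3, 0.7), 2, byrow = TRUE),
          a2 = matrix(c(0.1, 0.9, 0.6, 0.4), 2, byrow = TRUE))
R <- matrix(c(1, 0, 0, 2), 2)         # R[s, a]: invented expected rewards
v <- c(0, 0)
for (iter in 1:100) {
  # q*(s, a) = R(s, a) + gamma * sum_{s'} P(s'|s, a) v(s')
  q <- sapply(seq_along(P), function(a) R[, a] + gamma * P[[a]] %*% v)
  v_new <- apply(q, 1, max)           # v*(s) = max_a q*(s, a)
  if (max(abs(v_new - v)) < 1e-8) break
  v <- v_new
}
```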

## Monte Carlo Methods

We do not know $p(s',r|s,a)$

* generate samples: $S_{0}, A_{0}, R_{1}, S_{1}, A_{1}, R_{2}, ...$
* obtain averages $\approx$ expected values
* *generalized policy iteration* to obtain

$$\pi \approx \pi_{*}$$

## Monte Carlo Evaluation

* approx $v_{\pi}(s)$

$$\text{E}_{\pi}[G_{t}|S_{t} = s] \approx \frac{1}{C(s)}\sum_{m=1}^{M}\sum_{\tau=0}^{T_{m}-1} I(s_{\tau}^{m} = s)g_{\tau}^{m}$$

* **step size** $\alpha$ for the update rule

$$V(s_{t}^{m}) \leftarrow V(s_{t}^{m}) + \alpha\left(g_{t}^{m} - V(s_{t}^{m})\right)$$
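
A hedged R sketch of this update rule over one hypothetical episode (the states, returns, and $\alpha$ are invented):

```{r}
#| echo: true
#| eval: false
# hypothetical episode: states visited and the return observed from each
states  <- c(1, 3, 2, 3)
returns <- c(5.0, 4.2, 3.1, 1.0)
V <- rep(0, 3)                        # tabular value estimates
alpha <- 0.1                          # step size
for (i in seq_along(states)) {
  s <- states[i]
  V[s] <- V[s] + alpha * (returns[i] - V[s])   # V <- V + alpha * (g - V)
}
```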

## Exploration-Exploitation Trade-Off

* to discover optimal policies

> we must explore all state-action pairs

* to get high returns

> we must exploit known high-value pairs

## Example: Blackjack

![Monte Carlo solution of the blackjack game](images/MC_blackjack.png)

image credit: [Mutual Information](https://www.youtube.com/watch?v=bpUszPiWM7o&)

> 10 million games played

## Temporal Difference Learning

* **Markov Reward Process**: a Markov decision process, but without actions

* MC requires an episode to complete before updating

> but what if an episode is long?

## n-step TD

Replace $g_{t}^{m}$ with

$$g_{t:t+n}^{m} = r_{t+1}^{m} + \gamma r_{t+2}^{m} + \cdots + \gamma^{n-1} r_{t+n}^{m} + \gamma^{n}V(s_{t+n}^{m})$$

> updates are applied during the episode with an $n$-step delay
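
A small R helper, following these definitions, that computes $g_{t:t+n}^{m}$ from stored rewards and the current value estimate (all inputs are placeholders):

```{r}
#| echo: true
#| eval: false
# n-step return: discounted rewards plus a bootstrapped tail value
n_step_return <- function(rewards, V_tail, gamma, n) {
  # rewards: r_{t+1}, ..., r_{t+n};  V_tail: V(s_{t+n})
  sum(gamma^(0:(n - 1)) * rewards[1:n]) + gamma^n * V_tail
}
n_step_return(rewards = c(1, 0, 2), V_tail = 0.5, gamma = 0.9, n = 3)
```
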
## Advantages

Compared to MC, TD offers

* batch training
* value estimates $V(s)$ that do not depend on the step size $\alpha$
* the maximum likelihood estimate of the MRP (instead of minimum MSE)

## Q-Learning

$$r_{t+1}^{m} + \gamma \max_{a} Q(s_{t+1}^{m},a)$$

updates $Q$ after each *sarsa* tuple (or, in the $n$-step variant, after an $n$-step delay)
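
A one-step tabular sketch in R of this update, plugging the target above into the usual step-size rule (table size, transition, and parameters are placeholders):

```{r}
#| echo: true
#| eval: false
# one Q-learning step on a tabular Q (rows: states, cols: actions)
Q <- matrix(0, nrow = 4, ncol = 2)    # hypothetical 4-state, 2-action table
alpha <- 0.1; gamma <- 0.9
s <- 1; a <- 2; r <- 1; s_next <- 3   # placeholder transition
target <- r + gamma * max(Q[s_next, ])
Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
```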

## Toward Continuity

* previous methods assumed tabular (discrete) and finite state spaces
* without "infinite data", can we still generalize?
* **function approximation**: supervised learning + reinforcement learning

## Parameter Space

$$v_{\pi}(s) \approx \hat{v}(s,w), \quad w \in \mathbb{R}^{d}$$

* caution: updating $w$ changes the values of many states $s$

> not just the "visited" states

## Value Error

$$\text{VE}(w) = \sum_{s \in S} \mu(s)\left[v_{\pi}(s) - \hat{v}(s,w)\right]^{2}$$

* $\mu$: distribution of states
* solve with **stochastic gradient descent**

$$w \leftarrow w + \alpha\left[U_{t} - \hat{v}(S_{t},w)\right] \nabla \hat{v}(S_{t},w)$$
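
With linear features, $\hat{v}(s,w) = w^{\top}x(s)$ and $\nabla \hat{v}(s,w) = x(s)$; a minimal R sketch of one such stochastic-gradient step (the feature vector, target, and step size are invented):

```{r}
#| echo: true
#| eval: false
# one SGD step for linear value approximation: v_hat(s, w) = sum(w * x)
x <- c(1, 0.5, -0.2)                  # invented feature vector for S_t
w <- rep(0, 3)
alpha <- 0.05
U_t <- 1.7                            # target, e.g. a sampled return
v_hat <- sum(w * x)
w <- w + alpha * (U_t - v_hat) * x    # gradient of v_hat w.r.t. w is x
```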

## Target Selection

To find the target $U_{t}$:

* may have multiple local minima
* estimates for state values may be biased
* employ **Semi-Gradient Temporal Difference**

43 changes: 31 additions & 12 deletions slides/06_markov-models.qmd
@@ -6,26 +6,45 @@ title: "6. Markov Models"
# Learning objectives

::: nonincremental
- Discuss the Markov Property
- Introduce MCMC
:::

::: notes
should talk about this before chapter 5 (reinforcement learning)
:::

## Tabular State Space

![fairy tale generator](images/state_machine_fairy_tale.png)

image credit: Aja Hammerly

## Trajectories

> once, upon, a, time, a, bird, and, a, mouse
> a, sausage, entered, into, a, partnership, and, set
> bird, a, and, set, up, house, together
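
Trajectories like these can be sampled by repeatedly drawing the next word given only the current one; a toy R sketch with an invented transition list (not Hammerly's actual state machine):

```{r}
#| echo: true
#| eval: false
# toy fairy-tale generator: sample the next word given only the current word
set.seed(42)
transitions <- list(once = "upon", upon = "a",
                    a = c("time", "bird", "mouse", "sausage"),
                    time = "a", bird = "and", and = "a",
                    mouse = "a", sausage = "entered", entered = "into",
                    into = "a")
word <- "once"; story <- word
for (i in 1:10) {
  word <- sample(transitions[[word]], 1)
  story <- c(story, word)
}
paste(story, collapse = ", ")
```
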
## Markov Property

Given the present state, the future of a stochastic process is independent of its past:

$$P(X_{t+1} = x|X_{t}, X_{t-1}, ..., X_{t-k}) = P(X_{t+1} = x|X_{t})$$

* the memoryless property

## Metropolis-Hastings

![Metropolis-Hastings Algorithm](images/Metropolis_Hastings.png)
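
A compact R sketch of random-walk Metropolis-Hastings, targeting a standard normal purely for illustration (target density and proposal scale are chosen for the example):

```{r}
#| echo: true
#| eval: false
# random-walk Metropolis-Hastings for an unnormalized target density
set.seed(7)
target <- function(x) exp(-x^2 / 2)   # proportional to N(0, 1)
n <- 5000
x <- numeric(n)
for (t in 2:n) {
  proposal <- x[t - 1] + rnorm(1, sd = 0.5)   # symmetric proposal
  accept_prob <- min(1, target(proposal) / target(x[t - 1]))
  x[t] <- if (runif(1) < accept_prob) proposal else x[t - 1]
}
hist(x, breaks = 50, main = "MH samples vs. N(0, 1)")
```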

## Markov Chain Monte Carlo

![MCMC to posterior dist](images/MCMC_posterior.png)

Binary file added slides/images/MCMC_posterior.png
Binary file added slides/images/MC_blackjack.png
Binary file added slides/images/Markov_decision_process.png
Binary file added slides/images/Metropolis_Hastings.png
Binary file added slides/images/Mutual_Information.png
Binary file added slides/images/Sutton_Barto.png
Binary file added slides/images/ch_3_time_series_flowchart.png
Binary file added slides/images/online_convex_optimization.png
Binary file added slides/images/state_machine_fairy_tale.png
