diff --git a/slides/03_time-series-analysis.qmd b/slides/03_time-series-analysis.qmd
index 0e0ba42..282f826 100644
--- a/slides/03_time-series-analysis.qmd
+++ b/slides/03_time-series-analysis.qmd
@@ -9,9 +9,9 @@ title: "3. Time-Series Analysis"
 - Mention time series
 :::
-## Package Libraries
-
 ```{r}
+#| echo: false
+#| eval: false
 #| message: false
 #| warning: false
@@ -21,34 +21,12 @@ library("DiagrammeR")
 ## Stepping Forward
-```{r}
-#| echo: false
-#| eval: true
-DiagrammeR::mermaid("
-graph LR
-
-TS[time
-series]
-ARIMA[autoregressive
-models]
-TIS[time
-invariant
-systems]
-CT[control
-theory]
-NSSM[neural
-state space
-models]
-
-TS --> ARIMA
-
-ARIMA --> diffusion
-ARIMA --> LLMs
-
-diffusion --> SDEs
-diffusion --> NSSM
-TIS --> NSSM
-CT --> NSSM
-")
-```
+![time series knowledge](images/ch_3_time_series_flowchart.png)
 
 ## Mermaid code
 
 ```{r}
-#| echo: true
+#| echo: false
 #| eval: false
 DiagrammeR::mermaid("
 graph LR
diff --git a/slides/04_online-learning-and-regret-minimization.qmd b/slides/04_online-learning-and-regret-minimization.qmd
index efe0149..c8a6dc1 100644
--- a/slides/04_online-learning-and-regret-minimization.qmd
+++ b/slides/04_online-learning-and-regret-minimization.qmd
@@ -6,26 +6,59 @@ title: "4. Online Learning and Regret Minimization"
 # Learning objectives
 ::: nonincremental
-- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+- Introduce terminology for online convex optimization and regret minimization
 :::
 ::: notes
-- You can add notes on each slide with blocks like this!
-- Load a deck in the browser and type "s" to see these notes.
+I hope that you like math
 :::
-# SLIDE SECTION
+## Online Convex Optimization
+
+![OCO](images/online_convex_optimization.png)
+
+## Regret
+
+$$\text{Regret}_{T}(A) = \text{sup}_{\{f_{1},\dots,f_{T}\}}\left[\sum_{t=1}^{T}f_{t}(x_{t}^{A}) - \text{min}_{x \in \mathcal{K}}\sum_{t=1}^{T}f_{t}(x)\right]$$
+
+* $x_{t}^{A}$: action played by algorithm $A$ at iteration $t$, chosen from the decision set $\mathcal{K}$
+* $f_{t}$: cost functions chosen by the adversary
+* $T$: number of game iterations
+
+## Applications
+
+* spam filtering
+* path finding
+* portfolio selection
+* recommendation systems
+
+## Experts and Adversaries
+
+**Theorem 1.2** Let $\epsilon\in(0,0.5)$. Suppose that the best of $N$ experts makes $L$ mistakes. Then:
+
+* $\exists$ an efficient *deterministic* algorithm making $< 2(1+\epsilon)L + \frac{2\log N}{\epsilon}$ mistakes
+* $\exists$ an efficient *randomized* algorithm making $\leq (1+\epsilon)L + \frac{\log N}{\epsilon}$ mistakes
+
+## Weighted Majority Algorithm
+
+* predict according to the *weighted majority* of experts
+
+$$a_{t} = \begin{cases} A, & W_{t}(A) \geq W_{t}(B) \\ B, & \text{otherwise}\end{cases}$$
+
+* $W_{t}(A)$, $W_{t}(B)$: total weight of the experts advising $A$ or $B$ at iteration $t$
+* *update* weights
+
+$$W_{t+1}(i) = \begin{cases}W_{t}(i), & \text{if expert } i \text{ was correct} \\ W_{t}(i)(1-\epsilon), & \text{if expert } i \text{ was wrong}\end{cases}$$
+
+## Hedging
+
+$$W_{t+1}(i) = W_{t}(i)e^{-\epsilon \ell_{t}(i)}$$
+
+* $\epsilon$: learning rate
+* $\ell_{t}(i)$: loss incurred by expert $i$ at iteration $t$
+
+
-## SLIDE
-- DENOTE MAJOR SECTIONS WITH `# TITLE` (eg `# Installation`)
-- ADD INDIVIDUAL SLIDES WITH `##` (eg `## rustup on Linux/macOS`)
-- KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.
-## SLIDE
-# SLIDE SECTION
-## SLIDE
-## SLIDE
diff --git a/slides/05_reinforcement-learning.qmd b/slides/05_reinforcement-learning.qmd
index abd76fd..9ed6b49 100644
--- a/slides/05_reinforcement-learning.qmd
+++ b/slides/05_reinforcement-learning.qmd
@@ -6,26 +6,166 @@ title: "5. Reinforcement Learning"
 # Learning objectives
 ::: nonincremental
-- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+- Give an overview of reinforcement learning
 :::
 ::: notes
-- You can add notes on each slide with blocks like this!
-- Load a deck in the browser and type "s" to see these notes.
+should talk about this after chapter 6 (markov models)
 :::
-# SLIDE SECTION
+## Textbook
-## SLIDE
+:::: {.columns}
-- DENOTE MAJOR SECTIONS WITH `# TITLE` (eg `# Installation`)
-- ADD INDIVIDUAL SLIDES WITH `##` (eg `## rustup on Linux/macOS`)
-- KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.
+::: {.column width="45%"}
+![Sutton and Barto](images/Sutton_Barto.png)
+:::
+
+::: {.column width="10%"}
+
+:::
+
+::: {.column width="45%"}
+![Mutual Information](images/Mutual_Information.png)
+:::
+
+::::
+
+## Markov Decision Process
+
+![MDP](images/Markov_decision_process.png)
+
+## Objective
+
+* policy: $\pi(a|s)$
+* return: $G_{t} = \sum_{k=t+1}^{T} \gamma^{k-t-1}R_{k}$
+* *maximize expected return* over all policies
+
+$$\text{max}_{\pi} \text{E}_{\pi}[G_{t}]$$
+
+## Coupled Equations
+
+* state value function
+
+$$v_{\pi}(s) = \text{E}_{\pi}[G_{t}|S_{t} = s]$$
+
+* action value function
+
+$$q_{\pi}(s,a) = \text{E}_{\pi}[G_{t}|S_{t} = s, A_{t} = a]$$
+
+## Bellman Equations
+
+> connect all state values
+
+$$\begin{array}{rcl}
+ v_{\pi}(s^{i}) & = & \text{E}_{\pi}[G_{t}|s^{i}] \\
+ ~ & = & \sum_{\{a\}} \pi(a|s^{i}) \cdot q_{\pi}(s^{i},a) \\
+ ~ & = & \sum_{\{a\}} \pi(a|s^{i}) \cdot \text{E}_{\pi}[G_{t}|s^{i}, a] \\
+\end{array}$$
+
+## Bellman Optimality Equations
+
+For any optimal $\pi_{*}$, $\forall s \in S$, $\forall a \in A$
+
+$$\begin{array}{rcl}
+ v_{*}(s) & = & \text{max}_{a} q_{*}(s,a) \\
+ q_{*}(s,a) & = & \sum_{s',r} p(s',r|s,a)[r + \gamma v_{*}(s')] \\
+\end{array}$$
+
+## Monte Carlo Methods
+
+We do not know $p(s',r|s,a)$
+
+* generate samples: $S_{0}, A_{0}, R_{1}, S_{1}, A_{1}, R_{2}, ...$
+* obtain averages $\approx$ expected values
+* *generalized policy iteration* to obtain
+
+$$\pi \approx \pi_{*}$$
+
+## Monte Carlo Evaluation
+
+* approximate $v_{\pi}(s)$
+
+$$\text{E}_{\pi}[G_{t}|S_{t} = s] \approx \frac{1}{C(s)}\sum_{m=1}^{M}\sum_{\tau=0}^{T_{m}-1} I(s_{\tau}^{m} = s)g_{\tau}^{m}$$
+* $C(s)$: number of visits to state $s$ across the $M$ sampled episodes
+* **step size** $\alpha$ for update rule
+
+$$V(s_{t}^{m}) \leftarrow V(s_{t}^{m}) + \alpha\left(g_{t}^{m} - V(s_{t}^{m})\right)$$
+
+## Exploration-Exploitation Trade-Off
+
+* to discover optimal policies
+
+> we must explore all state-action pairs
+
+* to get high returns
+
+> we must exploit known high-value pairs
+
+## Example: Blackjack
+
+![Monte Carlo control solving the blackjack game](images/MC_blackjack.png)
+
+image credit: [Mutual Information](https://www.youtube.com/watch?v=bpUszPiWM7o&)
+
+> 10 million games played
+
+## Temporal Difference Learning
+
+* **Markov Reward Process**: A Markov decision process, but w/o actions
+
+* Monte Carlo methods require an episode to complete before updating
+
+> but what if an episode is long?
+
+## n-step TD
+
+Replace $g_{t}^{m}$ with
+
+$$g_{t:t+n}^{m} = r_{t+1}^{m} + \gamma r_{t+2}^{m} + \cdots + \gamma^{n-1} r_{t+n}^{m} + \gamma^{n}V(s_{t+n}^{m})$$
+
+> updates are applied during the episode with an n-step delay
+
+## Advantages
+
+Compared to MC, TD has
+
+* batch training
+* $V(s)$ does not depend on the step size $\alpha$
+* maximum likelihood estimate of the MRP (instead of minimum MSE)
+
+## Q-Learning
+
+$$r_{t+1}^{m} + \gamma \text{max}_{a} Q(s_{t+1}^{m},a)$$
+
+updates $Q$ after each $(s, a, r, s')$ tuple (a one-step update)
+
+## Toward Continuity
+
+* previous methods assumed tabular (discrete) and finite state spaces
+* without "infinite data", can we still generalize?
+* **function approximation**: supervised learning + reinforcement learning
+
+## Parameter Space
+
+$$v_{\pi}(s) \approx \hat{v}(s,w), \quad w \in \mathbb{R}^{d}$$
+* caution: updating $w$ changes the value estimates of many states $s$
+
+> not just the "visited states"
+
+## Value Error
+
+$$\text{VE}(w) = \sum_{s \in S} \mu(s)\left[v_{\pi}(s) - \hat{v}(s,w)\right]^{2}$$
+
+* $\mu$: distribution over states
+* solve with **stochastic gradient descent**
+
+$$w \leftarrow w + \alpha\left[U_{t} - \hat{v}(S_{t},w)\right] \nabla \hat{v}(S_{t},w)$$
-## SLIDE
+## Target Selection
-# SLIDE SECTION
+To find the target $U_{t}$:
-## SLIDE
+* the objective may have multiple local minima
+* estimates for state values may be biased
+* employ **Semi-Gradient Temporal Difference**
-## SLIDE
diff --git a/slides/06_markov-models.qmd b/slides/06_markov-models.qmd
index d299e2d..d1714fb 100644
--- a/slides/06_markov-models.qmd
+++ b/slides/06_markov-models.qmd
@@ -6,26 +6,45 @@ title: "6. Markov Models"
 # Learning objectives
 ::: nonincremental
-- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+- Discuss the Markov Property
+- Introduce MCMC
 :::
 ::: notes
-- You can add notes on each slide with blocks like this!
-- Load a deck in the browser and type "s" to see these notes.
+should talk about this before chapter 5 (reinforcement learning)
 :::
-# SLIDE SECTION
+## Tabular State Space
+
+![fairy tale generator](images/state_machine_fairy_tale.png)
+
+image credit: Aja Hammerly
+
+## Trajectories
+
+> once, upon, a, time, a, bird, and, a, mouse
+
+> a, sausage, entered, into, a, partnership, and, set
+
+> bird, a, and, set, up, house, together
+
+## Markov Property
+
+Given the present state, the future of a stochastic process is independent of its past
+
+$$P(X_{t+1} = x|X_{t}, X_{t-1}, ..., X_{t-k}) = P(X_{t+1} = x|X_{t})$$
+* memoryless property
+
+## Metropolis-Hastings
+
+![Metropolis-Hastings Algorithm](images/Metropolis_Hastings.png)
+
+## Markov Chain Monte Carlo
+
+![MCMC approximation of a posterior distribution](images/MCMC_posterior.png)
-## SLIDE
-- DENOTE MAJOR SECTIONS WITH `# TITLE` (eg `# Installation`)
-- ADD INDIVIDUAL SLIDES WITH `##` (eg `## rustup on Linux/macOS`)
-- KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.
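+## Sampling a Trajectory (code sketch)
+
+A minimal R sketch (not from the chapter) of the fairy-tale state machine above: first-order transition counts are tallied from the three example trajectories, and a new trajectory is sampled word by word. The starting word, the walk length, and the decision to ignore trajectory boundaries are illustrative assumptions.
+
+```{r}
+#| echo: true
+#| eval: false
+# build a first-order Markov chain over words, then sample from it
+set.seed(7)
+corpus <- c("once upon a time a bird and a mouse",
+            "a sausage entered into a partnership and set",
+            "bird a and set up house together")
+words <- unlist(strsplit(corpus, " "))   # boundaries between trajectories are ignored
+
+# tabular state space: counts of word_t -> word_{t+1}
+trans <- table(head(words, -1), tail(words, -1))
+
+current    <- "a"        # arbitrary starting state
+trajectory <- current
+for (i in 1:8) {
+  if (!(current %in% rownames(trans))) break   # word never observed with a successor
+  current    <- sample(colnames(trans), 1, prob = trans[current, ])
+  trajectory <- c(trajectory, current)
+}
+paste(trajectory, collapse = ", ")
+```
+
+Because only the current word conditions the next draw, the sampler uses exactly the memoryless property from the Markov Property slide.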
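+## Metropolis-Hastings (code sketch)
+
+A minimal random-walk Metropolis-Hastings sketch in R. It targets a standard normal density rather than a real posterior, and the proposal standard deviation, iteration count, and burn-in are illustrative assumptions, not values from the chapter.
+
+```{r}
+#| echo: true
+#| eval: false
+set.seed(123)
+n_iter      <- 5000
+burn_in     <- 500
+proposal_sd <- 1
+
+chain    <- numeric(n_iter)
+chain[1] <- 0
+for (t in 2:n_iter) {
+  # symmetric random-walk proposal around the current state
+  proposal <- rnorm(1, mean = chain[t - 1], sd = proposal_sd)
+  # accept with probability min(1, target(proposal) / target(current))
+  accept_prob <- min(1, dnorm(proposal) / dnorm(chain[t - 1]))
+  chain[t] <- if (runif(1) < accept_prob) proposal else chain[t - 1]
+}
+
+hist(chain[-(1:burn_in)], breaks = 40, freq = FALSE,
+     main = "MH samples vs. N(0, 1)", xlab = "x")
+curve(dnorm(x), add = TRUE)
+```
+
+Because the proposal is symmetric, the Hastings correction cancels and the acceptance ratio uses only the target density.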
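+## Looking Back: Tabular Q-Learning (code sketch)
+
+A minimal R sketch of the tabular Q-learning update from the chapter 5 slides, run on a five-state chain. The chain, its rewards, and all hyperparameter values are illustrative assumptions.
+
+```{r}
+#| echo: true
+#| eval: false
+# five-state chain: states 1 and 5 are terminal,
+# reaching state 5 pays reward 1, every other transition pays 0
+set.seed(42)
+n_states <- 5
+moves    <- c(-1, 1)   # action 1 = left, action 2 = right
+alpha    <- 0.1        # step size
+gamma    <- 0.9        # discount factor
+epsilon  <- 0.1        # epsilon-greedy exploration
+Q <- matrix(0, nrow = n_states, ncol = 2)
+
+for (episode in 1:500) {
+  s <- 3                                   # start in the middle
+  while (!(s %in% c(1, n_states))) {
+    a <- if (runif(1) < epsilon) sample(1:2, 1) else which.max(Q[s, ])
+    s_next <- s + moves[a]
+    r <- if (s_next == n_states) 1 else 0
+    # target: r + gamma * max_a Q(s', a), with terminal states worth 0
+    target <- if (s_next %in% c(1, n_states)) r else r + gamma * max(Q[s_next, ])
+    Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
+    s <- s_next
+  }
+}
+round(Q, 2)   # the greedy action should be "right" in states 2 through 4
+```
+
+Because the target uses $\text{max}_{a} Q(s', a)$ rather than the action actually taken next, this is an off-policy, one-step update.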
-## SLIDE
-# SLIDE SECTION
-## SLIDE
-## SLIDE
diff --git a/slides/images/MCMC_posterior.png b/slides/images/MCMC_posterior.png
new file mode 100644
index 0000000..b2a7ae5
Binary files /dev/null and b/slides/images/MCMC_posterior.png differ
diff --git a/slides/images/MC_blackjack.png b/slides/images/MC_blackjack.png
new file mode 100644
index 0000000..6c07b4a
Binary files /dev/null and b/slides/images/MC_blackjack.png differ
diff --git a/slides/images/Markov_decision_process.png b/slides/images/Markov_decision_process.png
new file mode 100644
index 0000000..60a6164
Binary files /dev/null and b/slides/images/Markov_decision_process.png differ
diff --git a/slides/images/Metropolis_Hastings.png b/slides/images/Metropolis_Hastings.png
new file mode 100644
index 0000000..165eae1
Binary files /dev/null and b/slides/images/Metropolis_Hastings.png differ
diff --git a/slides/images/Mutual_Information.png b/slides/images/Mutual_Information.png
new file mode 100644
index 0000000..d825a41
Binary files /dev/null and b/slides/images/Mutual_Information.png differ
diff --git a/slides/images/Sutton_Barto.png b/slides/images/Sutton_Barto.png
new file mode 100644
index 0000000..b0cd65b
Binary files /dev/null and b/slides/images/Sutton_Barto.png differ
diff --git a/slides/images/ch_3_time_series_flowchart.png b/slides/images/ch_3_time_series_flowchart.png
new file mode 100644
index 0000000..f270aa1
Binary files /dev/null and b/slides/images/ch_3_time_series_flowchart.png differ
diff --git a/slides/images/online_convex_optimization.png b/slides/images/online_convex_optimization.png
new file mode 100644
index 0000000..649e24d
Binary files /dev/null and b/slides/images/online_convex_optimization.png differ
diff --git a/slides/images/state_machine_fairy_tale.png b/slides/images/state_machine_fairy_tale.png
new file mode 100644
index 0000000..1c4d803
Binary files /dev/null and b/slides/images/state_machine_fairy_tale.png differ