From c922969d22fd02cc5fcb3e15c2162500f9fcbe7d Mon Sep 17 00:00:00 2001
From: Shivang Vijay
Date: Tue, 11 Mar 2025 01:23:32 -0400
Subject: [PATCH 1/4] Create key-concepts-in-rl

---
 .../reinforcement-learning/key-concepts-in-rl | 117 ++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 wiki/reinforcement-learning/key-concepts-in-rl

diff --git a/wiki/reinforcement-learning/key-concepts-in-rl b/wiki/reinforcement-learning/key-concepts-in-rl
new file mode 100644
index 00000000..83c1c787
--- /dev/null
+++ b/wiki/reinforcement-learning/key-concepts-in-rl
@@ -0,0 +1,117 @@
+---
+# Jekyll 'Front Matter' goes here. Most are set by default, and should NOT be
+# overwritten except in special circumstances.
+# You should set the date the article was last updated like this:
+date: 2025-03-11 # YYYY-MM-DD
+# This will be displayed at the bottom of the article
+# You should set the article's title:
+title: Key Concepts of Reinforcement Learning
+# The 'title' is automatically displayed at the top of the page
+# and used in other parts of the site.
+---
+
+This tutorial provides an introduction to the fundamental concepts of Reinforcement Learning (RL). RL involves an agent interacting with an environment to learn optimal behaviors through trial and error. The main objective of RL is to maximize cumulative reward over time.
+
+## Main Components of Reinforcement Learning
+
+### Agent and Environment
+The agent is the learner or decision-maker, while the environment represents everything the agent interacts with. The agent receives observations from the environment and takes actions that influence the environment's state.
+
+### States and Observations
+- A **state** ($s$) fully describes the world at a given moment.
+- An **observation** ($o$) is a partial view of the state.
+- Environments can be **fully observed** (complete information) or **partially observed** (limited information).
+
+### Action Spaces
+- The **action space** defines all possible actions an agent can take.
+- **Discrete action spaces** (e.g., Atari, Go) have a finite number of actions.
+- **Continuous action spaces** (e.g., robotics control) allow real-valued actions.
+
+## Policies
+A **policy** determines how an agent selects actions based on states:
+
+- **Deterministic policy**: Always selects the same action for a given state.
+  ```python
+  a_t = \mu(s_t)
+  ```
+- **Stochastic policy**: Samples actions from a probability distribution.
+  ```python
+  a_t \sim \pi(\cdot | s_t)
+  ```
+
+### Example: Deterministic Policy in PyTorch
+```python
+import torch.nn as nn
+
+# Example sizes; in practice these are set by the environment.
+obs_dim, act_dim = 8, 2
+
+# A simple MLP that maps an observation s_t directly to an action a_t = mu(s_t).
+pi_net = nn.Sequential(
+    nn.Linear(obs_dim, 64),
+    nn.Tanh(),
+    nn.Linear(64, 64),
+    nn.Tanh(),
+    nn.Linear(64, act_dim)
+)
+```
+
+## Trajectories
+A **trajectory** ($\tau$) is a sequence of states and actions:
+```math
+\tau = (s_0, a_0, s_1, a_1, ...)
+```
+State transitions can be deterministic:
+```math
+s_{t+1} = f(s_t, a_t)
+```
+or stochastic:
+```math
+s_{t+1} \sim P(\cdot|s_t, a_t)
+```
+
+## Reward and Return
+The **reward function** ($R$) defines the agent's objective:
+```math
+r_t = R(s_t, a_t, s_{t+1})
+```
+### Types of Return
+1. **Finite-horizon undiscounted return**:
+   ```math
+   R(\tau) = \sum_{t=0}^T r_t
+   ```
+2. **Infinite-horizon discounted return**:
+   ```math
+   R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t
+   ```
+   where $\gamma \in (0, 1)$ is the **discount factor**, which balances immediate vs. future rewards.
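+
+As a quick illustrative sketch (the function name and reward values are made up for this example, not part of any library), the return can be computed directly from a recorded reward sequence; setting `gamma = 1` recovers the finite-horizon undiscounted case:
+
+```python
+def discounted_return(rewards, gamma=0.99):
+    # R(tau) = sum_t gamma^t * r_t for a finite list of rewards
+    return sum(gamma**t * r for t, r in enumerate(rewards))
+
+print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))  # 1.0 + 0.0 + 0.81*2.0 = 2.62
+```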
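+
+Complementing the deterministic PyTorch example earlier, a stochastic policy over a discrete action space can be sketched by treating the network's outputs as logits of a categorical distribution (an illustrative sketch under that assumption, not a fixed recipe):
+
+```python
+import torch
+from torch.distributions import Categorical
+
+obs = torch.randn(obs_dim)                 # a dummy observation s_t
+logits = pi_net(obs)                       # reuse pi_net from the example above
+a_t = Categorical(logits=logits).sample()  # a_t ~ pi(. | s_t)
+```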
+
+## Summary
+This tutorial introduced fundamental RL concepts, including agents, environments, policies, action spaces, trajectories, rewards, and returns. These components are essential for designing RL algorithms.
+
+## Further Reading
+- Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction* (2nd ed.). MIT Press.
+
+## References
+- [Reinforcement Learning Wikipedia](https://en.wikipedia.org/wiki/Reinforcement_learning)

From e4c7ec22575224b44769c4f162f4394ac6fb0e98 Mon Sep 17 00:00:00 2001
From: Shivang Vijay
Date: Tue, 11 Mar 2025 01:24:38 -0400
Subject: [PATCH 2/4] Rename key-concepts-in-rl to key-concepts-in-rl.md

---
 .../{key-concepts-in-rl => key-concepts-in-rl.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename wiki/reinforcement-learning/{key-concepts-in-rl => key-concepts-in-rl.md} (100%)

diff --git a/wiki/reinforcement-learning/key-concepts-in-rl b/wiki/reinforcement-learning/key-concepts-in-rl.md
similarity index 100%
rename from wiki/reinforcement-learning/key-concepts-in-rl
rename to wiki/reinforcement-learning/key-concepts-in-rl.md

From 5d9d56ed89e15863238a03954361ec9a24ade1d1 Mon Sep 17 00:00:00 2001
From: Shivang Vijay
Date: Tue, 11 Mar 2025 01:35:08 -0400
Subject: [PATCH 3/4] Update key-concepts-in-rl.md

---
 .../key-concepts-in-rl.md | 21 +++++-------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/wiki/reinforcement-learning/key-concepts-in-rl.md b/wiki/reinforcement-learning/key-concepts-in-rl.md
index 83c1c787..6335929e 100644
--- a/wiki/reinforcement-learning/key-concepts-in-rl.md
+++ b/wiki/reinforcement-learning/key-concepts-in-rl.md
@@ -1,13 +1,6 @@
 ---
-# Jekyll 'Front Matter' goes here. Most are set by default, and should NOT be
-# overwritten except in special circumstances.
-# You should set the date the article was last updated like this:
-date: 2025-03-11 # YYYY-MM-DD
-# This will be displayed at the bottom of the article
-# You should set the article's title:
+date: 2025-03-11 # YYYY-MM-DD
 title: Key Concepts of Reinforcement Learning
-# The 'title' is automatically displayed at the top of the page
-# and used in other parts of the site.
 ---
 
 This tutorial provides an introduction to the fundamental concepts of Reinforcement Learning (RL). RL involves an agent interacting with an environment to learn optimal behaviors through trial and error. The main objective of RL is to maximize cumulative reward over time.
@@ -31,13 +24,13 @@ The agent is the learner or decision-maker, while the environment represents eve
 ## Policies
 A **policy** determines how an agent selects actions based on states:
 
 - **Deterministic policy**: Always selects the same action for a given state.
-  ```python
-  a_t = \mu(s_t)
-  ```
+
+  $a_t = \mu(s_t)$
+
 - **Stochastic policy**: Samples actions from a probability distribution.
-  ```python
-  a_t \sim \pi(\cdot | s_t)
-  ```
+
+  $a_t \sim \pi(\cdot | s_t)$
+
 
 ### Example: Deterministic Policy in PyTorch
 ```python

From 231624c165c963c5f0a1a720488ec14c728f3575 Mon Sep 17 00:00:00 2001
From: Shivang Vijay
Date: Tue, 11 Mar 2025 01:42:11 -0400
Subject: [PATCH 4/4] Add RL link to navigation

---
 _data/navigation.yml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/_data/navigation.yml b/_data/navigation.yml
index 6600d2a4..6c7408d9 100644
--- a/_data/navigation.yml
+++ b/_data/navigation.yml
@@ -181,6 +181,11 @@ wiki:
         url: /wiki/machine-learning/mediapipe-live-ml-anywhere.md/
       - title: NLP for robotics
         url: /wiki/machine-learning/nlp_for_robotics.md/
+  - title: Reinforcement Learning
+    url: /wiki/reinforcement-learning/
+    children:
+      - title: Key Concepts in Reinforcement Learning (RL)
+        url: /wiki/reinforcement-learning/key-concepts-in-rl/
   - title: State Estimation
     url: /wiki/state-estimation/
     children: