Start Adding tutorial for Reinforcement Learning #200

Open
wants to merge 4 commits into
base: master
5 changes: 5 additions & 0 deletions _data/navigation.yml
@@ -181,6 +181,11 @@ wiki:
url: /wiki/machine-learning/mediapipe-live-ml-anywhere.md/
- title: NLP for robotics
url: /wiki/machine-learning/nlp_for_robotics.md/
- title: Reinforcement Learning
url: /wiki/reinforcement-learning/
children:
- title: Key Concepts in Reinforcement Learning (RL)
url: /wiki/reinforcement-learning/key-concepts-in-rl/
- title: State Estimation
url: /wiki/state-estimation/
children:
85 changes: 85 additions & 0 deletions wiki/reinforcement-learning/key-concepts-in-rl.md
@@ -0,0 +1,85 @@
---
date: 2025-03-11 # YYYY-MM-DD
title: Key Concepts of Reinforcement Learning
---

This tutorial introduces the fundamental concepts of Reinforcement Learning (RL). In RL, an agent interacts with an environment and learns behavior through trial and error, guided by reward feedback. The agent's objective is to maximize the cumulative reward it collects over time.

## Main Components of Reinforcement Learning

### Agent and Environment
The agent is the learner or decision-maker, while the environment represents everything the agent interacts with. The agent receives observations from the environment and takes actions that influence the environment's state.

### States and Observations
- A **state** (s) fully describes the world at a given moment.
- An **observation** (o) is what the agent actually receives from the environment; it may omit part of the state.
- Environments can be **fully observed** (complete information) or **partially observed** (limited information).
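
To make the distinction concrete, here is a minimal sketch of a partially observed toy world. The `State` class and `observe` function are hypothetical names chosen for illustration: the full state holds a position and a velocity, but the agent only sees the position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Full description of a toy 1-D world (hypothetical example)."""
    position: float
    velocity: float


def observe(state: State) -> float:
    """Partial observation: the agent sees the position but not the velocity."""
    return state.position


s = State(position=1.5, velocity=-0.2)
o = observe(s)  # the velocity is hidden from the agent
```

Two states that differ only in velocity produce the same observation here, which is what makes the environment partially observed.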

### Action Spaces
- The **action space** defines all possible actions an agent can take.
- **Discrete action spaces** (e.g., Atari, Go) have a finite number of actions.
- **Continuous action spaces** (e.g., robotics control) allow real-valued actions.
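
The two kinds of action space can be sketched in plain Python. The action names, bounds, and joint count below are assumptions made up for this example, not values from any particular environment.

```python
import random

# Discrete action space: a finite set of choices (hypothetical grid-world moves).
DISCRETE_ACTIONS = ["up", "down", "left", "right"]

# Continuous action space: a real-valued vector with per-dimension bounds
# (hypothetical torque limits for a two-joint arm).
TORQUE_LOW, TORQUE_HIGH = -1.0, 1.0
NUM_JOINTS = 2


def sample_discrete(rng: random.Random) -> str:
    """Pick one of finitely many actions uniformly at random."""
    return rng.choice(DISCRETE_ACTIONS)


def sample_continuous(rng: random.Random) -> list:
    """Draw one real-valued torque per joint, uniformly within the bounds."""
    return [rng.uniform(TORQUE_LOW, TORQUE_HIGH) for _ in range(NUM_JOINTS)]


rng = random.Random(0)
a_discrete = sample_discrete(rng)
a_continuous = sample_continuous(rng)
```

The distinction matters in practice because some algorithm families only apply directly to one kind of space.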

## Policies
A **policy** determines how an agent selects actions based on states:

- **Deterministic policy**: Always selects the same action for a given state.

$a_t = \mu(s_t)$

- **Stochastic policy**: Samples actions from a probability distribution.

$a_t \sim \pi(\cdot | s_t)$


### Example: Deterministic Policy in PyTorch
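```python
import torch
import torch.nn as nn

# Placeholder dimensions for illustration; take these from your environment.
obs_dim, act_dim = 4, 2

# MLP that maps an observation directly to an action: a_t = mu(s_t)
pi_net = nn.Sequential(
    nn.Linear(obs_dim, 64),
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, act_dim),
)

obs = torch.zeros(1, obs_dim)  # dummy batch containing one observation
action = pi_net(obs)           # tensor of shape (1, act_dim)
```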
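
### Example: Sampling from a Stochastic Policy
For discrete actions, one common way to build a stochastic policy is to treat a vector of scores (logits) as a categorical distribution and sample from it. The sketch below uses only the Python standard library; the logits are made-up numbers standing in for the output of a network, and `softmax` and `sample_action` are helper names introduced for this example.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Draw a_t ~ pi(. | s_t), where pi is the softmax of the logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 0.5, -1.0]  # hypothetical policy output for one state
rng = random.Random(0)
probs = softmax(logits)
actions = [sample_action(logits, rng) for _ in range(1000)]
```

Unlike the deterministic case, repeated calls with the same state can return different actions, with higher-scoring actions chosen more often.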

## Trajectories
A **trajectory** ($\tau$) is a sequence of states and actions:
```math
\tau = (s_0, a_0, s_1, a_1, \dots)
```
State transitions follow deterministic or stochastic rules:
```math
s_{t+1} = f(s_t, a_t)
```
or
```math
s_{t+1} \sim P(\cdot|s_t, a_t)
```
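
As an illustration, a trajectory can be collected by repeatedly applying a policy and a transition rule. The dynamics, the policy, and the function names below are all hypothetical, chosen so the example is easy to follow; setting `noise_std` above zero switches from the deterministic rule to a stochastic one.

```python
import random


def step(state, action, rng, noise_std=0.0):
    """Toy 1-D dynamics: s_{t+1} = s_t + a_t, plus Gaussian noise if requested."""
    noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    return state + action + noise


def policy(state):
    """Hypothetical deterministic policy: move one unit toward the origin."""
    if state > 0:
        return -1.0
    if state < 0:
        return 1.0
    return 0.0


def rollout(s0, horizon, rng, noise_std=0.0):
    """Return the list of (s_t, a_t) pairs and the state after the last action."""
    trajectory = []
    s = s0
    for _ in range(horizon):
        a = policy(s)
        trajectory.append((s, a))
        s = step(s, a, rng, noise_std)
    return trajectory, s


rng = random.Random(0)
tau, final_state = rollout(s0=3.0, horizon=5, rng=rng)
```

With `noise_std=0.0` the same start state always yields the same trajectory; with noise, each rollout is a sample from a distribution over trajectories.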

## Reward and Return
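The **reward function** (R) defines the agent's objective by assigning a scalar reward to each transition. In general it depends on the current state, the action taken, and the next state: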
```math
r_t = R(s_t, a_t, s_{t+1})
```
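
A reward function is just a mapping from a transition to a number. The sketch below is a made-up example for a 1-D task in which the agent should reach the origin while using small actions; the penalty weights are arbitrary assumptions.

```python
def reward(state: float, action: float, next_state: float) -> float:
    """Hypothetical R(s, a, s'): penalize distance from the origin and large actions."""
    distance_penalty = -abs(next_state)  # closer to the origin is better
    action_cost = -0.1 * action ** 2     # discourage large actions
    return distance_penalty + action_cost


r = reward(state=3.0, action=-1.0, next_state=2.0)
```

This particular function ignores `state`; a reward function may depend on any subset of its three arguments.
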
### Types of Return
1. **Finite-horizon undiscounted return**:
```math
R(\tau) = \sum_{t=0}^T r_t
```
2. **Infinite-horizon discounted return**:
```math
R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t
```
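where $\gamma \in (0, 1)$ is the discount factor. Smaller values weight immediate rewards more heavily than future ones, and $\gamma < 1$ keeps the infinite sum finite when rewards are bounded.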
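
Both kinds of return can be computed from a recorded list of rewards. The sketch below is a minimal illustration with made-up rewards; because a recorded list is finite, the discounted version is a truncation of the infinite sum. The function names are introduced for this example.

```python
def undiscounted_return(rewards):
    """Finite-horizon undiscounted return: the plain sum of rewards."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t, accumulated backwards from the last reward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 1.0, 1.0]  # hypothetical rewards r_0, r_1, r_2
total = undiscounted_return(rewards)                 # 1 + 1 + 1 = 3.0
discounted = discounted_return(rewards, gamma=0.5)   # 1 + 0.5 + 0.25 = 1.75
```

The backward accumulation uses the identity that the return from step t equals r_t plus gamma times the return from step t+1, which avoids computing powers of gamma explicitly.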

## Summary
This tutorial introduced fundamental RL concepts, including agents, environments, policies, action spaces, trajectories, and rewards. These components are essential for designing RL algorithms.

## Further Reading
- Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction*.

## References
- [Reinforcement Learning Wikipedia](https://en.wikipedia.org/wiki/Reinforcement_learning)