Release 1.4.0 #260

Merged
merged 33 commits
Jan 16, 2025
bacb0f5
Mixed double precision for PPO algorithm (#155)
lopatovsky Sep 26, 2024
f585796
Call agent's pre-interaction during evaluation (#210)
Toni-SM Oct 12, 2024
934fbaa
Space-tensor conversion (#206)
Toni-SM Oct 18, 2024
7dc161e
Remove OpenAI Gym (gym) from dependencies and source code (#220)
Toni-SM Nov 2, 2024
ae4e09e
Merge branch 'main' into develop
Toni-SM Nov 2, 2024
5fce807
Fix: with SAC, a new training batch should be sampled for each gradie…
YiboDi Nov 3, 2024
9252ec9
Fix Sampling inside gradient loop issue (#183)
bekleyis95 Nov 3, 2024
eff7295
Add class mapping of categorical model (#216)
Telios Nov 3, 2024
88ac11f
Update pre-commit hooks (#221)
Toni-SM Nov 4, 2024
4db2956
Apply black and codespell pre-commit hooks (#222)
Toni-SM Nov 5, 2024
bbe532d
Docs update (#228)
Toni-SM Nov 23, 2024
6324e46
Improve model instantiators (#232)
Toni-SM Dec 2, 2024
57f60df
Disable torch distribution argument validation to improve performance…
Toni-SM Dec 2, 2024
87250fa
Use ML framework specific device parsing in source code (#234)
Toni-SM Dec 6, 2024
3d4eb76
Replace PyTorch's BatchSampler by Python slice when sampling data fro…
Toni-SM Dec 8, 2024
d2aee9f
Update JAX installation warning note (#241)
Toni-SM Dec 19, 2024
23b61dc
Shared model instantiator's default parameters (#242)
Toni-SM Dec 21, 2024
2882db4
Automatic mixed precision training in PyTorch (#243)
Toni-SM Jan 1, 2025
f39aadc
Fix SAC experiment directory key name (#244)
Toni-SM Jan 1, 2025
e49f98f
Fix Optax's learning rate schedulers integration in JAX (#245)
Toni-SM Jan 5, 2025
a7f82b2
Isaac Lab wrapper's multi-agent state retrieval with gymnasium 1.0
Toni-SM Jan 5, 2025
65da82b
Add method to initialize lazy modules' parameters
Toni-SM Jan 5, 2025
deb28ff
Update examples that use model instantiators to the latest API and In…
Toni-SM Jan 7, 2025
95663f2
Update environment loader and wrapper for Isaac Lab 2.0 (#248)
Toni-SM Jan 7, 2025
e5c6b81
Update Gymnasium checking for vectorized environments (#250)
Toni-SM Jan 8, 2025
95b0e02
Update AMP agent to use the environment's terminated and truncated da…
Toni-SM Jan 15, 2025
16224b0
Update runner implementations to support definition of arbitrary agen…
Toni-SM Jan 15, 2025
9bda6c5
Fix multi-agent learning rate scheduler in JAX (#255)
Toni-SM Jan 15, 2025
d11a020
Add automatic mixed precision support for multi-agent and deal torch.…
Toni-SM Jan 15, 2025
cb21eba
Update gymnasium make vector API (#257)
Toni-SM Jan 16, 2025
7f9992b
Fix memory sampling when sequence_length is specified
Toni-SM Jan 16, 2025
663f546
Treat truncation signal when computing 'done' (environment reset) (#259)
Toni-SM Jan 16, 2025
fdfb8a5
Allow the use of the deterministic/stochastic actions during evaluati…
Toni-SM Jan 16, 2025
49 changes: 38 additions & 11 deletions .pre-commit-config.yaml
@@ -1,15 +1,42 @@
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
-    hooks:
-      - id: check-ast
-      - id: check-case-conflict
-      - id: check-docstring-first
-      - id: check-merge-conflict
-      - id: check-yaml
-      - id: end-of-file-fixer
-      - id: trailing-whitespace
+    rev: v4.6.0
+    hooks:
+      - id: check-ast
+      - id: check-case-conflict
+      - id: check-docstring-first
+      - id: check-json
+      - id: check-merge-conflict
+      - id: check-toml
+      - id: check-yaml
+      - id: debug-statements
+      - id: detect-private-key
+      - id: end-of-file-fixer
+      - id: name-tests-test
+        args: ["--pytest-test-first"]
+        exclude: ^(tests/strategies.py|tests/utils.py)
+      - id: no-commit-to-branch
+      - id: trailing-whitespace
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.3.0
+    hooks:
+      - id: codespell
+        exclude: ^(docs/source/_static|docs/_build|pyproject.toml)
+        additional_dependencies:
+          - tomli
+  - repo: https://github.com/python/black
+    rev: 24.8.0
+    hooks:
+      - id: black
+        args: ["--line-length=120"]
+        exclude: ^(docs/)
   - repo: https://github.com/pycqa/isort
-    rev: 5.12.0
+    rev: 5.13.2
     hooks:
       - id: isort
+  - repo: https://github.com/pre-commit/pygrep-hooks
+    rev: v1.10.0
+    hooks:
+      - id: rst-backticks
+      - id: rst-directive-colons
+      - id: rst-inline-touching-normal
47 changes: 44 additions & 3 deletions CHANGELOG.md
@@ -2,6 +2,47 @@

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [1.4.0] - Unreleased
### Added
- Utilities to operate on Gymnasium spaces (`Box`, `Discrete`, `MultiDiscrete`, `Tuple` and `Dict`)
- `parse_device` static method in ML framework configuration (used in library components to set up the device)
- Model instantiator support for different shared model structures in PyTorch
- Support for automatic mixed precision training in PyTorch
- `init_state_dict` method to initialize model's lazy modules in PyTorch
- Model instantiators `fixed_log_std` parameter to define immutable log standard deviations
- Define the `stochastic_evaluation` trainer config to allow the use of the actions returned by the agent's model
as-is instead of deterministic actions (mean-actions in Gaussian-based models) during evaluation.
Make the return of deterministic actions the default behavior.
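The last point can be sketched in plain Python; a minimal illustration of the idea behind the `stochastic_evaluation` trainer config (the function name and signature are hypothetical, not skrl's API):

```python
import math
import random

def evaluation_action(mean: float, log_std: float, stochastic: bool = False) -> float:
    """Action used during evaluation of a Gaussian policy.

    stochastic=False (the new default): return the deterministic
    mean-action. stochastic=True: sample from the distribution,
    i.e. use the agent's action as-is.
    """
    if stochastic:
        return random.gauss(mean, math.exp(log_std))
    return mean

# deterministic evaluation simply returns the mean action
assert evaluation_action(0.5, -1.0) == 0.5
```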

### Changed
- Call agent's `pre_interaction` method during evaluation
- Use spaces utilities to process states, observations and actions for all the library components
- Update model instantiators definitions to process supported fundamental and composite Gymnasium spaces
- Make flattened tensor storage in memory the default option (revert changes introduced in version 1.3.0)
- Drop support for PyTorch versions prior to 1.10 (the previous supported version was 1.9)
- Update KL Adaptive learning rate scheduler implementation to match Optax's behavior in JAX
- Update AMP agent to use the environment's terminated and truncated data, and the KL Adaptive learning rate scheduler
- Update runner implementations to support definition of arbitrary agents and their models
- Speed up PyTorch implementation:
- Disable argument checking when instantiating distributions
- Replace PyTorch's `BatchSampler` by Python slice when sampling data from memory
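The `BatchSampler`-to-slice change above boils down to the following; a rough sketch of why slicing a contiguous (flattened) storage is cheaper than per-index gathering (illustrative only, not skrl's memory code):

```python
def minibatch_slices(total: int, batch_size: int):
    """Yield mini-batches as plain Python slices: a slice over contiguous
    storage avoids BatchSampler's index-by-index collection and its
    iterator overhead."""
    for start in range(0, total, batch_size):
        yield slice(start, min(start + batch_size, total))

data = list(range(10))
batches = [data[s] for s in minibatch_slices(len(data), 4)]
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```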

### Changed (breaking changes: style)
- Format code using Black code formatter (it's ugly, yes, but it does its job)

### Fixed
- Move the batch sampling inside gradient step loop for DQN, DDQN, DDPG (RNN), TD3 (RNN), SAC and SAC (RNN)
- Model state dictionary initialization for composite Gymnasium spaces in JAX
- Add missing `reduction` parameter to Gaussian model instantiator
- Optax's learning rate schedulers integration in JAX implementation
- Isaac Lab wrapper's multi-agent state retrieval with gymnasium 1.0
- Treat truncation signal when computing 'done' (environment reset)
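The truncation fix amounts to folding both end-of-episode signals into the reset condition; in plain Python (skrl operates on batched tensors, this shows only the logic):

```python
def compute_dones(terminated, truncated):
    """An environment needs a reset when its episode either terminated
    (reached a terminal state) or was truncated (e.g. hit a time limit)."""
    return [te or tr for te, tr in zip(terminated, truncated)]

# only the second and third environments are reset
assert compute_dones([False, True, False], [False, False, True]) == [False, True, True]
```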

### Removed
- Remove OpenAI Gym (`gym`) from dependencies and source code. **skrl** continues to support gym environments;
the package is simply no longer installed with the library and must be installed manually if needed.
Any gym-based environment wrapper must use the `convert_gym_space` space utility to operate

## [1.3.0] - 2024-09-11
### Added
- Distributed multi-GPU and multi-node learning (JAX implementation)
@@ -70,7 +111,7 @@ Summary of the most relevant features:
## [1.0.0-rc.2] - 2023-08-11
### Added
- Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
-- Time-limit (truncation) boostrapping in on-policy actor-critic agents
+- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
- Model instantiators `initial_log_std` parameter to set the log standard deviation's initial value

### Changed (breaking changes)
@@ -84,7 +125,7 @@ Summary of the most relevant features:
- `from skrl.envs.loaders.jax import load_omniverse_isaacgym_env`

### Changed
-- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
+- Drop support for PyTorch versions prior to 1.9 (the previous supported version was 1.8)

## [1.0.0-rc.1] - 2023-07-25
### Added
@@ -177,7 +218,7 @@ to allow storing samples in memories during evaluation
- Parameter `role` to model methods
- Wrapper compatibility with the new OpenAI Gym environment API
- Internal library colored logger
-- Migrate checkpoints/models from other RL libraries to skrl models/agents
+- Migrate checkpoints/models from other RL libraries to **skrl** models/agents
- Configuration parameter `store_separately` to agent configuration dict
- Save/load agent modules (models, optimizers, preprocessors)
- Set random seed and configure deterministic behavior for reproducibility
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -54,7 +54,7 @@ Read the code a little bit and you will understand it at first glance... Also
```ini
function annotation (e.g. typing)
# insert an empty line
-python libraries and other libraries (e.g. gym, numpy, time, etc.)
+python libraries and other libraries (e.g. gymnasium, numpy, time, etc.)
# insert an empty line
machine learning framework modules (e.g. torch, torch.nn)
# insert an empty line
2 changes: 0 additions & 2 deletions docs/source/api/agents.rst
@@ -119,7 +119,6 @@ API (PyTorch)
:private-members: _update, _empty_preprocessor, _get_internal_value
:members:

-.. automethod:: __init__
.. automethod:: __str__

.. raw:: html
@@ -136,5 +135,4 @@ API (JAX)
:private-members: _update, _empty_preprocessor, _get_internal_value
:members:

-.. automethod:: __init__
.. automethod:: __str__
14 changes: 6 additions & 8 deletions docs/source/api/agents/a2c.rst
@@ -25,7 +25,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy function approximator (:math:`\pi_\theta`), value function approximator (:math:`V_\phi`)
-| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
+| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - values (:math:`V`), advantages (:math:`A`), returns (:math:`R`)
| - log probabilities (:math:`logp`)
| - loss (:math:`L`)
@@ -59,7 +59,7 @@ Learning algorithm
| :literal:`_update(...)`
| :green:`# compute returns and advantages`
| :math:`V_{_{last}}' \leftarrow V_\phi(s')`
-| :math:`R, A \leftarrow f_{GAE}(r, d, V, V_{_{last}}')`
+| :math:`R, A \leftarrow f_{GAE}(r, d_{_{end}} \lor d_{_{timeout}}, V, V_{_{last}}')`
| :green:`# sample mini-batches from memory`
| [[:math:`s, a, logp, V, R, A`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages
| :green:`# mini-batches loop`
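The `f_GAE` step above can be sketched in plain Python (scalar lists instead of tensors; `gamma` and `lambda_` correspond to the agent's `discount_factor` and `lambda` settings, everything else is illustrative):

```python
def compute_gae(rewards, dones, values, next_value, gamma=0.99, lambda_=0.95):
    """Generalized Advantage Estimation: traverse the rollout backwards,
    zeroing the bootstrap whenever the episode ended (done = terminated
    or truncated), then derive returns as advantages + values."""
    advantages = [0.0] * len(rewards)
    last_advantage = 0.0
    last_value = next_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * not_done * last_value - values[t]
        last_advantage = delta + gamma * lambda_ * not_done * last_advantage
        advantages[t] = last_advantage
        last_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return returns, advantages
```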
@@ -232,6 +232,10 @@ Support for advanced features is described in the next table
- RNN, LSTM, GRU and any other variant
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
@@ -252,16 +256,12 @@ API (PyTorch)
:private-members: _update
:members:

-.. automethod:: __init__

.. autoclass:: skrl.agents.torch.a2c.A2C_RNN
:undoc-members:
:show-inheritance:
:private-members: _update
:members:

-.. automethod:: __init__

.. raw:: html

<br>
@@ -276,5 +276,3 @@ API (JAX)
:show-inheritance:
:private-members: _update
:members:

-.. automethod:: __init__
10 changes: 6 additions & 4 deletions docs/source/api/agents/amp.rst
@@ -21,7 +21,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy (:math:`\pi_\theta`), value (:math:`V_\phi`) and discriminator (:math:`D_\psi`) function approximators
-| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
+| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - values (:math:`V`), next values (:math:`V'`), advantages (:math:`A`), returns (:math:`R`)
| - log probabilities (:math:`logp`)
| - loss (:math:`L`)
@@ -57,7 +57,7 @@ Learning algorithm
| :math:`r_D \leftarrow -log(\text{max}( 1 - \hat{y}(D_\psi(s_{_{AMP}})), \, 10^{-4})) \qquad` with :math:`\; \hat{y}(x) = \dfrac{1}{1 + e^{-x}}`
| :math:`r' \leftarrow` :guilabel:`task_reward_weight` :math:`r \, +` :guilabel:`style_reward_weight` :guilabel:`discriminator_reward_scale` :math:`r_D`
| :green:`# compute returns and advantages`
-| :math:`R, A \leftarrow f_{GAE}(r', d, V, V')`
+| :math:`R, A \leftarrow f_{GAE}(r', d_{_{end}} \lor d_{_{timeout}}, V, V')`
| :green:`# sample mini-batches from memory`
| [[:math:`s, a, logp, V, R, A, s_{_{AMP}}`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages, AMP states
| [[:math:`s_{_{AMP}}^{^M}`]] :math:`\leftarrow` AMP states from :math:`M`
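The style-reward line above, :math:`r_D \leftarrow -log(\text{max}(1 - \hat{y}(D_\psi(s_{_{AMP}})), 10^{-4}))`, translates directly; a plain-Python sketch over a scalar discriminator logit instead of a tensor:

```python
import math

def style_reward(discriminator_logit: float) -> float:
    """AMP style reward: r_D = -log(max(1 - sigmoid(D(s)), 1e-4)).
    The 1e-4 clamp keeps the log bounded when the discriminator is
    confident the transition matches the reference motion dataset."""
    prob = 1.0 / (1.0 + math.exp(-discriminator_logit))  # sigmoid
    return -math.log(max(1.0 - prob, 1e-4))

# an undecided discriminator (logit 0) yields -log(0.5) = log(2) ≈ 0.693
assert abs(style_reward(0.0) - math.log(2.0)) < 1e-9
```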
@@ -237,6 +237,10 @@ Support for advanced features is described in the next table
- \-
- .. centered:: :math:`\square`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
@@ -256,5 +260,3 @@ API (PyTorch)
:show-inheritance:
:private-members: _update
:members:

-.. automethod:: __init__
12 changes: 6 additions & 6 deletions docs/source/api/agents/cem.rst
@@ -17,7 +17,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy function approximator (:math:`\pi_\theta`)
-| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
+| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - loss (:math:`L`)

.. raw:: html
@@ -41,7 +41,7 @@ Learning algorithm
|
| :literal:`_update(...)`
| :green:`# sample all memory`
-| :math:`s, a, r, s', d \leftarrow` states, actions, rewards, next_states, dones
+| :math:`s, a, r \leftarrow` states, actions, rewards
| :green:`# compute discounted return threshold`
| :math:`[G] \leftarrow \sum_{t=0}^{E-1}` :guilabel:`discount_factor`:math:`^{t} \, r_t` for each episode
| :math:`G_{_{bound}} \leftarrow q_{th_{quantile}}([G])` at the given :guilabel:`percentile`
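The discounted-return threshold above can be sketched as follows (plain Python; the nearest-rank percentile is a simplification of the quantile function the actual implementations use):

```python
def return_threshold(episode_rewards, discount_factor=0.99, percentile=70):
    """Discounted return per episode, then the return value at the given
    percentile: episodes scoring above this bound are kept as elites for
    the cross-entropy update."""
    returns = [
        sum(discount_factor ** t * r for t, r in enumerate(rewards))
        for rewards in episode_rewards
    ]
    ordered = sorted(returns)
    # nearest-rank percentile (a simplification of a proper quantile)
    index = min(int(len(ordered) * percentile / 100), len(ordered) - 1)
    return returns, ordered[index]
```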
@@ -175,6 +175,10 @@ Support for advanced features is described in the next table
- \-
- .. centered:: :math:`\square`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- \-
- .. centered:: :math:`\square`
@@ -195,8 +199,6 @@ API (PyTorch)
:private-members: _update
:members:

-.. automethod:: __init__

.. raw:: html

<br>
@@ -211,5 +213,3 @@ API (JAX)
:show-inheritance:
:private-members: _update
:members:

-.. automethod:: __init__
16 changes: 7 additions & 9 deletions docs/source/api/agents/ddpg.rst
@@ -21,7 +21,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy function approximator (:math:`\mu_\theta`), critic function approximator (:math:`Q_\phi`)
-| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
+| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - loss (:math:`L`)

.. raw:: html
@@ -50,11 +50,11 @@ Learning algorithm
| :green:`# gradient steps`
| **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
| :green:`# sample a batch from memory`
-| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
+| [:math:`s, a, r, s', d_{_{end}}, d_{_{timeout}}`] with size :guilabel:`batch_size`
| :green:`# compute target values`
| :math:`a' \leftarrow \mu_{\theta_{target}}(s')`
| :math:`Q_{_{target}} \leftarrow Q_{\phi_{target}}(s', a')`
-| :math:`y \leftarrow r \;+` :guilabel:`discount_factor` :math:`\neg d \; Q_{_{target}}`
+| :math:`y \leftarrow r \;+` :guilabel:`discount_factor` :math:`\neg (d_{_{end}} \lor d_{_{timeout}}) \; Q_{_{target}}`
| :green:`# compute critic loss`
| :math:`Q \leftarrow Q_\phi(s, a)`
| :math:`L_{Q_\phi} \leftarrow \frac{1}{N} \sum_{i=1}^N (Q - y)^2`
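The target-value and critic-loss lines above translate almost directly; a plain-Python sketch over a batch of scalars (illustrative only, not skrl's tensor code):

```python
def ddpg_critic_targets(rewards, terminated, truncated, next_q, discount_factor=0.99):
    """Target values y = r + gamma * (1 - done) * Q_target(s', mu_target(s')):
    bootstrapping is suppressed when the episode ended for any reason
    (terminated or truncated), matching the updated 'done' treatment."""
    return [
        r + discount_factor * (0.0 if (te or tr) else 1.0) * q
        for r, te, tr, q in zip(rewards, terminated, truncated, next_q)
    ]

def critic_loss(predicted_q, targets):
    """Critic loss: mean squared error between Q(s, a) and the targets."""
    return sum((q - y) ** 2 for q, y in zip(predicted_q, targets)) / len(targets)
```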
@@ -236,6 +236,10 @@ Support for advanced features is described in the next table
- RNN, LSTM, GRU and any other variant
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
@@ -256,16 +260,12 @@ API (PyTorch)
:private-members: _update
:members:

-.. automethod:: __init__

.. autoclass:: skrl.agents.torch.ddpg.DDPG_RNN
:undoc-members:
:show-inheritance:
:private-members: _update
:members:

-.. automethod:: __init__

.. raw:: html

<br>
@@ -280,5 +280,3 @@ API (JAX)
:show-inheritance:
:private-members: _update
:members:

-.. automethod:: __init__