Feat Sebulba recurrent IQL #1148
base: develop
Conversation
I've looked through everything except the system file and it looks good, Sebulba utils especially! Just some relatively minor style changes.
```python
# todo: remove the ppo dependencies when we make sebulba for other systems
```
This is a good point though, maybe there's something we can do about it 🤔
Maybe a Protocol that has `action`, `obs`, `reward`? Not sure if there are any other common attributes.
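For illustration, a minimal sketch of such a Protocol (the `Transition` name and field set are assumptions for the example, not existing Mava code):

```python
from typing import Protocol

from chex import Array


class Transition(Protocol):
    """Hypothetical structural interface over the per-system transition
    types, so shared Sebulba utils would not need to import from PPO."""

    action: Array
    obs: Array
    reward: Array
```

Any transition NamedTuple with those three fields would satisfy it structurally, without having to inherit from it.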
```python
terminated = np.repeat(
    terminated[..., np.newaxis], repeats=self.num_agents, axis=-1
)  # (B,) --> (B, N)
```
Does this already happen for SMAX and LBF?
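For context, the repeat above just tiles the per-env flag across the agent dimension; a standalone sketch with assumed sizes:

```python
import numpy as np

num_envs, num_agents = 4, 3  # assumed example sizes
terminated = np.zeros(num_envs, dtype=float)  # (B,)
terminated = np.repeat(terminated[..., np.newaxis], repeats=num_agents, axis=-1)
assert terminated.shape == (num_envs, num_agents)  # (B, N)
```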
Great work here! Really minor changes required. Happy to merge this pending some benchmarks.
```python
    target: Array,
) -> Tuple[Array, Metrics]:
    # axes switched here to scan over time
    hidden_state, obs_term_or_trunc = prep_inputs_to_scannedrnn(obs, term_or_trunc)
```
A general comment: I think this would be a lot easier to read if we used `done` to mean `term_or_trunc`, which I think is a reasonable thing. We would have to make the change in Anakin also though :/
""" | ||
|
||
```python
eps = jnp.maximum(
    config.system.eps_min, 1 - (t / config.system.eps_decay) * (1 - config.system.eps_min)
)
```
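For reference, this decays epsilon linearly from 1.0 at `t = 0` to `eps_min` at `t = eps_decay`, then stays flat; a quick check with assumed values (not the actual config defaults):

```python
import jax.numpy as jnp

eps_min, eps_decay = 0.05, 100_000  # assumed values
for t in (0, 50_000, 100_000, 200_000):
    eps = jnp.maximum(eps_min, 1 - (t / eps_decay) * (1 - eps_min))
    print(t, float(eps))  # 1.0, 0.525, 0.05, 0.05 (clipped)
```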
Would be nice if we could set a different decay per actor, although I think that's out of scope for this PR. Maybe you could make an issue to add in some of the Ape-X DQN features, that would be great.
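For the issue, a rough sketch of the Ape-X scheme (Horgan et al., 2018) this refers to, where each actor keeps a fixed epsilon instead of sharing one decay schedule; `num_actors` here is an assumed value:

```python
import jax.numpy as jnp

num_actors = 8  # assumed; would come from the Sebulba actor config
base_eps, alpha = 0.4, 7.0  # constants from the Ape-X paper
actor_ids = jnp.arange(num_actors)
# eps_i = base_eps ** (1 + (i / (N - 1)) * alpha): actor 0 explores the
# most (eps = 0.4) and the last actor is nearly greedy (eps ~= 0.0007).
actor_eps = base_eps ** (1 + (actor_ids / (num_actors - 1)) * alpha)
```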
I can easily add this in this PR 👀
I'd rather leave it for now, no need to make this more complex.
Great work Louay! Just two questions from my side as you can see in the comments.
```diff
- rewards = np.zeros((num_envs, num_agents), dtype=float)
- teminated = np.zeros(num_envs, dtype=float)
+ rewards = np.zeros((num_envs, self.num_agents), dtype=float)
+ terminated = np.zeros((num_envs, self.num_agents), dtype=float)
```
I assume with this change we would also need to change Sebulba PPO, since it currently does this same operation? We should decide whether it's generally better to do this in the system or in the wrapper.
Good point, I think we do this in the wrappers for the Anakin systems.
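For comparison, a minimal sketch of the wrapper-side version (the class and the Gymnasium-style `step` signature are assumptions for illustration, not actual Mava code):

```python
import numpy as np


class AgentDimWrapper:
    """Hypothetical wrapper that broadcasts shared per-env signals to
    (B, N) so that no individual system has to repeat the operation."""

    def __init__(self, env, num_agents: int):
        self._env = env
        self.num_agents = num_agents

    def step(self, actions):
        obs, reward, terminated, truncated, info = self._env.step(actions)
        # (B,) --> (B, N): tile the per-env flags across the agent dim.
        terminated = np.repeat(terminated[..., np.newaxis], self.num_agents, axis=-1)
        truncated = np.repeat(truncated[..., np.newaxis], self.num_agents, axis=-1)
        return obs, reward, terminated, truncated, info
```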
Looks great to me, just a few minor things 🙏
🔥
What?
A recurrent IQL implementation using the Sebulba architecture.
Why?
An off-policy Sebulba base and non-JAX envs in Mava.
How?
Mixed the Sebulba structure from PPO with the learner code from Anakin IQL.