Warn if using offline data buffer without Agent wrapper #1771
Conversation
alf/algorithms/sac_algorithm.py (Outdated)

```python
                       action, action_distribution):

    if isinstance(rollout_info, BasicRolloutInfo):
        rollout_info = rollout_info.rl
```
This should be put outside of this function. The general principle is, the algorithm should always receive what it's supposed to receive. In this case, this means that the rollout_info passed in should already be SacInfo.
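For example, roughly (a sketch with trimmed stand-in structures; the caller shown here is hypothetical, not the actual ALF code):

```python
from collections import namedtuple

# Trimmed stand-ins for the real ALF structures.
BasicRolloutInfo = namedtuple('BasicRolloutInfo', ['rl', 'rewards'])
SacInfo = namedtuple('SacInfo', ['action', 'reward'])

def _calc_critic_loss(rollout_info, action):
    # The algorithm receives what it is supposed to receive: a SacInfo.
    assert isinstance(rollout_info, SacInfo)
    ...

def train_step(rollout_info, action):
    # Unwrap at the call site, outside of the algorithm's function.
    if isinstance(rollout_info, BasicRolloutInfo):
        rollout_info = rollout_info.rl
    return _calc_critic_loss(rollout_info, action)
```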
+1
Thanks. Fixed.
alf/algorithms/sac_algorithm.py (Outdated)

```python
    if isinstance(rollout_info, BasicRolloutInfo):
        rollout_info = rollout_info.rl
    ...
    state: SacCriticState,
    rollout_info: SacInfo | BasicRLInfo, action,
```
Shouldn't this still always be `SacInfo`? If it's `BasicRLInfo`, the algorithm will crash.
Offline buffer data is stored as `BasicRLInfo`, which comprises just (s, a, r) data.
I guess I could convert `BasicRLInfo` into `SacInfo` with some fields empty? Not sure which is the better design. Let me know which one you think is cleaner and I can change it.
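If we went with the conversion, it might look roughly like this (field names are illustrative; the real `SacInfo` has a different field set):

```python
from collections import namedtuple

# Illustrative field sets, not the exact ALF definitions.
BasicRLInfo = namedtuple('BasicRLInfo', ['action', 'reward'])
SacInfo = namedtuple(
    'SacInfo', ['action', 'reward', 'actor', 'critic', 'alpha'])

def to_sac_info(basic):
    """Lift offline (s, a, r) data into a SacInfo, leaving the
    SAC-specific fields empty so downstream code can skip them."""
    return SacInfo(action=basic.action, reward=basic.reward,
                   actor=(), critic=(), alpha=())
```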
> Offline buffer data is stored as `BasicRLInfo`, which comprises just (s, a, r) data.
If you look at SAC's `train_step()`, it accesses `rollout_info.repr`. This means that SAC is currently incompatible with offline training.
I'm using a frozen encoder, so I'm not training a repr.
Wait, it seems that `repr` is stored in `BasicRolloutInfo`. Not sure how this code was running, then. I'll take a look.
It's using `elastic_namedtuple`, so any missing field returns `()`. Anyway, it's a little weird, but it works.
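Roughly this behavior, for illustration (not ALF's actual `elastic_namedtuple` implementation):

```python
class Elastic:
    """Record where any field that was never set reads as ()."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails.
        return ()

info = Elastic(action=1.0, reward=0.5)
print(info.reward)  # 0.5
print(info.repr)    # () -- missing field, so accessing .repr doesn't crash
```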
Ideally, we should not include `BasicRLInfo` here, as it could confuse pure SAC users. The better alternative might be to comply with the `Agent` assumption and possibly extend it.
Removed the type hints. Also added a warning message advising users to use `Agent`. Before, the code would simply crash due to an interface conflict.
```python
    logging.WARNING,
    "Detected offline buffer training without Agent wrapper. "
    "For best compatibility, it is advised to use the Agent wrapper.",
    n=1)
```
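(The enclosing call is cut off in the diff above. As a self-contained reference, absl's `log_first_n` can emit such a warning a single time; whether that matches the actual helper used in this PR is an assumption.)

```python
from absl import logging

# Emit the warning only the first time this code path runs.
# Assumption: the PR may use a different ALF logging helper.
logging.log_first_n(
    logging.WARNING,
    "Detected offline buffer training without Agent wrapper. "
    "For best compatibility, it is advised to use the Agent wrapper.",
    1)
```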
This warning won't work. When using Agent, we still get rollout_info as BasicRolloutInfo.
It will. When using Agent, it properly feeds the nested BasicRLInfo to this function instead.
In other words, there was never a bug with hybrid RL training; it was just never meant to be used without `Agent`.
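A sketch of the dispatch being described (method shape is illustrative, not the actual `Agent` code):

```python
class Agent:
    """Wrapper that forwards only the nested RL info to its sub-algorithm."""

    def __init__(self, rl_algorithm):
        self._rl = rl_algorithm

    def train_step_offline(self, inputs, state, rollout_info):
        # Without Agent, SAC sees the whole BasicRolloutInfo and crashes
        # on the interface mismatch; Agent passes along rollout_info.rl.
        return self._rl.train_step_offline(inputs, state, rollout_info.rl)
```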
> It will. When using `Agent`, it properly feeds the nested `BasicRLInfo` to this function instead.

You're right. I didn't know `Agent` overrides this function itself.
This PR now warns users if they are using the offline data buffer without the `Agent` wrapper.