Skip to content

Fixes Termination Manager logging to report aggregated percentage of environments done due to each term. #3107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 14, 2025

Conversation

ooctipus
Copy link
Collaborator

@ooctipus ooctipus commented Aug 7, 2025

Description

Currently Termination Manager write current step's done count for each term if reset is detected. This leads to two problem.

  1. User sees different counts just by varying num_envs
  2. the count can be over-count or under-count depending on when reset happens, as pointed out in [Bug Report] Termination Overcounting Caused by Missing Log Buffer Reset in manager_based_rl_env.py #2977 (Thanks, @Kyu3224)

The cause of the bug is because we are reporting current step status into a buffer that suppose to record episodic done. So instead of write the entire buffer base on current value, we ask the update to respect the non-reseting environment's old value, and instead of reporting count, we report percentage of environment that was done due to the particular term.

Test on Isaac-Velocity-Rough-Anymal-C-v0

Before fix:
Screenshot from 2025-08-06 22-16-20
Red: num_envs = 4096, Orange: num_envs = 1024

After fix:

Screenshot from 2025-08-06 22-16-12 Red: num_envs = 4096, Orange: num_envs = 1024

Note that curve of the same color ran on same seed, and curves matched exactly, the only difference is the data gets reported in termination. The percentage version is a lot more clear in conveying how agent currently fails, and how much percentage of agent fails, and shows that increasing num_envs to 4096 helps improve agent avoiding termination by base_contact much quicker than num_envs=1024. Such message is a bit hard to tell in first image.

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@ooctipus ooctipus force-pushed the fix/termination_reporting branch from 6244bfa to 45ff89d Compare August 7, 2025 07:47
@ooctipus ooctipus changed the title Fixed Termination Manager logging to report aggregated percentage of environments done due to each term. Fixes Termination Manager logging to report aggregated percentage of environments done due to each term. Aug 7, 2025
@@ -182,7 +182,7 @@ def get_term(self, name: str) -> torch.Tensor:
Returns:
The corresponding termination term value. Shape is (num_envs,).
"""
return self._term_dones[name]
return self._term_dones[name, self._term_names.index(name)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indexing here is not great. We use the value from here to give termination rewards in different environments. Doing the .index every time for reward computation may lead to slow downs (as it searches over the list in O(n) fashion).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh if you like the single tensor approach I can also store one more name -> idx dict, and make change here O(1), I wasn't aware of termination rewards, (my bad!!) and thought this function may be used infrequently. I should do a global search next time rather than assume!

self._term_dones = dict()
for term_name in self._term_names:
self._term_dones[term_name] = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
self._term_dones = torch.zeros((self.num_envs, len(self._term_names)), device=self.device, dtype=torch.bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to change this as a dict of tensors to a single tensor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it last_episode_done_stats = self._term_dones.float().mean(dim=0) this operation is very nice and optimized, thats why I did it, but if you think dict is more clear I can revert back : ))

# store information
extras["Episode_Termination/" + key] = torch.count_nonzero(self._term_dones[key][env_ids]).item()
extras["Episode_Termination/" + key] = last_episode_done_stats[i].item()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want the ratio, isn't that simply?

self._term_dones[key][env_ids].sum() / len(env_ids)

Copy link
Collaborator Author

@ooctipus ooctipus Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reveiwing!
I guess, doing with env_ids will be viewing ratio of resetting environments. I thought maybe report stats of all environment can be a bit more nicer as user can verify from the graph that all terms sum up to 1. Of course you can do self._term_dones[key].sum() / self.env.num_envs as well

But it seems like if this approach is what I after, do it in one tensor operation seems quite nice, both speed wise and memory utility wise.

@ooctipus ooctipus force-pushed the fix/termination_reporting branch from 25d4594 to c220593 Compare August 10, 2025 18:28
Copy link

github-actions bot commented Aug 10, 2025

Test Results Summary

2 419 tests   2 011 ✅  2h 22m 39s ⏱️
   90 suites    408 💤
    1 files        0 ❌

Results for commit a0767c6.

♻️ This comment has been updated with latest results.

@ooctipus ooctipus force-pushed the fix/termination_reporting branch 2 times, most recently from ea5fa9a to 41ccce3 Compare August 11, 2025 08:55
@ooctipus
Copy link
Collaborator Author

ooctipus commented Aug 13, 2025

@Mayankm96
I did my best to optimize the operations in compute and reset,
index all changed from O(n) to O(1) -> this part not captured in benchmark below

benchmarked task: velocity rough anymal c, 4096 envs, 1000 steps

before change :
one step total: 79.038

| termination.compute     |       0.229 ms|
| termination.reset       |       0.007 ms|

after change:

one step total: 80.187 ms

| termination.compute     |       0.274 ms|  slower due to one extra operation
| termination.reset       |       0.004 ms|  twice as fast

the cost for this PR is about 0.04 ms / 80, 0.05%, mostly due to compute needs modify other terms's done to correctly update the done buffer. I think this is reasonable

@ooctipus ooctipus force-pushed the fix/termination_reporting branch from 5ace435 to a0767c6 Compare August 13, 2025 20:55
@ooctipus ooctipus merged commit 8dabd3f into main Aug 14, 2025
10 checks passed
@ooctipus ooctipus deleted the fix/termination_reporting branch August 14, 2025 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants