Fixes Termination Manager logging to report aggregated percentage of environments done due to each term. #3107
Conversation
(force-pushed from 6244bfa to 45ff89d)
```diff
@@ -182,7 +182,7 @@ def get_term(self, name: str) -> torch.Tensor:
         Returns:
             The corresponding termination term value. Shape is (num_envs,).
         """
-        return self._term_dones[name]
+        return self._term_dones[:, self._term_names.index(name)]
```
The indexing here is not great. We use the value from here to give termination rewards in different environments. Doing the `.index` lookup on every reward computation may lead to slowdowns, since it searches over the list in O(n) fashion.
Ahhh, if you like the single-tensor approach I can also store one more name -> idx dict and make the lookup here O(1). I wasn't aware of termination rewards (my bad!!) and thought this function might be used infrequently. I should do a global search next time rather than assume!
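The name -> idx cache proposed in the comment could look roughly like the sketch below. The class and attribute names (`TerminationTermCache`, `_term_name_to_idx`) are illustrative, not the actual IsaacLab API:

```python
import torch


class TerminationTermCache:
    """Sketch of a name -> column-index cache; names are illustrative."""

    def __init__(self, term_names, num_envs, device="cpu"):
        self._term_names = list(term_names)
        # Built once at setup, so get_term() avoids an O(n) list.index() per call.
        self._term_name_to_idx = {name: i for i, name in enumerate(self._term_names)}
        self._term_dones = torch.zeros(
            (num_envs, len(self._term_names)), dtype=torch.bool, device=device
        )

    def get_term(self, name: str) -> torch.Tensor:
        # Column slice keeps the (num_envs,) shape that reward terms expect.
        return self._term_dones[:, self._term_name_to_idx[name]]
```

The dict costs one extra O(num_terms) structure at setup time in exchange for O(1) lookups in the reward hot path.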
```diff
-        self._term_dones = dict()
-        for term_name in self._term_names:
-            self._term_dones[term_name] = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
+        self._term_dones = torch.zeros((self.num_envs, len(self._term_names)), device=self.device, dtype=torch.bool)
```
Is there a reason to change this from a dict of tensors to a single tensor?
I did it because `last_episode_done_stats = self._term_dones.float().mean(dim=0)` is a very nice and optimized operation, but if you think the dict is clearer I can revert back :))
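The single-tensor aggregation mentioned above can be sketched as follows (variable names and values are illustrative): casting the bool buffer to float and taking the mean over the environment dimension yields the fraction of environments terminated by each term in one vectorized reduction.

```python
import torch

# Illustrative (num_envs, num_terms) bool buffer of episodic dones.
num_envs = 4
term_names = ["time_out", "base_contact"]
term_dones = torch.zeros((num_envs, len(term_names)), dtype=torch.bool)
term_dones[:2, 1] = True  # 2 of 4 envs terminated due to base_contact

# One reduction gives the per-term done fraction; shape is (num_terms,).
last_episode_done_stats = term_dones.float().mean(dim=0)
```

With the dict-of-tensors layout, the same statistic would require one reduction per term in a Python loop.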
```diff
         # store information
-        extras["Episode_Termination/" + key] = torch.count_nonzero(self._term_dones[key][env_ids]).item()
+        extras["Episode_Termination/" + key] = last_episode_done_stats[i].item()
```
If you want the ratio, isn't that simply `self._term_dones[key][env_ids].sum() / len(env_ids)`?
Thanks for reviewing!
Computing with `env_ids` gives the ratio over the resetting environments only. I thought reporting stats over all environments would be a bit nicer, since the user can verify from the graph that all terms sum to 1. You can also do `self._term_dones[key].sum() / self.env.num_envs` for that.
But if this is the approach we're after, doing it in one tensor operation seems quite nice, both speed-wise and memory-wise.
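The two ratio definitions being discussed can be contrasted in a small sketch (all names and values are illustrative): a fraction computed over the environments resetting this step versus a fraction over all environments.

```python
import torch

num_envs = 8
term_done = torch.zeros(num_envs, dtype=torch.bool)
term_done[[0, 2]] = True              # envs 0 and 2 hit this termination term
env_ids = torch.tensor([0, 1, 2, 3])  # envs being reset this step

# Ratio among environments resetting this step: 2 / 4.
ratio_resetting = term_done[env_ids].sum().item() / len(env_ids)

# Ratio among all environments: 2 / 8. Summed over all terms, these
# fractions add up to 1 when every done env is attributed to one term.
ratio_all = term_done.sum().item() / num_envs
```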
(force-pushed from 25d4594 to c220593)
Test Results Summary: 2 419 tests, 2 011 ✅, 2h 22m 39s ⏱️. Results for commit a0767c6.
(force-pushed from ea5fa9a to 41ccce3)
@Mayankm96 benchmarked task: velocity rough Anymal-C, 4096 envs, 1000 steps.
before change:
after change: one step total: 80.187 ms
The cost of this PR is about 0.04 ms out of 80 ms (0.05%), mostly because `compute` needs to modify the other terms' done flags to correctly update the done buffer. I think this is reasonable.
(force-pushed from 5ace435 to a0767c6)
Description
Currently, the Termination Manager writes the current step's done count for each term whenever a reset is detected. This leads to two problems.
The cause of the bug is that we report the current step's status into a buffer that is supposed to record episodic dones. So instead of overwriting the entire buffer with the current values, the update now preserves the old values of non-resetting environments; and instead of reporting a count, we report the percentage of environments that were done due to the particular term.
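The fix described above can be sketched as a masked row update (all names and shapes here are illustrative, not the actual PR code): only environments that reset this step overwrite their row in the episodic done buffer, so non-resetting environments keep the value recorded when they last terminated.

```python
import torch

num_envs, num_terms = 4, 2
term_dones = torch.zeros((num_envs, num_terms), dtype=torch.bool)
term_dones[0, 0] = True  # env 0 terminated via term 0 in an earlier step

current_step = torch.zeros((num_envs, num_terms), dtype=torch.bool)
current_step[1, 1] = True             # env 1 terminates via term 1 now
reset_env_ids = torch.tensor([1])     # only env 1 resets this step

# Masked update: rows for non-resetting envs (0, 2, 3) are untouched.
term_dones[reset_env_ids] = current_step[reset_env_ids]
```

An unmasked `term_dones = current_step` here would wipe env 0's previously recorded termination, which is exactly the reporting bug being fixed.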
Test on Isaac-Velocity-Rough-Anymal-C-v0
Before fix:

Red: num_envs = 4096, Orange: num_envs = 1024
After fix:
Note that curves of the same color ran on the same seed and match exactly; the only difference is the data reported under termination. The percentage version conveys much more clearly how the agent currently fails and what fraction of environments fail, and it shows that increasing num_envs to 4096 helps the agent learn to avoid termination due to `base_contact` much more quickly than num_envs = 1024. That message is hard to read from the first image.
Checklist
- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- I have added my name to `CONTRIBUTORS.md` or my name already exists there