Normalize reward-to-go in C++ actor-critic (pytorch#33550)
Summary:
Compared with the [Python implementation](https://github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py), the C++ actor-critic computes the tensor of normalized reward-to-go but never uses it. Even though this is only an integration test, this PR switches the loss to the normalized version for better convergence.
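For context, a minimal sketch of what "normalized reward-to-go" means here: the discounted returns are accumulated backwards over an episode and then standardized to zero mean and unit variance before being used in the loss. This is not the PR's actual diff; the function name and parameters (`rewards`, `gamma`, `eps`) are illustrative assumptions.

```cpp
#include <torch/torch.h>
#include <vector>

// Compute discounted reward-to-go for one episode and normalize it.
// gamma is the discount factor; eps guards against division by zero.
torch::Tensor normalized_reward_to_go(const std::vector<double>& rewards,
                                      double gamma = 0.99,
                                      double eps = 1e-7) {
  // Accumulate discounted returns from the end of the episode backwards.
  std::vector<double> returns(rewards.size());
  double running = 0.0;
  for (int64_t i = static_cast<int64_t>(rewards.size()) - 1; i >= 0; --i) {
    running = rewards[i] + gamma * running;
    returns[i] = running;
  }
  // Convert to a tensor and standardize to zero mean / unit variance;
  // this normalized tensor is the quantity the loss should consume.
  auto t = torch::tensor(returns, torch::kFloat64);
  return (t - t.mean()) / (t.std() + eps);
}
```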
Pull Request resolved: pytorch#33550
Differential Revision: D20024393
Pulled By: yf225
fbshipit-source-id: ebcf0fee14ff39f65f6744278fb0cbf1fc92b919