
Commit e77abb9

nicolov authored and facebook-github-bot committed
Normalize reward-to-go in C++ actor-critic (pytorch#33550)
Summary: Compared to the [Python implementation](https://github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py), it seems that the tensor of normalized reward-to-go is computed but never used. Even though it's just an integration test, this PR switches to the normalized version for better convergence.

Pull Request resolved: pytorch#33550
Differential Revision: D20024393
Pulled By: yf225
fbshipit-source-id: ebcf0fee14ff39f65f6744278fb0cbf1fc92b919
1 parent ee28831 commit e77abb9

File tree

1 file changed: +3 −3 lines


test/cpp/api/integration.cpp (+3 −3)
@@ -193,10 +193,10 @@ TEST_F(IntegrationTest, CartPole) {
   std::vector<torch::Tensor> policy_loss;
   std::vector<torch::Tensor> value_loss;
   for (auto i = 0U; i < saved_log_probs.size(); i++) {
-    auto r = rewards[i] - saved_values[i].item<float>();
-    policy_loss.push_back(-r * saved_log_probs[i]);
+    auto advantage = r_t[i] - saved_values[i].item<float>();
+    policy_loss.push_back(-advantage * saved_log_probs[i]);
     value_loss.push_back(
-        torch::smooth_l1_loss(saved_values[i], torch::ones(1) * rewards[i]));
+        torch::smooth_l1_loss(saved_values[i], torch::ones(1) * r_t[i]));
   }
 
   auto loss =
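
The change replaces the raw per-step reward with `r_t[i]`, the normalized reward-to-go, when forming both the advantage and the value-loss target. The test's own computation of `r_t` lies outside this hunk; the sketch below shows one way such a tensor could be built, mirroring the Python actor_critic.py example. The function name, the `gamma` default, and the epsilon are illustrative assumptions, not code taken from integration.cpp.

```cpp
// Sketch only: a plausible way to compute the normalized reward-to-go tensor
// that the diff refers to as `r_t` (names and constants are assumed, not
// copied from integration.cpp).
#include <torch/torch.h>

#include <vector>

torch::Tensor compute_normalized_returns(
    const std::vector<float>& rewards,
    float gamma = 0.99f) {
  std::vector<float> returns(rewards.size());
  float running = 0.0f;
  // Walk the episode backwards, accumulating the discounted reward-to-go.
  for (size_t i = rewards.size(); i-- > 0;) {
    running = rewards[i] + gamma * running;
    returns[i] = running;
  }
  auto r_t = torch::tensor(returns);
  // Normalize to zero mean and unit variance; the small epsilon guards
  // against division by zero when all returns are identical.
  return (r_t - r_t.mean()) / (r_t.std() + 1e-5);
}
```

Normalizing the returns keeps the policy-gradient and value targets on a consistent scale across episodes, which is the better-convergence effect the summary describes.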
