Adding MPO and DMPO #392

Open: wants to merge 27 commits into base: master
Commits
a645493
Adding MPO with TD(n) critic target and MPO with TD(n) critic target …
Jogima-cyber May 23, 2023
2bd6b69
Solving pre-commit issues for mpo_tdn_continuous_action.py
Jogima-cyber May 23, 2023
ec7acea
Solving pre-commit issues for mpo_tdn_continuous_action.py
Jogima-cyber May 23, 2023
5ebfa67
Solving pre-commit issues for mpo_tdn_continuous_action.py and dmpo_c…
Jogima-cyber May 23, 2023
768b1e3
Solving pre-commit issues for mpo_tdn_continuous_action.py and dmpo_c…
Jogima-cyber May 23, 2023
3eb94eb
Solving pre-commit issues for mpo_tdn_continuous_action.py and dmpo_c…
Jogima-cyber May 23, 2023
0e9a1a3
Adding a missed implementation detail to mpo_tdn_continuous_action.py…
Jogima-cyber May 24, 2023
17648ad
Solving some pre-commit issues
Jogima-cyber May 24, 2023
de57632
Solving minor bug.
Jogima-cyber May 24, 2023
5c6cffe
Minor bugs.
Jogima-cyber May 24, 2023
8464ac4
Minor bugs.
Jogima-cyber May 24, 2023
19a8c5b
Other implementation detail missed: 1. critic loss is scaled with 0.5…
Jogima-cyber May 24, 2023
42d6937
Bug and pre-commit issue
Jogima-cyber May 24, 2023
8a000f4
Adding several implementation detail.
Jogima-cyber May 26, 2023
f5b493d
Solving pre-commit issues
Jogima-cyber May 26, 2023
17f60a8
Solving pre-commit issues
Jogima-cyber May 26, 2023
41a0ec0
Solving new differences and repairing DMPO.
Jogima-cyber May 27, 2023
1ff0211
Solving pre-commit issues.
Jogima-cyber May 27, 2023
463e833
Solving pre-commit issues.
Jogima-cyber May 27, 2023
cc6a340
Small changes.
Jogima-cyber May 29, 2023
235b9c4
Pre-commit issues
Jogima-cyber May 29, 2023
ae1bc34
Adding environment interaction network.
Jogima-cyber May 29, 2023
b1c9fb3
Env actor update should be with target actor, not online actor
Jogima-cyber May 29, 2023
f4483ac
Downgrading gym so that it matches with the gym version used in cleanRL
Jogima-cyber Jun 19, 2023
b5529a4
Solving bugs related to latest commit
Jogima-cyber Jun 19, 2023
233fabe
Solving bugs related to latest commit
Jogima-cyber Jun 19, 2023
6dbc81f
Solving bugs related to latest commit
Jogima-cyber Jun 19, 2023
Solving bugs related to latest commit
Jogima-cyber committed Jun 19, 2023
commit b5529a4f7e82342b2e8cde7a6cb916a60b40508c
4 changes: 2 additions & 2 deletions cleanrl/dmpo_continuous_action.py
@@ -365,7 +365,7 @@ def _get_samples(self, batch_inds: np.ndarray, env: Optional[VecNormalize] = Non
critic_optimizer = torch.optim.Adam(qf.parameters(), lr=args.policy_q_lr)
dual_optimizer = torch.optim.Adam([log_eta, log_alpha_mean, log_alpha_stddev, log_penalty_temperature], lr=args.dual_lr)

- obs, _ = envs.reset(seed=args.seed)
+ obs = envs.reset(seed=args.seed)

n_step_obs_rolling_buffer = np.zeros((args.n_step,) + envs.single_observation_space.shape)
n_step_action_rolling_buffer = np.zeros((args.n_step,) + envs.single_action_space.shape)
@@ -690,7 +690,7 @@ def _get_samples(self, batch_inds: np.ndarray, env: Optional[VecNormalize] = Non

eval_every = 5000
eval_nb = 10
- eval_obs, _ = eval_envs.reset(seed=args.seed)
+ eval_obs = eval_envs.reset(seed=args.seed)
eval_episodic_return = np.zeros((eval_nb,))
eval_episodic_length = np.zeros((eval_nb,))
if (global_step + 1) % eval_every == 0:
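
Both hunks change how the return value of `reset` is unpacked, following the gym downgrade in f4483ac: the older API returns only the observation, while the newer one returns an `(obs, info)` pair. As a minimal sketch, and not part of this PR, a small helper could accept either shape. The name `reset_obs` and the two stand-in classes are hypothetical and exist only for illustration. The tuple check is a heuristic that would misfire if an observation were itself a two-element tuple ending in a dict.

```python
def reset_obs(envs, seed=None):
    """Return only the observation from envs.reset(), for either return shape.

    Hypothetical helper: assumes the newer API returns (obs, info) with info
    being a dict, and the older API returns obs alone.
    """
    result = envs.reset(seed=seed)
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]  # newer style: drop the info dict
    return result  # older style: already just the observation


class OldStyleEnvs:
    """Stand-in for an environment whose reset() returns obs only."""

    def reset(self, seed=None):
        return [[0.0, 0.0, 0.0]]


class NewStyleEnvs:
    """Stand-in for an environment whose reset() returns (obs, info)."""

    def reset(self, seed=None):
        return [[0.0, 0.0, 0.0]], {}


old_obs = reset_obs(OldStyleEnvs(), seed=1)
new_obs = reset_obs(NewStyleEnvs(), seed=1)
```

With such a helper, the two call sites in this diff would read `obs = reset_obs(envs, seed=args.seed)` and `eval_obs = reset_obs(eval_envs, seed=args.seed)` regardless of the installed version. Pinning one gym version, as this PR does, is the simpler choice when the whole repository targets a single API.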