While running `examples/training_examples/jax_rl_mimic/experiment.py`, I noticed a small bug in the WandB sweep metric logging:
```python
# metric for used for wandb sweep (optional)
site_rpos = validation_metrics.euclidean_distance.site_rpos[i]
site_rrotvec = validation_metrics.euclidean_distance.site_rpos[i]
site_rvel = validation_metrics.euclidean_distance.site_rpos[i]
run.log({"Metric for Sweep": site_rpos + site_rrotvec + site_rvel},
        step=int(training_metrics.max_timestep[i]))
```
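It looks like `site_rrotvec` and `site_rvel` both read from `site_rpos` instead of their own fields. As a result, the logged "Metric for Sweep" is effectively `3 * site_rpos[i]` and ignores the rotation and velocity errors entirely. This doesn’t affect training itself, but a sweep that optimizes this metric only sees the position error.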
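For reference, here is a minimal sketch of the change I'd expect: each term is read from its own field. It assumes that `validation_metrics.euclidean_distance` exposes `site_rrotvec` and `site_rvel` alongside `site_rpos` (the variable names suggest so, but I haven't checked the metric definitions). To keep the sketch runnable on its own, `EuclideanDistance`, `ValidationMetrics`, `FakeRun`, and `log_sweep_metric` are hypothetical stand-ins for the real metric objects and WandB run used in `experiment.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Tuple


# Hypothetical stand-ins for the structures used in experiment.py.
@dataclass
class EuclideanDistance:
    site_rpos: List[float]
    site_rrotvec: List[float]
    site_rvel: List[float]


@dataclass
class ValidationMetrics:
    euclidean_distance: EuclideanDistance


@dataclass
class FakeRun:
    """Records log calls instead of sending them to WandB."""
    logged: List[Tuple[Dict[str, Any], int]] = field(default_factory=list)

    def log(self, data: Dict[str, Any], step: int) -> None:
        self.logged.append((data, step))


def log_sweep_metric(run, validation_metrics, max_timestep: Sequence[int], i: int) -> float:
    """Log the sum of the three distances, each read from its own field."""
    dist = validation_metrics.euclidean_distance
    site_rpos = dist.site_rpos[i]
    site_rrotvec = dist.site_rrotvec[i]  # previously read dist.site_rpos[i]
    site_rvel = dist.site_rvel[i]        # previously read dist.site_rpos[i]
    metric = site_rpos + site_rrotvec + site_rvel
    run.log({"Metric for Sweep": metric}, step=int(max_timestep[i]))
    return metric


# Usage with made-up distances: the fixed metric is 0.5 + 0.25 + 1.0,
# whereas the current code would log 3 * 0.5.
run = FakeRun()
metrics = ValidationMetrics(
    EuclideanDistance(site_rpos=[0.5], site_rrotvec=[0.25], site_rvel=[1.0])
)
value = log_sweep_metric(run, metrics, max_timestep=[100], i=0)
print(value)  # 1.75 (the current code would give 1.5)
```

In the actual script this should amount to changing the two right-hand sides to `...euclidean_distance.site_rrotvec[i]` and `...euclidean_distance.site_rvel[i]`, assuming those fields exist.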
Would it be possible to fix this in a future update?
Thanks!