Open
Description
- Move and rename the 6.25e-5 experiment, currently named and stored in the dir "replace-me".
- On both LUMI and PPI (moved to learning_rate/6.25e-5)
- Re-initialize anemoi-env
- Restart the training run in the new dir, using the checkpoint from the previous run.
- Check whether output from the other lr experiments ended up in the wrong dirs.
- Print config info for all lr experiments.
- 6.25e-2 exp. is in the e-4 exp. (?) EDIT: run_id 3109ee0eafad4c31bc095e2757315083 is now placed correctly
- Start 50k runs for e-4 and e-5 experiments
- Run inference on the most promising runs (so far)
- Restarted run with lr e-4
- lr e-5 (in new dir, see above!)
- lr e-3 run with 20k
- 512 ch
- Number of channels:
- Try lr e-5 for 512 and 1024 ch (related: Training: nr channels experiments #195)
- Understand why 1024 won't run without errors. (Why do we get the PYTORCH_HIP_ALLOC_CONF warning?)
- Start no-zeta experiments with best lr and ch given above results (see Training: redo no-zeta experiment #179)
- Start graph experiments
- Boundary size experiments
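
For the "check if output ended up in wrong dirs" item, a minimal sketch of how the comparison could be automated. It assumes each run lives under a directory named after its full learning rate (as in learning_rate/6.25e-5/<run_id>) and that the configured lr of each run has already been read from its config by some other means. The function names, the `configured` mapping and the run names `run_a`/`run_b` are hypothetical, not part of anemoi.

```python
import math
from pathlib import Path


def expected_lr(run_dir: Path) -> float:
    """Parse the lr from the parent dir name, e.g. learning_rate/6.25e-5/<run_id>.

    Assumption: the parent dir name is a full float literal such as "6.25e-5".
    """
    return float(run_dir.parent.name)


def find_misplaced(configured: dict[Path, float]) -> list[tuple[Path, float, float]]:
    """Return (run_dir, lr implied by dir name, lr from config) for mismatching runs."""
    misplaced = []
    for run_dir, lr in configured.items():
        want = expected_lr(run_dir)
        if not math.isclose(lr, want, rel_tol=1e-9):
            misplaced.append((run_dir, want, lr))
    return misplaced


# Hypothetical input: run dir -> lr found in that run's config.
configured = {
    Path("learning_rate/6.25e-4/run_a"): 6.25e-4,
    Path("learning_rate/6.25e-4/run_b"): 6.25e-2,  # sits in the wrong dir
}

misplaced = find_misplaced(configured)
for run_dir, want, got in misplaced:
    print(f"{run_dir}: dir implies lr={want:g}, config says lr={got:g}")
```

How the lr is read from each run (config file, checkpoint metadata, or the tracking server) is left open here, since that depends on the setup on LUMI and PPI.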