Hi author,
Could you share your training config for the Qwen3-Base series? We found that in the DAPO setting without a KL penalty, the Qwen3-Base model easily collapses after 200 training steps. Did you run into similar training instability when doing off-policy RL? Thank you!