Everything is normal in the beginning, but after <think> the answer suddenly becomes all "!" #63
Comments
Yes, I trained the model using 8*A800 GPUs. I suspect the abnormal results come from the countdown task being too challenging for the 3B model. I found another project whose approach was to train first on an easier dataset and then on a more difficult one, adjusting the parameter settings along the way.
When using the Qwen2.5-7B-Instruct model as the base model with the default dataset, I hit the same issue. Changing the preprocessing command from python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset} to python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset} --template_type qwen-instruct resolved it for me.
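In other words, regenerate the countdown dataset with the chat-style template that an *-Instruct checkpoint expects, then retrain from the new parquet files. A minimal sketch of that step (the output directory below is a placeholder; point it wherever your training script reads its parquet files from):

```bash
# Rebuild the countdown dataset with the Qwen chat template
# (placeholder output path; substitute your own dataset directory).
python ./examples/data_preprocess/countdown.py \
    --local_dir ./data/countdown-qwen-instruct \
    --template_type qwen-instruct
```

The default base template presumably emits raw-text prompts, while qwen-instruct wraps them in the chat format the Instruct model was fine-tuned on; that mismatch is likely why generations degenerate into repeated "!".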
Does anyone know why? My training settings are below:
base_model: qwen2.5-3B
GPU: 8*A800
```bash
python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$DATA_DIR/train.parquet \
    data.val_files=$DATA_DIR/test.parquet \
    data.train_batch_size=64 \
    data.val_batch_size=640 \
    data.max_prompt_length=256 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=$BASE_MODEL \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=128 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=256 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=$ROLLOUT_TP_SIZE \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=256 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['wandb'] \
    +trainer.val_before_train=False \
    trainer.default_hdfs_dir=null \
    trainer.n_gpus_per_node=$N_GPUS \
    trainer.nnodes=1 \
    trainer.save_freq=100 \
    trainer.test_freq=100 \
    trainer.project_name=TinyZero \
    trainer.experiment_name=$EXPERIMENT_NAME \
    trainer.total_epochs=10 2>&1 | tee verl_demo.log
```
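For anyone trying to reproduce this: the command above assumes a few shell variables are already set. Example values for illustration only (ROLLOUT_TP_SIZE, DATA_DIR, and EXPERIMENT_NAME are assumptions, not taken from this issue):

```bash
# Example values only; adjust paths and parallelism to your own setup.
export N_GPUS=8                               # matches the 8*A800 setup above
export BASE_MODEL=Qwen/Qwen2.5-3B             # or a local checkpoint path
export DATA_DIR=./data/countdown              # directory containing train.parquet / test.parquet
export ROLLOUT_TP_SIZE=2                      # vLLM tensor-parallel size (assumed; should divide N_GPUS)
export EXPERIMENT_NAME=countdown-qwen2.5-3b-grpo
```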