
Everything is normal in the beginning until, after <think>, the answer suddenly becomes all "!" #63

Open
momo4826 opened this issue Feb 12, 2025 · 6 comments

@momo4826

[image]

Does anyone know why? My training settings are below:

base_model: qwen2.5-3B

GPU: 8*A800

python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$DATA_DIR/train.parquet \
    data.val_files=$DATA_DIR/test.parquet \
    data.train_batch_size=64 \
    data.val_batch_size=640 \
    data.max_prompt_length=256 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=$BASE_MODEL \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=128 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=256 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=$ROLLOUT_TP_SIZE \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=256 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['wandb'] \
    +trainer.val_before_train=False \
    trainer.default_hdfs_dir=null \
    trainer.n_gpus_per_node=$N_GPUS \
    trainer.nnodes=1 \
    trainer.save_freq=100 \
    trainer.test_freq=100 \
    trainer.project_name=TinyZero \
    trainer.experiment_name=$EXPERIMENT_NAME \
    trainer.total_epochs=10 2>&1 | tee verl_demo.log

@AstonyJ commented Feb 13, 2025

I encountered the same issue, and the training curve is shown in the image below. Also, are you able to train normally with 8 GPUs? I can only train with 2 GPUs, but it doesn’t work with 4 or 8 GPUs.

[image]

@momo4826 (Author)

> I encountered the same issue, and the training curve is shown in the image below. Also, are you able to train normally with 8 GPUs? I can only train with 2 GPUs, but it doesn’t work with 4 or 8 GPUs.
> [image]

Yes, I trained the model using 8*A800 GPUs.

I guess that the abnormal results may be due to the countdown task being challenging for the 3B model.

I found another project where the approach was to first train on an easier dataset and then on a more difficult one, adjusting the parameter settings in the process.

@AI-Santiago

>> I encountered the same issue, and the training curve is shown in the image below. Also, are you able to train normally with 8 GPUs? I can only train with 2 GPUs, but it doesn’t work with 4 or 8 GPUs.
>> [image]
>
> Yes, I trained the model using 8*A800 GPUs.
>
> I guess that the abnormal results may be due to the countdown task being challenging for the 3B model.
>
> I found another project where the approach was to first train on an easier dataset and then on a more difficult one, adjusting the parameter settings in the process.

Hi. Is the 'another project' on GitHub? If yes, could you please share its name?

@Molri19 commented Feb 14, 2025

Has your problem been solved? I've encountered the same problem.

[image]

@momo4826 (Author)

> Has your problem been solved? I've encountered the same problem.
> [image]

Exactly the same problem.

>>> I encountered the same issue, and the training curve is shown in the image below. Also, are you able to train normally with 8 GPUs? I can only train with 2 GPUs, but it doesn’t work with 4 or 8 GPUs.
>>> [image]
>>
>> Yes, I trained the model using 8*A800 GPUs.
>> I guess that the abnormal results may be due to the countdown task being challenging for the 3B model.
>> I found another project where the approach was to first train on an easier dataset and then on a more difficult one, adjusting the parameter settings in the process.
>
> Hi. Is the 'another project' on GitHub? If yes, could you please share its name?

https://github.com/Unakar/Logic-RL/tree/main

@My-laniaKeA
Copy link

My-laniaKeA commented Feb 17, 2025

When using the Qwen2.5-7B-Instruct model as the base model with the default dataset, I encountered the same issue.
Changing

python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}

to

python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset} --template_type qwen-instruct

resolves the issue for me.
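
One plausible reading of this fix: the preprocessed parquet must contain prompts in the format the checkpoint was trained on, and the --template_type flag selects that format. The sketch below is purely illustrative, not the actual TinyZero preprocessing code: the function name make_prefix, the exact prompt wording, and the system message are assumptions. It only shows the kind of difference at stake, a plain continuation-style prefix for a base model versus the Qwen chat template (<|im_start|>/<|im_end|> markers) for an *-Instruct model.

# Illustrative sketch only -- not the actual countdown.py code.
def make_prefix(question: str, template_type: str = "base") -> str:
    """Build a prompt prefix; names and wording here are hypothetical."""
    if template_type == "base":
        # Plain continuation prompt for a base (non-chat) model, e.g. Qwen2.5-3B.
        return (
            "A conversation between User and Assistant. "
            f"User: {question}\nAssistant: Let me solve this step by step.\n<think>"
        )
    if template_type == "qwen-instruct":
        # Chat-template prompt for the *-Instruct models, e.g. Qwen2.5-7B-Instruct.
        return (
            "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
            f"<|im_start|>user\n{question}<|im_end|>\n"
            "<|im_start|>assistant\nLet me solve this step by step.\n<think>"
        )
    raise ValueError(f"unknown template_type: {template_type}")

if __name__ == "__main__":
    q = "Using the numbers [2, 3, 7], create an equation that equals 17."
    print(make_prefix(q, "base"))
    print("---")
    print(make_prefix(q, "qwen-instruct"))

If an Instruct checkpoint is rolled out on prompts built in the base style (or vice versa), degenerate generations like the repeated "!" reported above are one possible symptom, which would explain why regenerating the dataset with --template_type qwen-instruct helps.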
