
Could you release the weights of PRM? #4

Open
cybisolated opened this issue Sep 30, 2024 · 7 comments
Labels
about dataset (datasets of PRM and policy model), about PRM

Comments

@cybisolated

Thanks for your contribution! Could you release the weights of the PRM? Or is there something I missed?

@zhangdan0602
Collaborator

We have illustrated how to train the PRM. Specifically, you can download $D_{V_0}$, put it in PRM/data, and train Mistral-7B as the initial process reward model to obtain VALUE_MODEL_STATE_DICT.
We also provide the training scripts PRM/train_VM_chatglm.py and PRM/train_VM_mistral.py.

@ImKeTT

ImKeTT commented Nov 14, 2024

Great work, but is it possible to just release the model weight?

@jingjingchengcai

jingjingchengcai commented Dec 14, 2024

I trained with PRM/train_VM_chatglm.py, following the instructions, for 2 epochs. The accuracy I got is 0.1614. Is this expected? How many epochs should we use?

@jingjingchengcai

I also trained with PRM/train_VM_mistral.py; the accuracy is 0.1530 after two epochs.
I found one epoch actually gives better accuracy, i.e., 0.2247.
Without training, the accuracy is about 0.1182.
Did anyone get similar results?

@jingjingchengcai

Thank you for your contributions! I’m currently stuck with training the VM.

Below are the statistics of the training data, showing each label and its corresponding number of samples:

Counter({'0.0': 240594, '1.0': 48953, '0.1': 20901, '0.8': 20614, '0.5': 18341, '0.2': 17034, '0.3': 16688, '0.7': 14310, '0.6': 13104, '0.4': 9448, '0.9': 6462})

If the RM predicted only the label 0.0 regardless of the input text, its accuracy would be 240,594 / 426,449 ≈ 0.56. Surprisingly, this is significantly higher than the accuracy achieved by fine-tuning the RM. I hope the authors can help me identify what I might be doing wrong.
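The majority-class baseline above can be checked directly from the reported label distribution. A minimal sketch (the counts are copied from the Counter output above; nothing else is assumed):

```python
from collections import Counter

# Label distribution reported above (value label -> number of samples)
label_counts = Counter({'0.0': 240594, '1.0': 48953, '0.1': 20901,
                        '0.8': 20614, '0.5': 18341, '0.2': 17034,
                        '0.3': 16688, '0.7': 14310, '0.6': 13104,
                        '0.4': 9448, '0.9': 6462})

total = sum(label_counts.values())                     # 426,449 samples
majority_label, majority_count = label_counts.most_common(1)[0]
baseline_acc = majority_count / total                  # always predict '0.0'

print(f"majority label: {majority_label}, baseline accuracy: {baseline_acc:.4f}")
# baseline accuracy ≈ 0.5642
```

A trained model scoring well below this constant-prediction baseline usually points to a label-mapping or evaluation-metric mismatch rather than a modeling problem.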

@zhangdan0602 added the about dataset and about PRM labels on Dec 25, 2024
@zhangdan0602
Collaborator

The experimental settings are as follows:

For ChatGLM3-6B, the learning rate is 2e-5, the number of epochs is 2 or 3, and the batch size is 3.

For Mistral-7B, the learning rate is 3e-6, the number of epochs is 2 or 3, and the batch size is 3.
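The settings above can be collected into a small lookup table. This is only an illustrative sketch: the dictionary and function names are hypothetical and are not taken from PRM/train_VM_chatglm.py or PRM/train_VM_mistral.py; the values are the ones reported in this comment.

```python
# Reported PRM training settings; names here are illustrative, not from the repo.
PRM_TRAIN_CONFIGS = {
    "chatglm3-6b": {"learning_rate": 2e-5, "num_epochs": 2, "batch_size": 3},  # 2 or 3 epochs
    "mistral-7b":  {"learning_rate": 3e-6, "num_epochs": 2, "batch_size": 3},  # 2 or 3 epochs
}

def get_config(model_name: str) -> dict:
    """Look up the reported training settings for a base model."""
    return PRM_TRAIN_CONFIGS[model_name.lower()]
```

For example, `get_config("Mistral-7B")` returns the Mistral settings with a learning rate of 3e-6.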

@debajoycs98

Can anyone let me know approximately how long it takes to run 2 epochs for Mistral on an A100? It shows around 35 hours for me!
