SFT training #17
Could you share the detailed experimental setup?
I fine-tuned with trl's SFT library on the ReST-MCTS-Llama3-8b-Instruct-Policy-1st dataset, trying learning rates of 1e-4 and 1e-5. The more training iterations, the worse the model's outputs became.
Use a smaller learning rate: 2e-5; 2 epochs; warmup_ratio of 0.03.
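As a rough illustration of how those settings might map onto trl's SFT trainer, here is a minimal sketch. The dataset file, model name, batch sizes, and column names are assumptions for illustration only, and trl argument names can differ across versions; this is not the exact script the authors used.

```python
# Minimal SFT sketch with trl, using the hyperparameters suggested above.
# Assumptions: a local JSON file with a "text" column; model and paths are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset file exported from ReST-MCTS-Llama3-8b-Instruct-Policy-1st.
dataset = load_dataset("json", data_files="rest_mcts_policy_1st.json", split="train")

config = SFTConfig(
    output_dir="llama3-8b-instruct-sft",   # placeholder output dir
    learning_rate=2e-5,                    # smaller LR, per the reply above
    num_train_epochs=2,                    # 2 epochs
    warmup_ratio=0.03,                     # warmup_ratio 0.03
    per_device_train_batch_size=4,         # assumed; adjust to your hardware
    gradient_accumulation_steps=8,         # assumed
    bf16=True,
    logging_steps=10,
    dataset_text_field="text",             # assumed column name
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    args=config,
    train_dataset=dataset,
)
trainer.train()
```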
Is there anything else to pay attention to? Is trl's SFT trainer acceptable? Is there anything to watch out for when processing the dataset?
I even tried 1e-7, but the model's outputs were full of irrelevant steps.
Is the fine-tuning done step by step (using the first i-1 steps as input to fine-tune the i-th step)? A sketch of what I mean is below. The codebase itself does not provide SFT code (I could not find it), only DPO code, so I suspect this is where the ambiguity comes from.
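For reference, a minimal sketch of the step-wise construction this question describes; it is purely hypothetical and may not match how the repo actually builds its SFT data. Each example uses the question plus the first i-1 steps as the prompt and the i-th step as the target.

```python
# Hypothetical step-wise example construction:
# for a solution with steps s_1..s_n, example i conditions on the question
# and steps s_1..s_{i-1}, and the target is step s_i.
def build_stepwise_examples(question, steps):
    examples = []
    for i, step in enumerate(steps):
        prompt = question + "\n" + "\n".join(steps[:i])
        examples.append({"prompt": prompt.strip(), "completion": step})
    return examples

# Example usage with a toy problem.
examples = build_stepwise_examples(
    "Solve 2x + 3 = 7.",
    [
        "Subtract 3 from both sides: 2x = 4.",
        "Divide both sides by 2: x = 2.",
        "The answer is x = 2.",
    ],
)
```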
Please refer to: https://github.com/THUDM/ReST-MCTS#self-training. For SFT of SciGLM we used: https://github.com/THUDM/SciGLM. For SFT of Llama and Mistral we used: https://github.com/TIGER-AI-Lab/MAmmoTH
When fine-tuning the Llama model with SFT, the results are very poor; could the loss be the cause?