
sft training #17
Open
wangxichi opened this issue Dec 26, 2024 · 7 comments

Comments

@wangxichi

I fine-tuned a llama model with SFT, but the results are very poor. Is this caused by the loss?

@zhangdan0602 (Collaborator)

Could you share your detailed experimental setup?

@wangxichi (Author)

I used the ReST-MCTS-Llama3-8b-Instruct-Policy-1st dataset and fine-tuned with trl's SFT trainer. I set the learning rate to 1e-4 and 1e-5; the more training iterations, the worse the model's outputs become.

@zhangdan0602 (Collaborator)

The learning rate should be smaller: 2e-5. Use 2 epochs and a warmup_ratio of 0.03.
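For reference, a minimal sketch of how these settings might be applied with trl's SFTTrainer; the dataset path, field layout, and model id below are placeholders, not details confirmed in this thread:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder path; trl expects a text- or messages-style column in the dataset.
train_dataset = load_dataset("json", data_files="policy_sft_data.json", split="train")

args = SFTConfig(
    output_dir="llama3-8b-rest-mcts-sft",
    learning_rate=2e-5,    # smaller learning rate, as suggested above
    num_train_epochs=2,    # 2 epochs
    warmup_ratio=0.03,     # warmup_ratio 0.03
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    train_dataset=train_dataset,
    args=args,
)
trainer.train()
```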

@wangxichi (Author)

Is there anything else I should pay attention to? Is trl's SFT trainer acceptable? What should I watch out for when processing the dataset?
About the loss function: my understanding of the paper is that it takes the relationships between reasoning steps into account,
whereas standard SFT only models correlations within the context. Does the loss function need to be modified?

@wangxichi (Author)

I even tried 1e-7, but the model's outputs are full of irrelevant steps.
Do I need to add LoRA?
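The thread does not confirm whether LoRA was used. If it were added, a minimal sketch would pass a peft LoraConfig to trl's SFTTrainer via peft_config; the rank and target modules below are assumptions for illustration only:

```python
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Assumed LoRA settings, not from this thread.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    train_dataset=train_dataset,                  # same dataset as in the earlier sketch
    args=SFTConfig(output_dir="llama3-8b-sft-lora",
                   learning_rate=2e-5, num_train_epochs=2, warmup_ratio=0.03),
    peft_config=peft_config,
)
```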

@1FirstWave1
Is the fine-tuning done step by step (using the first i-1 steps as input to fine-tune on the i-th step)? The repository itself does not provide SFT code (I could not find it), only DPO code, so I suspect this is where the ambiguity comes from. A sketch of that step-wise construction follows.
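As an illustration of the step-by-step construction described in this question (not the authors' confirmed pipeline), with hypothetical field names and prompt format:

```python
def build_stepwise_examples(question, steps):
    """For a solution with steps s_1..s_n, build one example per step:
    steps 1..i-1 serve as context and step i is the target completion."""
    examples = []
    for i in range(len(steps)):
        prompt = question + "\n" + "\n".join(steps[:i])  # first i-1 steps as input
        examples.append({"prompt": prompt, "completion": steps[i]})  # step i as target
    return examples
```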

@zhangdan0602 (Collaborator)

Please refer to: https://github.com/THUDM/ReST-MCTS#self-training

For SFT of SciGLM, we use: https://github.com/THUDM/SciGLM

For SFT of Llama and Mistral, we use: https://github.com/TIGER-AI-Lab/MAmmoTH
