Unable to reproduce QAT results from Blog #2310
Comments
Hi,
Hi @AbhinavDutta, thanks for opening the issue. Sorry, somehow this one slipped through the cracks. Let me take a look now.
@AbhinavDutta I haven't forgotten about this; still debugging. I confirmed that the loss curves for Llama3 8B QAT have not changed since the recipe first landed. I'm also going to roll back torchao versions and run with Llama2 to figure out if/when there was a regression. Will keep you posted.
@ebsmothers much appreciated!
@ebsmothers I faced the same issue using torchtune version 0.2.1, torch version 2.4.1, and torchao version 0.3.1. Looking at the dates, this combination seems closest to when the blog was published, but I see the same problem there as well. However, when using the alpaca dataset, I observed that the loss curve does look normal (but that could be because Llama 3 8B was not instruction tuned to begin with? UPDATE: it's not; I observe the same loss curve even when using Llama 3 8B-Instruct). I only tried alpaca because the QAT config files use that dataset by default. Looking at the blog details, though, it seems c4 was used there for QAT, and I'm guessing something like the following would have been the appropriate dataset override in that case:
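Something roughly along these lines, I assume (a sketch only: the tokenizer path is a placeholder, and the builder arguments may differ slightly across torchtune versions):

```python
# Sketch of a C4 text-completion dataset override; not the exact config from
# the blog. The tokenizer path is a placeholder.
from torchtune.datasets import text_completion_dataset
from torchtune.models.llama3 import llama3_tokenizer

tokenizer = llama3_tokenizer("/path/to/original/tokenizer.model")

dataset = text_completion_dataset(
    tokenizer=tokenizer,
    source="allenai/c4",  # HuggingFace dataset id
    column="text",        # raw text column in C4
    name="en",            # forwarded to load_dataset as the dataset config name
    split="train",
)
```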
So, can you confirm whether you ever observed decent loss curves when training with torchtune.datasets.text_completion_dataset? If so, can you tell me the steps to reproduce?
@AbhinavDutta we are still looking into this. I have run QAT on Llama2 and Llama3 all the way back to when the QAT PR first landed, and I see the same loss curves (i.e. there has not been a regression since the original PR landed). I've confirmed with @andrewor14 that the original results are based on the instruct model, not the base model, and your config looks correct to me. Regarding the loss curves, I will let Andrew weigh in, as he conducted most of the experiments and so will have the most useful information here.
@ebsmothers just to make sure I understand what's going on, could you clarify the following about the loss curves you're referring to (the ones that did not regress): (i) do they decrease and then flatten as usual? (ii) was the base model used, or instruct? (iii) was c4 used, or alpaca?
Hi @AbhinavDutta, by the way, have you tried just running the exact same workload without QAT? A few months ago I found that finetuning Llama3-8B on C4 (even without QAT) doesn't seem to converge anymore: #1526. To answer your questions:
@andrewor14 thanks for letting me know! I'll see if the instruct version works out. I'm guessing the training won't go anywhere without QAT either: the config I used did not apply any quantization for the first 1000 steps, and nothing good happened in that regime.
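For reference, my mental model of the delayed fake-quant setup is roughly the sketch below, assuming the torchao ~0.3.x prototype QAT API (import paths have moved between torchao releases, and groupsize=256 is just what I believe the recipe defaults to), so treat it as an illustration rather than the recipe's exact code:

```python
# Sketch of QAT with delayed fake quantization: train in full precision for
# the first N steps, then enable fake-quantized forward passes.
from torchao.quantization.prototype.qat import (
    Int8DynActInt4WeightQATQuantizer,
    disable_8da4w_fake_quant,
    enable_8da4w_fake_quant,
)


def train_with_delayed_fake_quant(model, dataloader, optimizer, loss_fn,
                                  fake_quant_after_n_steps=1000):
    """Run QAT, enabling fake quantization only after the first N steps."""
    quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=256)
    model = quantizer.prepare(model)       # swap linears for fake-quantized ones
    model.apply(disable_8da4w_fake_quant)  # start out in full precision

    for step, (tokens, labels) in enumerate(dataloader):
        if step == fake_quant_after_n_steps:
            model.apply(enable_8da4w_fake_quant)  # fake quant kicks in here
        logits = model(tokens)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    return model
```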
I've been trying to reproduce the Llama3-8b QAT numbers from the blog but have been unable to do so. The training curve looks pretty bad (indicating that no training is happening at all) and the evaluations are off as well (wandb logs). Can you let me know if I'm missing something in the configs?
These are the evals I get: