Running demo code on V100 #8

Hi,
Congrats on the great work!
I only have V100 GPUs available to me.
Is there a way to run your inference/demo code (e.g., without flash attention), and if so, how?
Many thanks in advance!

Comments
@zhang-tao-whu Check the issues. It seems you need to modify the code to remove flash attention.
@lxtGH @zhang-tao-whu Thanks a lot. Do you have any hints on how to remove flash attention? Can it be done by passing a parameter? I've been looking around and trying things out, but it still does not work.
Maybe you can try setting use_flash_attn=False when loading the model.
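A minimal sketch of what this could look like, assuming a standard transformers loading path (the model path and other kwargs are placeholders; only use_flash_attn=False is confirmed in this thread):

```python
# Sketch only: checkpoint path and extra kwargs are placeholders, not
# confirmed by this thread. Only use_flash_attn=False comes from the
# discussion above.
import torch
from transformers import AutoModel, AutoTokenizer

path = "path/to/released/checkpoint"  # placeholder

model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float16,  # V100 (Volta) has no bfloat16 support
    use_flash_attn=False,       # ask the remote modeling code to skip flash-attn
    trust_remote_code=True,
).eval().cuda()

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```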
Please let me know whether it works well on V100.
Hi, I have also tried setting use_flash_attn=False, but it still does not work on V100: there is still an error message saying the modeling file requires flash_attn.
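For reference, a hedged workaround sketch (an assumption, not something the maintainers confirm in this thread): if the downloaded modeling file imports flash_attn unconditionally, registering a stub module before calling from_pretrained can get past that import; with use_flash_attn=False the stubbed functions should never actually be called.

```python
# Hedged workaround sketch, not from this repo: register a dummy
# flash_attn module so an unconditional `import flash_attn` (or
# `from flash_attn import flash_attn_func`) in the downloaded modeling
# file does not raise ImportError on GPUs where flash-attn cannot be
# installed (e.g. V100). Run this BEFORE from_pretrained().
import sys
import types

stub = types.ModuleType("flash_attn")
# These attribute names are guesses at what the modeling file might
# import; with use_flash_attn=False they should never be called.
stub.flash_attn_func = None
stub.flash_attn_varlen_func = None
sys.modules["flash_attn"] = stub
```

Which attributes need stubbing depends on what the modeling file actually imports.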
I see. Let me check it.
Yes, I did the same as per @HarborYuan's comment and got the same error as @Ruining0916 described above.