I was trying to use FlashAttention with `replace_with_xformers_attention()`, but with recent transformers versions I believe LLaMA can use FlashAttention directly by specifying `attn_implementation` when loading the pretrained model, so this line is no longer necessary.
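For reference, a minimal sketch of what I mean (assuming transformers >= 4.36 with the flash-attn package installed; the checkpoint name is just an example):

```python
# Minimal sketch: load LLaMA with FlashAttention 2 directly via from_pretrained,
# instead of monkey-patching attention with replace_with_xformers_attention().
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # example checkpoint, swap in your own

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,              # flash attention needs fp16/bf16
    attn_implementation="flash_attention_2",  # supported in recent transformers
)
```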
Hi @MXueguang,
I wonder what the purpose of having `replace_with_xformers_attention()` defined in utils.py is, because I am getting the following error:
Is the `self.num_key_value_heads` value used in `replace_with_xformers_attention()` defined somewhere else?