
keep getting error regarding missing positional argument 'attention_mask' #690

Open

BBC-Esq opened this issue Jan 14, 2025 · 8 comments

BBC-Esq commented Jan 14, 2025

I've tried various troubleshooting steps but can't seem to resolve the following error:

Traceback (most recent call last):
  File "D:\Scripts\bench_chat\convert_awq_gui2.py", line 105, in run
    model.quantize(
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\models\base.py", line 239, in quantize
    self.quantizer.quantize()
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\quantize\quantizer.py", line 179, in quantize
    scales_list = [
                  ^
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\quantize\quantizer.py", line 180, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\quantize\quantizer.py", line 340, in _search_best_scale
    fp16_output = self._module_forward(inp, module2inspect, module_kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\quantize\quantizer.py", line 269, in _module_forward
    partial_output = module(x_partial, **module_kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaAttention.forward() missing 1 required positional argument: 'attention_mask'
@DebarshiChanda

++
Facing the same issue with Qwen2 as well.

@casper-hansen (Owner)

Transformers 4.48.0 broke a lot of packages that rely on transformers. Downgrade to 4.47.1 to fix it.
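
For reference, a pinned install like the following should hold things at the working version until this is fixed (version numbers are the ones mentioned in this thread):

pip install "transformers==4.47.1"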


BBC-Esq commented Jan 14, 2025

Any idea what the issue is yet? 4.47.1 is working for me so far (quantization hasn't finished yet), but I'm assuming you eventually plan to update your source code to work with their new API?


BBC-Esq commented Jan 14, 2025

If it helps, I found a comment in some code in the unsloth repository that seems to pertain to this issue; it might be a good place to look at how they addressed it:

unslothai/unsloth@December-2024...2025-01#diff-a45b72bb533eda979990bd79cde5fe9c9fde424779a4f1fc1195b75853d93b45L20

I'm specifically referring to the code that reads as follows:

from unsloth_zoo.utils import Version
transformers_version = Version(transformers_version)
# Transformers moved rotary embeddings out of all attention layers
IS_ATTENTION_REFACTOR = transformers_version > Version("4.47.1")
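
For anyone who wants the same check without pulling in unsloth_zoo, a rough equivalent using the standard packaging library (which transformers already depends on) would be:

import transformers
from packaging.version import Version

# Transformers moved rotary embeddings out of the attention layers after 4.47.1,
# which is roughly where the AutoAWQ forward call starts failing.
IS_ATTENTION_REFACTOR = Version(transformers.__version__) > Version("4.47.1")
print(IS_ATTENTION_REFACTOR)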


BBC-Esq commented Jan 15, 2025

@casper-hansen I think this might be a good starting point and save you some time...

unslothai/unsloth#1491

@hebangwen

Hi, this error is caused by transformers.PreTrainedModel.prepare_inputs_for_generation. After this function runs, the required attention_mask positional argument is removed from the layer_kwargs dict. If you need to use transformers v4.48.0, try my git patch below:

diff --git a/awq/quantize/quantizer.py b/awq/quantize/quantizer.py
index 28280b0..c6f1b54 100644
--- a/awq/quantize/quantizer.py
+++ b/awq/quantize/quantizer.py
@@ -584,9 +584,11 @@ class AwqQuantizer:
         except ValueError:  # work with early exit
             pass
         modules[0] = modules[0].module  # restore
+        has_attention_mask = "attention_mask" in layer_kwargs
 
         # Update the layer kwargs with `prepare_inputs_for_generation` method
         # that takes care of everything to avoid unexpected errors.
+        # NOTE: After this function, `attention_mask` is removed from `layer_kwargs`, BUT `attention_mask` is caught by the Catcher.
         layer_kwargs = self.model.prepare_inputs_for_generation(samples, **layer_kwargs)
         # Pop the input_ids as they are not needed at all.
         layer_kwargs.pop("input_ids")
@@ -603,6 +605,8 @@ class AwqQuantizer:
             layer_kwargs["attention_mask"] = layer_kwargs["attention_mask"].to(
                 best_device
             )
+        elif has_attention_mask:
+            layer_kwargs["attention_mask"] = None
 
         return modules, layer_kwargs, inps

PS: This fix is not ideal, because attention_mask is not necessarily None in every case.
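
A variant of the same hunk that avoids forcing None (untested sketch, reusing the variables from the patch above and paraphrasing the surrounding if): stash whatever mask the Catcher collected before prepare_inputs_for_generation drops it, then restore it afterwards.

# Remember the mask the Catcher collected (may legitimately be None).
original_attention_mask = layer_kwargs.get("attention_mask")

layer_kwargs = self.model.prepare_inputs_for_generation(samples, **layer_kwargs)
layer_kwargs.pop("input_ids")

# ...later, when the kwargs are moved to the target device:
if layer_kwargs.get("attention_mask") is not None:
    layer_kwargs["attention_mask"] = layer_kwargs["attention_mask"].to(best_device)
elif original_attention_mask is not None:
    # Restore the original mask instead of defaulting to None.
    layer_kwargs["attention_mask"] = original_attention_mask.to(best_device)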


BBC-Esq commented Jan 21, 2025

Thanks dude, I tried looking for the core issue but got frustrated after a few hours. Hopefully @casper-hansen gets around to responding and/or updating his codebase.


ntoxeg commented Feb 13, 2025

I've downgraded Transformers to 4.47.1 but now I get the following error:

Traceback (most recent call last):
  File "/app/AutoAWQ/examples/cli.py", line 47, in <module>
    main()
  File "/app/AutoAWQ/examples/cli.py", line 38, in main
    model.quantize(tokenizer, quant_config=quant_config)
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/awq/models/base.py", line 241, in quantize
    self.quantizer.quantize()
  File "/usr/local/lib/python3.12/dist-packages/awq/quantize/quantizer.py", line 200, in quantize
    self._apply_quant(self.modules[i], named_linears)
  File "/usr/local/lib/python3.12/dist-packages/awq/quantize/quantizer.py", line 240, in _apply_quant
    q_linear = q_linear_module.from_linear(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/awq/modules/linear/gemm.py", line 184, in from_linear
    assert scales is not None and zeros is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

The model is open-thoughts/OpenThinker-32B, which is Qwen-based.
