
keep getting error regarding missing positional argument 'attention_mask' #690

Open

BBC-Esq opened this issue Jan 14, 2025 · 8 comments

BBC-Esq commented Jan 14, 2025

I've tried various troubleshooting steps but can't seem to resolve the following error:

Traceback (most recent call last):
  File "D:\Scripts\bench_chat\convert_awq_gui2.py", line 105, in run
    model.quantize(
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\models\base.py", line 239, in quantize
    self.quantizer.quantize()
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\quantize\quantizer.py", line 179, in quantize
    scales_list = [
                  ^
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\quantize\quantizer.py", line 180, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\quantize\quantizer.py", line 340, in _search_best_scale
    fp16_output = self._module_forward(inp, module2inspect, module_kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\awq\quantize\quantizer.py", line 269, in _module_forward
    partial_output = module(x_partial, **module_kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Scripts\bench_chat\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaAttention.forward() missing 1 required positional argument: 'attention_mask'
@DebarshiChanda

++
Facing the same issue with Qwen2 as well.

@casper-hansen (Owner)

Transformers 4.48.0 broke a lot of packages that rely on transformers. Downgrade to 4.47.1 to fix it.
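
For reference, a pinned install like the following should hold things at the working version until this is fixed (version numbers are the ones mentioned in this thread):

pip install "transformers==4.47.1"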


BBC-Esq commented Jan 14, 2025

Any idea what the issue is yet? 4.47.1 is working for me so far (quantization hasn't finished yet), but I'm assuming you eventually plan to update your source code to work with their new API?


BBC-Esq commented Jan 14, 2025

If it helps, I found a comment in some code in the unsloth repository that seems to pertain to this issue; it might be a good place to look at how they addressed it:

unslothai/unsloth@December-2024...2025-01#diff-a45b72bb533eda979990bd79cde5fe9c9fde424779a4f1fc1195b75853d93b45L20

I'm specifically referring to the code that reads as follows:

from unsloth_zoo.utils import Version
transformers_version = Version(transformers_version)
# Transformers moved rotary embeddings out of all attention layers
IS_ATTENTION_REFACTOR = transformers_version > Version("4.47.1")
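
For anyone who wants the same check without pulling in unsloth_zoo, a rough equivalent using the standard packaging library (which transformers already depends on) would be:

import transformers
from packaging.version import Version

# Transformers moved rotary embeddings out of the attention layers after 4.47.1,
# which is roughly where the AutoAWQ forward call starts failing.
IS_ATTENTION_REFACTOR = Version(transformers.__version__) > Version("4.47.1")
print(IS_ATTENTION_REFACTOR)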


BBC-Esq commented Jan 15, 2025

@casper-hansen I think this might be a good starting point and save you some time...

unslothai/unsloth#1491

@hebangwen

Hi, this error is caused by transformers.PreTrainedModel.prepare_inputs_for_generation. After this function runs, the required attention_mask positional argument is removed from the layer_kwargs dict. If you need to use transformers v4.48.0, try my git patch below:

diff --git a/awq/quantize/quantizer.py b/awq/quantize/quantizer.py
index 28280b0..c6f1b54 100644
--- a/awq/quantize/quantizer.py
+++ b/awq/quantize/quantizer.py
@@ -584,9 +584,11 @@ class AwqQuantizer:
         except ValueError:  # work with early exit
             pass
         modules[0] = modules[0].module  # restore
+        has_attention_mask = "attention_mask" in layer_kwargs
 
         # Update the layer kwargs with `prepare_inputs_for_generation` method
         # that takes care of everything to avoid unexpected errors.
+        # NOTE: After this function, `attention_mask` is removed from `layer_kwargs`, BUT `attention_mask` is caught by the Catcher.
         layer_kwargs = self.model.prepare_inputs_for_generation(samples, **layer_kwargs)
         # Pop the input_ids as they are not needed at all.
         layer_kwargs.pop("input_ids")
@@ -603,6 +605,8 @@ class AwqQuantizer:
             layer_kwargs["attention_mask"] = layer_kwargs["attention_mask"].to(
                 best_device
             )
+        elif has_attention_mask:
+            layer_kwargs["attention_mask"] = None
 
         return modules, layer_kwargs, inps

PS: This fix is not ideal, because attention_mask is not necessarily None in every case.
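
A variant of the same hunk that avoids forcing None (untested sketch, reusing the variables from the patch above and paraphrasing the surrounding if): stash whatever mask the Catcher collected before prepare_inputs_for_generation drops it, then restore it afterwards.

# Remember the mask the Catcher collected (may legitimately be None).
original_attention_mask = layer_kwargs.get("attention_mask")

layer_kwargs = self.model.prepare_inputs_for_generation(samples, **layer_kwargs)
layer_kwargs.pop("input_ids")

# ...later, when the kwargs are moved to the target device:
if layer_kwargs.get("attention_mask") is not None:
    layer_kwargs["attention_mask"] = layer_kwargs["attention_mask"].to(best_device)
elif original_attention_mask is not None:
    # Restore the original mask instead of defaulting to None.
    layer_kwargs["attention_mask"] = original_attention_mask.to(best_device)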


BBC-Esq commented Jan 21, 2025

Thanks dude, I tried looking for the core issue but got frustrated after a few hours. Hopefully @casper-hansen gets around to responding and/or updating his codebase.


ntoxeg commented Feb 13, 2025

I've downgraded Transformers to 4.47.1 but now I get the following error:

Traceback (most recent call last):
  File "/app/AutoAWQ/examples/cli.py", line 47, in <module>
    main()
  File "/app/AutoAWQ/examples/cli.py", line 38, in main
    model.quantize(tokenizer, quant_config=quant_config)
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/awq/models/base.py", line 241, in quantize
    self.quantizer.quantize()
  File "/usr/local/lib/python3.12/dist-packages/awq/quantize/quantizer.py", line 200, in quantize
    self._apply_quant(self.modules[i], named_linears)
  File "/usr/local/lib/python3.12/dist-packages/awq/quantize/quantizer.py", line 240, in _apply_quant
    q_linear = q_linear_module.from_linear(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/awq/modules/linear/gemm.py", line 184, in from_linear
    assert scales is not None and zeros is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

The model is open-thoughts/OpenThinker-32B, which is Qwen-based.
