
Eval bug: Qwen3 Q4_0 not working with SYCL #13163


Closed
invent00 opened this issue Apr 29, 2025 · 8 comments

Comments

@invent00

invent00 commented Apr 29, 2025

Name and Version

version: 5215(5f5e39e)
built with MSVC 19343.34808.0

Operating systems

Windows

GGML backends

SYCL

Hardware

Core Ultra 5 125U, 32 GB memory (ThinkPad X1 Carbon Gen 12)
Driver Version: 32.0.101.6739

Models

Qwen3-4B-gguf Q4_0 (https://huggingface.co/unsloth/Qwen3-4B-GGUF/tree/main)

Problem description & steps to reproduce

When attempting inference with the model, the screen briefly goes black and fails to function properly. However, the Q4_K_M model operates normally.

In addition, the CUDA build (cu11.7, b5215) works properly with Q4_0.

How to reproduce:

  1. Run llama-cli.exe -ngl 99 -m Qwen3-4B-Q4_0.gguf
  2. Enter a question
  3. The screen blacks out

In the event log, llama-cli.exe shows the following application error:
Image

First Bad Commit

No response

Relevant log output

1. llama-cli.exe -ngl 99 -m Qwen3-4B-Q4_0.gguf
2. Enter a question
3. The screen blacks out
@invent00 invent00 changed the title Eval bug: Eval bug: Qwen3 Q4_0 not working with SYCL Apr 29, 2025
@Sketchfellow

I've also had issues with Q4_0 quants on SYCL resulting in the screen going black and crashing for my Arc A770M. I experienced this on Gemma 3 12B QAT as well as Llama 2 7B when running performance benchmarks. I believe the SYCL Q4_0 reorder optimizations resulted in this as setting GGML_SYCL_DISABLE_OPT=1 allowed things to run normally again.

@qnixsynapse
Collaborator

qnixsynapse commented Apr 29, 2025

> I believe the SYCL Q4_0 reorder optimizations resulted in this as setting GGML_SYCL_DISABLE_OPT=1 allowed things to run normally again

cc @Rbiessy @NeoZhangJianyu @Alcpz ^

@invent00
Author

Hi @Sketchfellow,
Thank you for your advice. After setting GGML_SYCL_DISABLE_OPT=1, it works properly.
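
For anyone who wants to apply the same workaround from a script rather than by setting the variable in the shell, here is a minimal sketch. It only relies on what is mentioned in this thread: the `GGML_SYCL_DISABLE_OPT` environment variable and the `llama-cli.exe -ngl 99 -m <model>` command line. The helper names (`sycl_env`, `llama_cli_command`, `run_with_workaround`) are hypothetical and not part of llama.cpp.

```python
import os
import subprocess


def sycl_env(disable_opt, base=None):
    """Return a copy of the environment with GGML_SYCL_DISABLE_OPT set."""
    env = dict(os.environ if base is None else base)
    # "1" disables the SYCL optimization path discussed in this issue, "0" enables it.
    env["GGML_SYCL_DISABLE_OPT"] = "1" if disable_opt else "0"
    return env


def llama_cli_command(model, exe="llama-cli.exe", ngl=99):
    """Build the same command line used in the reproduction steps."""
    return [exe, "-ngl", str(ngl), "-m", model]


def run_with_workaround(model):
    """Launch llama-cli with the optimization disabled (hypothetical helper)."""
    return subprocess.run(llama_cli_command(model), env=sycl_env(True), check=False)
```

Calling `run_with_workaround("Qwen3-4B-Q4_0.gguf")` from the directory containing `llama-cli.exe` should be equivalent to setting the variable in the shell before starting the program. The variable is set only for the child process, so the rest of the session is unaffected.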

@Alcpz
Collaborator

Alcpz commented Apr 30, 2025

I've been able to reproduce the issue, but only on Windows. Linux seems unaffected. As reported, GGML_SYCL_DISABLE_OPT=1 works without problem. There seems to be something wrong with the reorder, but I would need to have a deeper look at it.
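
For readers unfamiliar with what "the reorder" refers to, here is a rough illustration of the idea. It is not the SYCL backend code and it says nothing about the cause of the Windows crash. It assumes that a Q4_0 block stores 32 weights as a 2-byte fp16 scale followed by 16 bytes of packed 4-bit values, and that the reorder rearranges a tensor from interleaved blocks into one contiguous region of quantized values followed by one contiguous region of scales. The function names are hypothetical.

```python
import struct

QK4_0 = 32                       # weights per Q4_0 block (assumed layout)
BLOCK_BYTES = 2 + QK4_0 // 2     # fp16 scale + 16 bytes of packed 4-bit quants


def reorder_q4_0(buf):
    """Rearrange interleaved [d, qs] blocks into [all qs..., all d...]."""
    if len(buf) % BLOCK_BYTES:
        raise ValueError("buffer is not a whole number of Q4_0 blocks")
    qs = bytearray()
    ds = bytearray()
    for off in range(0, len(buf), BLOCK_BYTES):
        block = buf[off:off + BLOCK_BYTES]
        ds += block[:2]   # fp16 scale
        qs += block[2:]   # 16 bytes, two 4-bit values per byte
    return bytes(qs + ds)


def dequant_block(d_bytes, qs_bytes):
    """Dequantize one block: low nibbles give weights 0..15, high nibbles 16..31."""
    (d,) = struct.unpack("<e", d_bytes)
    lo = [(b & 0x0F) - 8 for b in qs_bytes]
    hi = [(b >> 4) - 8 for b in qs_bytes]
    return [q * d for q in lo + hi]
```

The values are identical in both layouts; only the addressing changes. A kernel reading the reordered buffer must therefore compute the offsets of the quants and of the scales separately, which is the kind of logic a reorder-specific bug would live in.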

@NeoZhangJianyu
Collaborator

Let me check!

@sgeor255
Contributor

sgeor255 commented May 6, 2025

@invent00 #13109 should fix this issue. Could you check if it works for you? :) Note that you will need to set/export the environment variable GGML_SYCL_DISABLE_OPT=0 to trigger the reorder codepath which was causing the issue.

@invent00
Author

invent00 commented May 8, 2025

@sgeor255 Hi, I built d7e5179 and tried it. It works properly with GGML_SYCL_DISABLE_OPT=0 and Qwen3-4B-Q4_0.gguf.

I confirmed GGML_SYCL_DISABLE_OPT=0 is faster than GGML_SYCL_DISABLE_OPT=1.

Once this is merged into main, I will close this issue.
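
A speed comparison like the one above can be reproduced with a small timing harness. The sketch below is one possible approach, not the method used in this thread: it runs a caller-supplied command with `GGML_SYCL_DISABLE_OPT` set to `0` and to `1` and reports the best wall-clock time for each. The function names are hypothetical, and the command to run is left to the caller.

```python
import os
import subprocess
import time


def time_command(cmd, disable_opt):
    """Run cmd once with GGML_SYCL_DISABLE_OPT set and return elapsed seconds."""
    env = dict(os.environ)
    env["GGML_SYCL_DISABLE_OPT"] = "1" if disable_opt else "0"
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(cmd, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare(cmd, repeats=3):
    """Return the best-of-N wall time per setting, keyed by the variable's value."""
    return {
        flag: min(time_command(cmd, disable_opt=(flag == "1")) for _ in range(repeats))
        for flag in ("0", "1")
    }
```

The command must be non-interactive and exit on its own, for example a benchmark tool such as `llama-bench` invoked with the model path (an assumption; adjust to your build). Wall-clock time includes model loading, so the tokens-per-second figures printed by the tool itself are a better basis for comparison when they are available.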

@invent00
Author

Hi,

I confirmed that it works properly on version 5402 (0a338ed).
It also works properly with GGML_SYCL_DISABLE_OPT=0.

Let me close this issue. Thank you for your support.
