sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs #12858
base: master
Conversation
Signed-off-by: Alberto Cabrera <[email protected]>
Force-pushed from 44e199d to 9c8d809
// Dispatch becomes obscure with the reorder, MMVQ when the reorder optimization
// is enabled takes precedence over DMMV, the current if-else implementation
// requires disabling DMMV if both conditions are met
|| (reorder && ggml_sycl_supports_reorder_mmvq(src0->type))) {
This PR targets Intel GPUs, so why change the code used for CUDA?
In fact, the src0 reorder never happens on non-Intel GPUs, so this code has no effect there.
I suggest removing it.
Your comment aligns with my suspicion that this change is obscure. This line changes the kernels from DMMV to MMVQ if reorder is enabled and it's supported, so it's no longer only for CUDA devices.
I need to rethink how the dispatcher does the work.
Yes, please ignore this comment.
Another comment:
The reorder behavior affects the code path in this PR: use_dequantize_mul_mat_vec = use_dequantize_mul_mat_vec && !use_mul_mat_vec_q;
This code works well for CUDA but not for Intel GPUs; that's why it was limited to the CUDA backend.
On Intel GPUs, some cases (models) will benefit from it and some will regress.
I suggest removing this behavior: only optimize the OPs via the reorder, and do not change the code path.
From what we have measured, the new MMVQ code path with the reorder optimization is faster on Intel devices as well (cf. the PR description); that's why we suggest enabling it by default now. Could you let us know if you find a model or device where this causes a performance regression?
The code-path change is bundled with the optimization code in this PR, so any side effect of the code-path change is masked by the optimization.
The test results of this PR therefore can't prove that the code-path change has no side effect.
Let me check it.
I used ./examples/sycl/run-llama2.sh to test it on Arc.
I disabled the reorder and only changed the code path to match the CUDA path.
It impacts both performance and output at the same time:
the eval rate drops from 28 to 16 tokens per second, and the output text changes too.
base:
commit 307bfa2 (HEAD -> master, tag: b5126, origin/master, origin/HEAD)
Author: Alan Gray [email protected]
Date: Sun Apr 13 22:12:21 2025 +0100
llama_perf_context_print: prompt eval time = 288.98 ms / 19 tokens ( 15.21 ms per token, 65.75 tokens per second)
llama_perf_context_print: eval time = 13903.83 ms / 399 runs ( 34.85 ms per token, 28.70 tokens per second)
Step 1: Get to know the basics of web design
Step 2: Set up a web hosting account
Step 3: Download a free website builder
Step 4: Set up a domain name
Step 5: Design your website
Step 6: Add content to your site
Step 7: Make the site responsive
Step 8: Add a contact form
Step 9: Add a social media share button
Step 10: Advertise your website
I changed the code path with:
if (ctx.stream()->get_backend() == sycl::backend::ext_oneapi_cuda || true)
llama_perf_context_print: prompt eval time = 210.78 ms / 19 tokens ( 11.09 ms per token, 90.14 tokens per second)
llama_perf_context_print: eval time = 24153.18 ms / 399 runs ( 60.53 ms per token, 16.52 tokens per second)
Step 1: Select the Website Name
Step 2: Select the Website Type
Step 3: Choose a Website Theme
Step 4: Select a Website Builder
Step 5: Register a Domain Name
Step 6: Buy Web Hosting
Step 7: Upload the Website
Step 8: Connect to Social Media
Step 9: Add a Call to Action
Step 10: Improve Search Engine Optimization (SEO)
I tested this PR's code with ./examples/sycl/run-llama2.sh on Arc.
Using this PR: https://github.com/Alcpz/llama.cpp/tree/Alcpz/mmvq_q4_0_reorder
commit 52b1622 (HEAD -> Alcpz/mmvq_q4_0_reorder, origin/Alcpz/mmvq_q4_0_reorder)
Author: Alberto Cabrera [email protected]
Date: Thu Apr 10 16:10:43 2025 +0100
The performance increases, but the result contains more errors:
llama_perf_context_print: prompt eval time = 318.59 ms / 19 tokens ( 16.77 ms per token, 59.64 tokens per second)
llama_perf_context_print: eval time = 13073.32 ms / 399 runs ( 32.77 ms per token, 30.52 tokens per second)
Step 1: Select the Website Name
Step 2: Select the Website Type
Step 3: Choose a Website Theme
Step 4: Choose a Website Name
Step 5: Choose a Website Theme
Step 6: Choose a Website Name
Step 7: Choose a Website Theme
Step 8: Choose a Website Name
Step 9: Choose a Website Theme
Step 10: Choose a Website Theme
Step 11: Choose a Website Name
I updated the code to disable the code-path change with the following:
if (ctx.stream()->get_backend() == sycl::backend::ext_oneapi_cuda){
// Dispatch becomes obscure with the reorder, MMVQ when the reorder optimization
// is enabled takes precedence over DMMV, the current if-else implementation
// requires disabling DMMV if both conditions are met
// || (reorder && ggml_sycl_supports_reorder_mmvq(src0->type))) {
use_dequantize_mul_mat_vec = use_dequantize_mul_mat_vec && !use_mul_mat_vec_q;
}
The performance decreases from the non-optimized code's 28 to 25 tokens per second, but the result is correct.
llama_perf_context_print: prompt eval time = 577.75 ms / 19 tokens ( 30.41 ms per token, 32.89 tokens per second)
llama_perf_context_print: eval time = 15765.81 ms / 399 runs ( 39.51 ms per token, 25.31 tokens per second)
Step 1: Get to know the basics of web design
Step 2: Set up a web hosting account
Step 3: Download a free website builder
Step 4: Set up a domain name
Step 5: Design your website
Step 6: Add content to your site
Step 7: Make the site responsive
Step 8: Add a contact form
Step 9: Add a social media share button
Step 10: Advertise your website
When I disable the reorder optimization and use the following code:
if (ctx.stream()->get_backend() == sycl::backend::ext_oneapi_cuda
// Dispatch becomes obscure with the reorder, MMVQ when the reorder optimization
// is enabled takes precedence over DMMV, the current if-else implementation
// requires disabling DMMV if both conditions are met
// || (reorder && ggml_sycl_supports_reorder_mmvq(src0->type))) {
|| (ggml_sycl_supports_reorder_mmvq(src0->type))) {
The performance is reduced and the result is wrong.
llama_perf_context_print: prompt eval time = 211.28 ms / 19 tokens ( 11.12 ms per token, 89.93 tokens per second)
llama_perf_context_print: eval time = 23958.10 ms / 399 runs ( 60.05 ms per token, 16.65 tokens per second)
...
Step 1: Select the Website Name
Step 2: Select the Website Type
Step 3: Choose a Website Theme
Step 4: Choose a Website Name
Step 5: Choose a Website Theme
Step 6: Choose a Website Name
Step 7: Choose a Website Theme
Step 8: Choose a Website Name
Step 9: Choose a Website Theme
Step 10: Choose a Website Theme
Signed-off-by: Alberto Cabrera <[email protected]>
// qr number of weights in a byte (described as 'before dequantization')
// for quantization types that has low and high bits split, qr is calculated with
// using the lower bits, e.g for Q6 quants QR6 is 2
// qi size of a block in 32 bit integers
Suggested change:
- // qi size of a block in 32 bit integers
+ // qi number of 32 bit integers needed to represent all the quants from a block (`qs` field)
This is more in line with my understanding of qi; thoughts?
I think it's the number of quants that can be packed into a 32-bit type. That's why, from ggml-common.h, the value of QI4_0 (which is QK4_0 / (4 * QR4_0)) comes out to be 8: each quant in this case is 4 bits, so 2 per byte and therefore 4 * 2 = 8 in a 32-bit type.
"Number of 32 bit integers" sounds like more than one integer could be required, which I do not think is what this field is trying to convey.
The value of QI4_0 is 4 from what I'm seeing: QK4_0 = 32 and QR4_0 = 2, so 32 / (4 * 2).

> "Number of 32 bit integers" sounds like more than one integer could be required, which I do not think is what this field is trying to convey.

That's my understanding of the field, and it seems to align with the comment in llama.cpp/ggml/src/ggml-common.h, line 94 in fccf9ca:
// QI = number of 32 bit integers before dequantization
A lot clearer, I agree.
assert(blocks_per_subgroup > 0);
assert(block_elements_per_subgroup > 0);
Having asserts inside the kernel seems incorrect, seeing as they check constexprs; if they are changed to static_assert they become compile-time checks, which would be better.
constexpr size_t safe_div(const size_t m, const size_t n) {
    return (m + n - 1) / n;
}
nit: I think a better name for it would be ceil_div. safe_div sounds like it checks for overflow or promotes to a higher-precision type for the addition, which it does not.
This PR extends the work introduced in #12035. MMVQ Q4_0 now supports the block_q_t reorder layout.
The improvements are reflected in text generation; the PP512 improvement on the DataMax 1100 is within noise.
The PR includes:
- a reorder_vec_dot_q_sycl struct
- reorder_mul_mat_vec_q4_0_q8_1_sycl
- safe_div, to be more consistent with the naming of other backends for ceil/roundup division

Still pending TODOs:
- Benchmarking

Compiler: ICPX 2025.1
Builds:
GPU & Drivers:
DISABLE_OPT is the value of GGML_SYCL_DISABLE_OPT