docs(gguf): extend Encoding slot to support percentage-mixed recipes
For files where the byte distribution is genuinely mixed — common in
asymmetric MoE quantization where routed experts use a different ggml
type than attention projections — a single LLAMA_FTYPE_MOSTLY_* label
in the Encoding slot can be misleading (it captures the majority by
bytes but says nothing about the rest of the recipe).
Extends the Encoding slot grammar to allow a hyphen-joined sequence
of `<pct><quant>` tokens listed in descending order of byte share,
e.g. `55IQ2_XXS-34Q2_K-07Q8_0-03F16`. Only components above ~2%
should appear, so the sum need not equal 100. Single-token Encodings
remain unchanged, so all existing filenames are still valid under
the new regex.
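
The recipe form described above can be sketched as a small Node.js helper that turns per-type byte counts into an Encoding string. This is only an illustration: the function name `formatEncodingRecipe`, the byte counts, the rounding to whole percentages, the two-digit zero padding (inferred from `07Q8_0` in the example), and the strict "greater than 2%" cutoff are all assumptions, not part of the spec text.

```javascript
// Sketch: derive a percentage-mixed Encoding string from per-type byte counts.
// Assumptions (not stated in the spec): integer rounding, two-digit zero
// padding, and a strict "> 2%" cutoff for the approximate threshold.
function formatEncodingRecipe(bytesByType, thresholdPct = 2) {
  const entries = Object.entries(bytesByType);
  const total = entries.reduce((sum, [, bytes]) => sum + bytes, 0);
  if (total <= 0) {
    throw new Error("no tensor bytes to summarize");
  }
  return entries
    .map(([quant, bytes]) => ({ quant, pct: (bytes * 100) / total }))
    .filter(({ pct }) => pct > thresholdPct) // drop minor components
    .sort((a, b) => b.pct - a.pct) // descending order of byte share
    .map(({ quant, pct }) => String(Math.round(pct)).padStart(2, "0") + quant)
    .join("-");
}

// Hypothetical byte counts, chosen to reproduce the example in this commit.
const exampleRecipe = formatEncodingRecipe({
  Q8_0: 70,
  IQ2_XXS: 550,
  F32: 10, // 1% of bytes, so it is omitted from the recipe
  F16: 30,
  Q2_K: 340,
});
console.log(exampleRecipe); // 55IQ2_XXS-34Q2_K-07Q8_0-03F16
```

Because components below the threshold are dropped, the printed percentages sum to 99 here rather than 100, which matches the note that the sum need not equal 100.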
Originating use case: `huggingface.co/antirez/deepseek-v4-gguf`
(DeepSeek V4 Flash shipped with routed experts at IQ2_XXS/Q2_K and
attention projections + shared experts at Q8_0).
Changes:
- Encoding section gains a paragraph describing the recipe form, the
  ordering convention (descending by byte share), and the threshold
  (only components >~2% should appear).
- New example added showing a full filename in spec-compliant form.
- Validator regex (in both the prose and the Node.js block) extended
  to accept the multi-component form. Verified backward-compat
  against every existing example in the doc.
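
A backward-compatibility check of this kind could look roughly like the following Node.js sketch. It builds the old and new validators from shared pieces (only the Encoding fragment differs) and compares the captured groups. The filenames are illustrative single-token names, not necessarily the exact examples in the doc, and `sameParse` is a hypothetical helper name.

```javascript
// Shared pieces of the validator; only the Encoding fragment changes.
const PREFIX = String.raw`^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>`;
const SUFFIX = String.raw`))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$`;
const OLD_ENCODING = String.raw`(?!LoRA|vocab|MTP)[\w_]+`;
const NEW_ENCODING = OLD_ENCODING + String.raw`(?:-\d{1,3}[A-Z][\w_]*)*`;

const oldRegex = new RegExp(PREFIX + OLD_ENCODING + SUFFIX);
const newRegex = new RegExp(PREFIX + NEW_ENCODING + SUFFIX);

// True when both validators agree: same captured groups, or both reject.
function sameParse(filename) {
  const before = oldRegex.exec(filename);
  const after = newRegex.exec(filename);
  return (
    JSON.stringify(before && before.groups) ===
    JSON.stringify(after && after.groups)
  );
}

// Illustrative single-token filenames (assumed, not copied from the doc).
const legacyNames = [
  "Mixtral-8x7B-v0.1-KQ2.gguf",
  "Hermes-2-Pro-Llama-3-8B-v1.0-F16.gguf",
  "Grok-100B-v1.0-Q4_0-00003-of-00009.gguf",
  "not-a-known-arrangement.gguf",
];
for (const name of legacyNames) {
  console.log(name, sameParse(name) ? "unchanged" : "CHANGED");
}
```

The shard case is the one worth watching: `-00003-of-00009` cannot be absorbed into the Encoding because the new fragment requires one to three digits followed by an uppercase letter.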
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/gguf.md: 10 additions & 2 deletions
@@ -40,6 +40,7 @@ The components are:
     - If model is missing a version number then assume `v1.0` (First Public Release)
     - This can be derived from gguf metadata `general.version`
 1. **Encoding**: Indicates the weights encoding scheme that was applied to the model. Content, type mixture and arrangement however are determined by user code and can vary depending on project needs.
+    - For files where the byte distribution is genuinely mixed (e.g. asymmetric MoE quants where routed experts use a different ggml type than attention projections), the Encoding may instead be a hyphen-joined sequence of `<pct><quant>` tokens listed in descending order of byte share, where `<pct>` is a 1–3 digit percentage and `<quant>` is a ggml tensor type name (e.g. `IQ2_XXS`, `Q2_K`, `Q4_K`, `Q8_0`, `F16`, `F32`, `BF16`). Only components representing more than ~2% of bytes should appear, so the sum need not equal 100. Example: `55IQ2_XXS-34Q2_K-07Q8_0-03F16` means the file's bytes are ~55% IQ2_XXS, ~34% Q2_K, ~7% Q8_0, ~3% F16.
 1. **Type**: Indicates the kind of gguf file and the intended purpose for it
     - If missing, then file is by default a typical gguf tensor model file
     - `LoRA` : GGUF file is a LoRA adapter
@@ -55,7 +56,7 @@ The components are:

 At a minimum all model files should have at least BaseName, SizeLabel, Version, in order to be easily validated as a file that is keeping with the GGUF Naming Convention. An example of this issue is that it is easy for Encoding to be mistaken as a FineTune if Version is omitted.

-To validate you can use this regular expression `^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$` which will check that you got the minimum BaseName, SizeLabel and Version present in the correct order.
+To validate you can use this regular expression `^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$` which will check that you got the minimum BaseName, SizeLabel and Version present in the correct order.
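
As a rough illustration of how a consumer might use the extended expression, the Node.js sketch below validates a filename and, when the Encoding is a multi-component recipe, splits it into percentage and type pairs. The names `parseGgufFilename` and `splitRecipe` and the filename `Example-8x7B-...` are hypothetical. Since the first token is still matched by the generic `[\w_]+`, the sketch re-checks every token against the `<pct><quant>` shape itself, which is a design choice of this example rather than something the spec requires.

```javascript
// Extended validator from the diff above, as a JavaScript regex literal.
const GGUF_NAME = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

// Returns [{ pct, quant }, ...] for a recipe, or null for a single-token
// Encoding (or a missing one).
function splitRecipe(encoding) {
  if (!encoding) return null;
  const tokens = encoding
    .split("-")
    .map((token) => /^(\d{1,3})([A-Z]\w*)$/.exec(token));
  if (tokens.length < 2 || tokens.some((m) => m === null)) return null;
  return tokens.map((m) => ({ pct: Number(m[1]), quant: m[2] }));
}

// Returns the named fields plus the parsed recipe, or null if invalid.
function parseGgufFilename(filename) {
  const match = GGUF_NAME.exec(filename);
  if (!match) return null;
  const { BaseName, SizeLabel, FineTune, Version, Encoding, Type, Shard } =
    match.groups;
  return {
    BaseName, SizeLabel, FineTune, Version, Encoding, Type, Shard,
    components: splitRecipe(Encoding),
  };
}

// Hypothetical filename using the recipe from the Encoding paragraph.
const parsed = parseGgufFilename(
  "Example-8x7B-v1.0-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf"
);
console.log(parsed.Encoding); // 55IQ2_XXS-34Q2_K-07Q8_0-03F16
console.log(parsed.components);
```

For a single-token name such as `Mixtral-8x7B-v0.1-KQ2.gguf`, the same function returns `Encoding: "KQ2"` with `components: null`.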