Commit 2dc1573

mishig25 and claude committed
docs(gguf): extend Encoding slot to support percentage-mixed recipes
For files where the byte distribution is genuinely mixed (common in asymmetric MoE quantization, where routed experts use a different ggml type than attention projections), a single LLAMA_FTYPE_MOSTLY_* label in the Encoding slot can be misleading: it captures the majority by bytes but says nothing about the rest of the recipe.

Extends the Encoding slot grammar to allow a hyphen-joined sequence of `<pct><quant>` tokens listed in descending order of byte share, e.g. `55IQ2_XXS-34Q2_K-07Q8_0-03F16`. Only components above ~2% should appear, so the sum need not equal 100. Single-token Encodings remain unchanged, so all existing filenames are still valid under the new regex.

Originating use case: `huggingface.co/antirez/deepseek-v4-gguf` (DeepSeek V4 Flash shipped with routed experts at IQ2_XXS/Q2_K and attention projections + shared experts at Q8_0).

Changes:
- Encoding section gains a paragraph describing the recipe form, the ordering convention (descending by byte share), and the threshold (only components >~2% should appear).
- New example added showing a full filename in spec-compliant form.
- Validator regex (in both the prose and the Node.js block) extended to accept the multi-component form.

Verified backward-compat against every existing example in the doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
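As a rough illustration of the convention described in the commit message, the sketch below derives a recipe string from per-type byte counts. The helper name `buildRecipeEncoding`, the input shape, and the two-digit zero padding (inferred from the `07`/`03` tokens in the example) are assumptions for illustration, not part of the spec or of any existing tool.

```javascript
// Hypothetical helper (not part of the GGUF spec or tooling): builds a
// percentage-mixed Encoding string from a map of ggml type name -> byte count.
function buildRecipeEncoding(bytesByType, thresholdPct = 2) {
  const total = Object.values(bytesByType).reduce((sum, n) => sum + n, 0);
  if (total <= 0) throw new Error("byte counts must sum to a positive number");

  return Object.entries(bytesByType)
    .map(([quant, bytes]) => ({ quant, bytes, share: (bytes / total) * 100 }))
    // Keep only components above the ~2% threshold mentioned in the commit.
    .filter(({ share }) => share > thresholdPct)
    // Descending order of byte share.
    .sort((a, b) => b.bytes - a.bytes)
    // Assumption: pad to two digits, mirroring "07" and "03" in the example.
    .map(({ quant, share }) => String(Math.round(share)).padStart(2, "0") + quant)
    .join("-");
}

// Byte counts chosen so the shares match the example in the commit message.
const encoding = buildRecipeEncoding({ IQ2_XXS: 55, Q2_K: 34, Q8_0: 7, F16: 3, F32: 1 });
console.log(encoding); // 55IQ2_XXS-34Q2_K-07Q8_0-03F16
```

The 1% F32 component is dropped by the threshold, which is why the listed percentages sum to 99 rather than 100.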
1 parent 5725fee commit 2dc1573

1 file changed

Lines changed: 10 additions & 2 deletions

docs/gguf.md

@@ -40,6 +40,7 @@ The components are:
     - If model is missing a version number then assume `v1.0` (First Public Release)
     - This can be derived from gguf metadata `general.version`
 1. **Encoding**: Indicates the weights encoding scheme that was applied to the model. Content, type mixture and arrangement however are determined by user code and can vary depending on project needs.
+    - For files where the byte distribution is genuinely mixed (e.g. asymmetric MoE quants where routed experts use a different ggml type than attention projections), the Encoding may instead be a hyphen-joined sequence of `<pct><quant>` tokens listed in descending order of byte share, where `<pct>` is a 1–3 digit percentage and `<quant>` is a ggml tensor type name (e.g. `IQ2_XXS`, `Q2_K`, `Q4_K`, `Q8_0`, `F16`, `F32`, `BF16`). Only components representing more than ~2% of bytes should appear, so the sum need not equal 100. Example: `55IQ2_XXS-34Q2_K-07Q8_0-03F16` means the file's bytes are ~55% IQ2_XXS, ~34% Q2_K, ~7% Q8_0, ~3% F16.
 1. **Type**: Indicates the kind of gguf file and the intended purpose for it
     - If missing, then file is by default a typical gguf tensor model file
     - `LoRA` : GGUF file is a LoRA adapter
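The `<pct><quant>` grammar added in this hunk can be read back mechanically. The following is a minimal sketch, assuming a hypothetical helper `parseRecipeEncoding` that is stricter than the validator regex (it requires every token, including the first, to start with a 1 to 3 digit percentage); it is not part of the spec.

```javascript
// Hypothetical helper (not part of the GGUF spec): splits a percentage-mixed
// Encoding into { pct, quant } components. Returns null for single-token
// Encodings such as "Q4_K_M" and for anything that is not in recipe form.
function parseRecipeEncoding(encoding) {
  const tokens = encoding.split("-");
  if (tokens.length < 2) return null;

  const components = [];
  for (const token of tokens) {
    // 1-3 digit percentage followed by a type name starting with A-Z.
    const m = /^(\d{1,3})([A-Z]\w*)$/.exec(token);
    if (!m) return null;
    components.push({ pct: Number(m[1]), quant: m[2] });
  }
  return components;
}

// Checks the ordering convention; ties are allowed because the listed
// percentages are rounded.
function isDescending(components) {
  return components.every((c, i) => i === 0 || c.pct <= components[i - 1].pct);
}

const parsed = parseRecipeEncoding("55IQ2_XXS-34Q2_K-07Q8_0-03F16");
console.log(parsed, isDescending(parsed));
```

Because components below the threshold are omitted, a consumer of this output should not assume the percentages sum to 100.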
@@ -55,7 +56,7 @@ The components are:

 At a minimum all model files should have at least BaseName, SizeLabel, Version, in order to be easily validated as a file that is keeping with the GGUF Naming Convention. An example of this issue is that it is easy for Encoding to be mistaken as a FineTune if Version is omitted.

-To validate you can use this regular expression `^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$` which will check that you got the minimum BaseName, SizeLabel and Version present in the correct order.
+To validate you can use this regular expression `^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$` which will check that you got the minimum BaseName, SizeLabel and Version present in the correct order.

 For example:

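To make the effect of this regex change concrete, the sketch below assembles the old and new validators from their shared parts and runs both against a recipe filename and a single-token filename. The split into `head`, `tail`, and the two Encoding fragments is only a presentation choice for this example, and the `Q4_K_M` filename is hypothetical.

```javascript
// Shared prefix and suffix of the validator, copied from the doc's regex.
const head = String.raw`^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))`;
const tail = String.raw`(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$`;

// The only part this commit changes: the Encoding group.
const oldEncoding = String.raw`(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+))?`;
const newEncoding = String.raw`(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?`;

const oldRegex = new RegExp(head + oldEncoding + tail);
const newRegex = new RegExp(head + newEncoding + tail);

// Recipe filename from the commit's new example.
const recipeFile = "DeepSeek-V4-Flash-256x8.4B-v1.0-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf";
// Hypothetical single-token filename, used to check backward compatibility.
const singleFile = "DeepSeek-V4-Flash-256x8.4B-v1.0-Q4_K_M.gguf";

for (const file of [recipeFile, singleFile]) {
  const oldMatch = oldRegex.exec(file);
  const newMatch = newRegex.exec(file);
  console.log(file);
  console.log("  old regex Encoding:", oldMatch ? oldMatch.groups.Encoding : "(no match)");
  console.log("  new regex Encoding:", newMatch ? newMatch.groups.Encoding : "(no match)");
}
```

The old pattern rejects the recipe filename because `[\w_]+` cannot cross a hyphen, while both patterns capture `Q4_K_M` for the single-token name.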
@@ -90,12 +91,19 @@ For example:
     - Weight Encoding Scheme: Q4_K_M
     - Type: MTP (Multi-Token Prediction draft module)

+* `DeepSeek-V4-Flash-256x8.4B-v1.0-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf`
+    - Model Name: DeepSeek V4 Flash
+    - Expert Count: 256
+    - Parameter Count: 8.4B (per-expert)
+    - Version Number: v1.0
+    - Weight Encoding Scheme: percentage-mixed recipe: ~55% IQ2_XXS, ~34% Q2_K, ~7% Q8_0, ~3% F16 (by byte share)
+

 <details><summary>Example Node.js Regex Function</summary>

 ```js
 #!/usr/bin/env node
-const ggufRegex = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;
+const ggufRegex = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

 function parseGGUFFilename(filename) {
   const match = ggufRegex.exec(filename);
