
docs : add MTP to GGUF Type slot #1488

Merged: ggerganov merged 1 commit into ggml-org:master from mishig25:gguf-naming-mtp on May 13, 2026.

mishig25 (contributor) commented on May 12, 2026:

Adds `MTP` as a third value for the `Type` slot (alongside `LoRA` and
`vocab`) to cover Multi-Token Prediction / speculative-decoding draft
modules shipped beside a base model. Updates the validation regex in
both the prose and JS copies, adds a filename example, and extends the
Node.js test cases.
Review thread on docs/gguf.md (diff excerpt: a context line, then the validation-regex line before and after the change):
At a minimum all model files should have at least BaseName, SizeLabel, Version, in order to be easily validated as a file that is keeping with the GGUF Naming Convention. An example of this issue is that it is easy for Encoding to be mistaken as a FineTune if Version is omitted.

To validate you can use this regular expression `^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab)[\w_]+))?(?:-(?<Type>LoRA|vocab))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$` which will check that you got the minimum BaseName, SizeLabel and Version present in the correct order.
To validate you can use this regular expression `^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$` which will check that you got the minimum BaseName, SizeLabel and Version present in the correct order.
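The effect of the two regex lines above can be checked with a short Node.js sketch. The filenames are hypothetical and `parseGgufName` is an illustrative helper, not the JS copy shipped in docs/gguf.md. With the old pattern, a trailing `-MTP` either rejects the whole name (when an encoding is also present) or is silently captured as the `Encoding`; with the new pattern it lands in `Type`.

```javascript
// Old validation regex (Type slot: LoRA | vocab only).
const GGUF_NAME_OLD = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab)[\w_]+))?(?:-(?<Type>LoRA|vocab))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

// New validation regex: MTP added both to Type and to Encoding's negative lookahead.
const GGUF_NAME_NEW = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

// Hypothetical helper: returns the named groups, or null if the name does not match.
function parseGgufName(regex, filename) {
  const match = filename.match(regex);
  return match ? { ...match.groups } : null;
}

// Hypothetical filenames, used only to exercise the two patterns.
const withEncoding = "Mixtral-8x7B-v0.1-KQ2-MTP.gguf";
const withoutEncoding = "Mixtral-8x7B-v0.1-MTP.gguf";

console.log(parseGgufName(GGUF_NAME_NEW, withEncoding).Type); // MTP
console.log(parseGgufName(GGUF_NAME_OLD, withEncoding)); // null
console.log(parseGgufName(GGUF_NAME_NEW, withoutEncoding).Type); // MTP
console.log(parseGgufName(GGUF_NAME_OLD, withoutEncoding).Encoding); // MTP
```

Adding `MTP` to the negative lookahead matters as much as adding it to `Type`: without it, `Mixtral-8x7B-v0.1-MTP.gguf` would still match with `MTP` captured as the encoding, because the optional `Encoding` group is tried first.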
Contributor:

Also, `mmproj-` as a prefix is pretty much a standard now; we should probably add it to the regex too.

Reply:

yes 👍

Contributor:

feel free to open a follow-up PR @mishig25

mishig25 (author) commented on May 18, 2026:

Opened #1496: it moves MTP out of Type and introduces a Sidecar prefix slot covering both `mtp-` and `mmproj-`. wdyt?

Reply:

By contrast, the Unsloth ones don't encode MTP in the GGUF filename.

mishig25 (author) commented on May 19, 2026:

Unsloth-style (MTP in the repo name, clean filename): the entire repo is dedicated to MTP variants, so MTP is implied by the repo name, and each file inside only needs to disambiguate by quant (`Qwen3.6-27B-Q4_K_M.gguf`, `Qwen3.6-27B-Q5_K_M.gguf`, etc.). See the `unsloth/Qwen3.6-27B-MTP-GGUF` repo.

Contributor:

Yes, indeed: if the model already comes with MTP support, it's always better to have both the main model and MTP in the same GGUF, since it saves a bit of VRAM that way.

The case where MTP and the main model are separate GGUFs is mostly for Eagle3-style models.

Reply:

do you have an example repo for eagle3-style?

mishig25 (author):

> do you have an example repo for eagle3-style?

So far, every Eagle3 repo I found ships safetensors only (`nvidia/gpt-oss-120b-Eagle3-v3`, `openbmb/MiniCPM4.1-8B-Eagle3`, `thoughtworks/Qwen3-8B-Eagle3`).

ggerganov pushed a commit that referenced this pull request on May 21, 2026:
* docs : add Sidecar prefix slot (mmproj, mtp); drop MTP from Type

Introduces an optional Sidecar prefix slot at the front of the GGUF
filename for auxiliary modules loaded alongside a base model:

  - mmproj: multimodal projector
  - mtp:    Multi-Token Prediction draft module

Removes MTP from the Type slot (added in #1488) so there is exactly
one canonical position. Updates the regex (prose + JS), parse helper,
filename examples, and Node.js test cases accordingly.

* docs : clarify sidecar Parameter Count refers to main model

* docs : address julien-c review (format-string consistency + mtp caveat)
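The Sidecar slot described in the follow-up commit can be sketched by prepending an optional `mmproj-`/`mtp-` group to the pre-#1488 pattern, since MTP leaves the Type slot. This is a sketch under assumptions: the actual regex and parse helper from #1496 are not reproduced here, and `SIDECAR_VALUES`, `parseGgufName`, and the filenames are hypothetical. Following the second commit, the size label in a sidecar filename (`8x7B` below) describes the main model, not the sidecar module.

```javascript
// Base pattern: Type back to LoRA | vocab only (MTP moves to the Sidecar prefix).
const BASE = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab)[\w_]+))?(?:-(?<Type>LoRA|vocab))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

// Hypothetical list of sidecar prefixes, taken from the commit message.
const SIDECAR_VALUES = ["mmproj", "mtp"];

// Prepend the optional prefix group; BASE.source starts with "^", so drop it.
const GGUF_NAME_WITH_SIDECAR = new RegExp(
  `^(?:(?<Sidecar>${SIDECAR_VALUES.join("|")})-)?` + BASE.source.slice(1)
);

// Hypothetical helper: returns the named groups, or null if the name does not match.
function parseGgufName(filename) {
  const match = filename.match(GGUF_NAME_WITH_SIDECAR);
  return match ? { ...match.groups } : null;
}

const projector = parseGgufName("mmproj-Mixtral-8x7B-v0.1-F16.gguf");
console.log(projector.Sidecar, projector.BaseName, projector.SizeLabel); // mmproj Mixtral 8x7B

const draft = parseGgufName("mtp-Mixtral-8x7B-v0.1-KQ2.gguf");
console.log(draft.Sidecar, draft.Encoding); // mtp KQ2

console.log(parseGgufName("Mixtral-8x7B-v0.1-KQ2.gguf").Sidecar); // undefined
```

Putting the sidecar marker at the front gives it exactly one canonical position, so it never competes with `Encoding` or `Type` at the end of the name.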