docs : add MTP to GGUF Type slot #1488
Adds `MTP` as a third value for the `Type` slot (alongside `LoRA` and `vocab`) to cover Multi-Token Prediction / speculative-decoding draft modules shipped beside a base model. Updates the validation regex in both the prose and JS copies, adds a filename example, and extends the Node.js test cases.
At a minimum all model files should have at least BaseName, SizeLabel, Version, in order to be easily validated as a file that is keeping with the GGUF Naming Convention. An example of this issue is that it is easy for Encoding to be mistaken as a FineTune if Version is omitted.

- To validate you can use this regular expression `^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab)[\w_]+))?(?:-(?<Type>LoRA|vocab))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$` which will check that you got the minimum BaseName, SizeLabel and Version present in the correct order.
+ To validate you can use this regular expression `^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$` which will check that you got the minimum BaseName, SizeLabel and Version present in the correct order.
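To make the effect of the change concrete, here is a minimal Node.js sketch that applies the updated expression (the version with `MTP` in both the `Encoding` lookahead and the `Type` alternation). The helper name `parseGgufName` and the `Example-…` filenames are assumptions made up for illustration; they are not taken from the repository's test cases.

```javascript
// Regex as proposed in this PR: MTP is excluded from Encoding and accepted as a Type.
const GGUF_NAME_RE = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

// Hypothetical helper: returns the named groups, or null when the name does not conform.
function parseGgufName(filename) {
  const match = GGUF_NAME_RE.exec(filename);
  return match ? { ...match.groups } : null;
}

// Illustrative filenames only.
const plain = parseGgufName("Mixtral-8x7B-v0.1-KQ2.gguf");
const draft = parseGgufName("Example-7B-v1.0-Q4_K_M-MTP.gguf");
const bare = parseGgufName("Example-7B-v1.0-MTP.gguf");
const noVersion = parseGgufName("Mixtral-8x7B-KQ2.gguf");

console.log(plain.Encoding, plain.Type); // prints: KQ2 undefined
console.log(draft.Encoding, draft.Type); // prints: Q4_K_M MTP
console.log(bare.Encoding, bare.Type);   // prints: undefined MTP
console.log(noVersion);                  // prints: null (Version is mandatory)
```

The third case is the one the lookahead change exists for: without `MTP` in `(?!LoRA|vocab|MTP)`, the trailing `MTP` would be captured as an `Encoding` instead of a `Type`.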
I think this might be incorrect; we went with the mtp- prefix in this PR: https://github.com/am17an/llama.cpp/pull/9/changes#diff-03b361169a690c5ac8e77460aeba18d833d2b78babab92b7b5bb721fc34947c9R612-R620
also mmproj- as prefix is pretty much a standard now, we should probably also add it to the regex
Opened #1496 : moves MTP out of Type and introduces a Sidecar prefix slot covering both mtp- and mmproj-. wdyt?
Unsloth-style (MTP in repo name, clean filename): the entire repo is dedicated to MTP variants, so MTP is implied by the repo name. Each file inside just needs to disambiguate by quant (Qwen3.6-27B-Q4_K_M.gguf, Qwen3.6-27B-Q5_K_M.gguf, etc.). See unsloth/Qwen3.6-27B-MTP-GGUF repo
Yes, indeed: if the model already comes with MTP support, it's always better to have both the main model and MTP in the same GGUF, since that saves a bit of VRAM.
The case where MTP and the main model are separate GGUFs is mostly for eagle3-style models.
do you have an example repo for eagle3-style?
so far, every Eagle3 repo I found ships safetensors only (nvidia/gpt-oss-120b-Eagle3-v3, openbmb/MiniCPM4.1-8B-Eagle3, thoughtworks/Qwen3-8B-Eagle3)
* docs : add Sidecar prefix slot (mmproj, mtp); drop MTP from Type

  Introduces an optional Sidecar prefix slot at the front of the GGUF filename for auxiliary modules loaded alongside a base model:
  - mmproj: multimodal projector
  - mtp: Multi-Token Prediction draft module

  Removes MTP from the Type slot (added in #1488) so there is exactly one canonical position. Updates the regex (prose + JS), parse helper, filename examples, and Node.js test cases accordingly.

* docs : clarify sidecar Parameter Count refers to main model
* docs : address julien-c review (format-string consistency + mtp caveat)
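The regex and parse helper from #1496 are not reproduced in this thread, so the following is only a sketch of the idea behind a leading Sidecar slot, not the merged implementation. It assumes the prefix is one of the two names listed in the commit message, is followed by a hyphen (as in `mtp-` and `mmproj-`), and is matched case-sensitively; `splitSidecar` and the sample filename are hypothetical.

```javascript
// Sidecar prefixes named in the commit message; the exact matching rules are an assumption.
const SIDECAR_PREFIXES = ["mmproj", "mtp"];

// Hypothetical helper: separates an optional leading sidecar prefix from the rest of the filename.
function splitSidecar(filename) {
  for (const prefix of SIDECAR_PREFIXES) {
    if (filename.startsWith(`${prefix}-`)) {
      return { sidecar: prefix, rest: filename.slice(prefix.length + 1) };
    }
  }
  return { sidecar: null, rest: filename };
}

const withPrefix = splitSidecar("mtp-Example-7B-v1.0-Q4_K_M.gguf");
const withoutPrefix = splitSidecar("Example-7B-v1.0-Q4_K_M.gguf");

console.log(withPrefix.sidecar, withPrefix.rest);       // prints: mtp Example-7B-v1.0-Q4_K_M.gguf
console.log(withoutPrefix.sidecar, withoutPrefix.rest); // prints: null Example-7B-v1.0-Q4_K_M.gguf
```

Splitting the prefix off first means the remainder can be checked against the existing naming rules unchanged, which is one way to keep a single canonical position for `mtp`.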
