Misc. bug: Deepseek R1 incompatible with grammars / structured output #11336
Comments
+1
You should be able to do this using the grammar itself, eg:
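A rough sketch of such a grammar (rule names purely illustrative) could look like this:

```
# Rough sketch, untested: let the model reason freely inside <think>...</think>,
# then constrain the visible answer to a small JSON object.
# Note: any "</" inside the reasoning other than the closing tag is disallowed.
root       ::= "<think>" think-char* "</think>" ws answer
think-char ::= [^<] | "<" [^/]
answer     ::= "{" ws "\"answer\"" ws ":" ws string ws "}"
string     ::= "\"" ( [^"\\] | "\\" ["\\/bfnrt] )* "\""
ws         ::= [ \t\n]*
```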
Not tested so might need some corrections, but something like this should work I think.
@jukofyork Thanks! I wondered if this could be solved with grammar. Unfortunately I suspect most use cases for grammar go through json_schema_to_grammar, so the grammar would need to be modified there in the various clients that do this. Also, the <think>..</think> tags and their content need to be suppressed in the string output to the client to support automated usage, but I'm not sure whether that is a client problem or a llama.cpp problem. In any case this is being discussed in #11325.

I did a quick spike on ollama to support toggling grammar on and off based on the </think> token. The patches to the vendored llama.cpp are minimal, however, and involve adding an extra param to common_sampler_sample:

```diff
-llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx, bool grammar_first) {
+llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx, bool grammar_first, bool skip_grammar) {
     gsmpl->set_logits(ctx, idx);

     auto & grmr = gsmpl->grmr;
@@ -308,7 +308,7 @@ llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_co
     const llama_token id = cur_p.data[cur_p.selected].id;

-    if (grammar_first) {
+    if (grammar_first || skip_grammar) {
         return id;
     }
```
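For context, the caller side of such a spike might look roughly like the sketch below. This is an assumption-laden illustration, not code from the spike: it presumes the patched common_sampler_sample() above, and assumes </think> encodes to a single declared token (which it should for DeepSeek R1, but is worth verifying per model).

```cpp
#include "common.h"
#include "sampling.h"

// Sketch: bypass the grammar while the model is still inside its
// <think>...</think> block, then enforce it for the final answer.
// The grammar therefore only needs to describe the answer, not the tags.
llama_token sample_with_think_bypass(struct common_sampler * smpl,
                                     struct llama_context * ctx, int idx,
                                     llama_token think_end_id, bool & in_think) {
    // While reasoning, skip_grammar keeps the grammar out of the sampling path.
    const llama_token id = common_sampler_sample(smpl, ctx, idx,
                                                 /*grammar_first=*/false,
                                                 /*skip_grammar=*/in_think);

    // Only feed tokens to the grammar once the reasoning block has closed.
    common_sampler_accept(smpl, id, /*accept_grammar=*/!in_think);

    if (in_think && id == think_end_id) {
        in_think = false; // grammar applies from the next token onward
    }
    return id;
}
```

The think_end_id would be resolved once at load time, e.g. with common_tokenize() with parse_special enabled so that the declared special token is recognised.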
Yeah, the grammar only solves one specific problem sadly. It still might be useful to enforce English only, as I've seen R1 start using Chinese a couple of times and then (luckily) revert back to English (maybe "thinking" in Chinese helps it, but I would definitely not want the final answer in Chinese!).
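A crude grammar along those lines might look like the sketch below (illustrative only: it simply restricts output to printable ASCII rather than detecting English, and applied to the whole output it would also block Chinese inside the reasoning).

```
# Rough sketch: only printable ASCII plus basic whitespace is allowed.
root ::= [ -~\t\n\r]*
```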
This issue was closed because it has been inactive for 14 days since being marked as stale.
Name and Version
Using llama.cpp at b4516 and ollama at 7bb35
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
Problem description & steps to reproduce
Apologies if this is misfiled as a bug. I'm not sure if it is an enhancement, but grammars are a commonly used feature of llama.cpp and the Deepseek R1 model is very popular, so I wanted to raise it and perhaps think about solutions.
Reasoning-based models are trained to use <think>...</think> tokens to add their own chain of thought to the context. These tokens get suppressed by design when using the sampler with a JSON grammar, causing model performance to suffer significantly.
As suppressing the output of the tokens within the <think>...</think> tags is currently being discussed in #11325, I wanted to mention that using grammar to effectively suppress these tokens causes the model to perform badly.
Looking at https://github.com/ggerganov/llama.cpp/blob/master/common/sampling.cpp#L235, the third parameter to common_sampler_accept appears to be a boolean that enables constraining the sampler by grammar. As the </think> end token is defined at least in Deepseek R1's tokens.json, perhaps it would be a generic-enough solution to disable grammar for these models until this token is reached, if it is a declared token in the loaded model? Or perhaps a user-provided parameter like start_grammar_after_token?
Thanks for all your great work! <3
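To make the start_grammar_after_token idea above concrete, here is a purely hypothetical sketch of how the lookup could work. Neither the parameter nor the helper exists in llama.cpp; the use of common_tokenize() is an assumption about how a user-supplied tag string could be resolved against the loaded model.

```cpp
#include <string>
#include <vector>
#include "common.h"

// Hypothetical helper: map a user-supplied start_grammar_after_token string
// (e.g. "</think>") to a token id in the loaded model, or -1 if it is not a
// single declared token, in which case the grammar would simply apply from
// the first generated token as it does today.
static llama_token resolve_start_grammar_after_token(const struct llama_context * ctx,
                                                     const std::string & text) {
    if (text.empty()) {
        return -1;
    }
    // parse_special = true so that declared special tokens such as </think>
    // are resolved as a single token instead of being split as plain text.
    const std::vector<llama_token> toks =
        common_tokenize(ctx, text, /*add_special=*/false, /*parse_special=*/true);
    return toks.size() == 1 ? toks[0] : -1;
}
```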
First Bad Commit
No response
Relevant log output