-
I took the liberty of editing the title so it's more likely the correct people see it. I don't know the answer to your question, but the speculative stuff is very recent and under development, and speculation + grammar even more so.
-
Okay, so it seems like a grammar thing then. You'll probably need to include more information, like exactly what grammar you're using, the prompt, etc. Try to include information about the simplest possible way to reproduce the problem. In other words, since the problem doesn't seem related to speculation, you'd want to provide information about how to reproduce it with just …
-
I've tried to use a JSON grammar. The grammar rules used for parsing and the JSON are as below.
1. Code from the controller
2. Code from the service
3. Code for the rules
4. JSON
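The attached controller/service code and the actual rule file are not reproduced here. For reference only, a JSON grammar for llama.cpp is typically a small GBNF file along the lines of the simplified sketch below (loosely modelled on the json.gbnf bundled with llama.cpp; rule names and details are illustrative, not the poster's actual rules):

```
# Simplified JSON grammar sketch in llama.cpp's GBNF format (illustrative only).
root   ::= object
object ::= "{" ws ( member ( "," ws member )* )? "}" ws
member ::= string ":" ws value
value  ::= object | array | string | number | "true" ws | "false" ws | "null" ws
array  ::= "[" ws ( value ( "," ws value )* )? "]" ws
string ::= "\"" char* "\"" ws
char   ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
number ::= "-"? [0-9]+ ("." [0-9]+)? ws
ws     ::= [ \t\n]*
```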
-
Here is the log for the speculative case.
-
Why are there a million duplicate rules? I also don't really understand how anyone could look at a grammar like that and expect processing it to be fast. I don't really know how the grammar stuff works internally, but as the number of tokens increases it very likely has to process more stuff to make sure the output conforms to the grammar. A grammar like that is going to start out slow and just get worse and worse. I might be wrong, but to me this looks almost certainly like a problem with the grammar that's being used rather than with llama.cpp.
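To make that concrete, here is a hypothetical sketch (not the poster's actual file; rule names are invented for illustration) of the kind of pattern that blows up: enumerating every allowed length as its own alternative multiplies the rules and the grammar state the sampler may have to track for each generated token, whereas a single repetition rule stays constant-size.

```
# Hypothetical unrolled length limit: one alternative per allowed length.
# Each extra alternative can add more grammar state to check on every token,
# so generation tends to slow down as the limit (and the output) grows.
short-string ::= "\"" ( char | char char | char char char )? "\""

# Compact form: one rule, unbounded repetition handled by the parser.
any-string   ::= "\"" char* "\""
char         ::= [^"\\] | "\\" ["\\bfnrt]
```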
-
The cause of this situation has been found and removed: the string-length limitation in the rule turned out to be wrong. In fact, it was not even working.
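If a length limit is still needed, one way to express it without hand-duplicated rules is the bounded-repetition operator accepted by newer llama.cpp GBNF (this assumes your build is recent enough; verify before relying on it). A minimal sketch with illustrative rule names:

```
# Assumes a llama.cpp build recent enough to support {m,n} bounded repetition
# in GBNF; the parser expands the bound internally instead of requiring
# hand-written duplicate rules.
limited-string ::= "\"" char{0,64} "\""
char           ::= [^"\\] | "\\" ["\\bfnrt]
```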
-
Is there an argument to prevent generation from slowing down?
Even if generation is fast at first, token generation keeps getting slower as the prediction continues.
(For example, the first 10 tokens can run at 20 t/s, but the last 10 tokens took 1 t/s.)
This happens both with the grammar alone and with the grammar plus a draft model.