I would like to understand how the --keep flag works and how to pass a prompt for analysis when its token count exceeds the maximum context size.
The prompt.txt file contains the following prompt, and its total length is 9232 characters, which equals 1583 tokens:
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Based ONLY on the TEXT briefly summarize it.
TEXT: {text}
SUMMARY: <|im_end|>
<|im_start|>assistant
I deliberately set the -c parameter to 100, which is smaller than the number of tokens in the prompt, because the description for --keep states:
"Use -1 to retain all tokens from the initial prompt."
Logically, the entire prompt should somehow be retained and processed. However, in practice, I get the following error:
main: prompt is too long.
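The numbers above can be compared directly. The following is a minimal sketch, assuming the tool rejects any prompt whose token count exceeds the context size set with -c. The function name and the `reserve` parameter are hypothetical and are not part of llama.cpp.

```python
def fits_in_context(n_prompt_tokens: int, n_ctx: int, reserve: int = 0) -> bool:
    """Return True if the prompt fits in the context window.

    `reserve` is a hypothetical number of slots held back for generation;
    the actual margin used by the tool is not stated in this post.
    """
    return n_prompt_tokens <= n_ctx - reserve


# Values taken from this post: a 1583-token prompt against -c 100.
print(fits_in_context(1583, 100))    # False
# The same prompt against the model's default n_ctx of 32768.
print(fits_in_context(1583, 32768))  # True
```

Under this assumption, --keep -1 would not change the outcome, because the check happens on the full prompt before any tokens are retained or discarded.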
I have two questions regarding this:
1. How do I correctly use the --keep flag, and can it even be used outside of conversation mode?
2. If I set -c 0, the logs show that the model's n_ctx = 32768. Is there any way to process a prompt with more than 32768 tokens?
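One possible workaround for the second question, independent of --keep, is to split the text into pieces before it reaches the model and summarize each piece separately. The sketch below is only an illustration under stated assumptions: it estimates tokens from the character-to-token ratio measured on this prompt (9232 characters for 1583 tokens), which will vary with language and content, and the 30000-token budget is a hypothetical value chosen to leave room for the template and the generated summary. The helper `chunk_text` is not part of llama.cpp.

```python
def chunk_text(text: str, max_tokens: int, chars_per_token: float) -> list[str]:
    """Split text into pieces whose estimated token count stays within max_tokens.

    The estimate is character-based, so the real token count of each piece
    should still be checked with the model's tokenizer.
    """
    max_chars = max(1, int(max_tokens * chars_per_token))
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# Ratio measured on the prompt in this post: 9232 characters / 1583 tokens.
CHARS_PER_TOKEN = 9232 / 1583

# Placeholder input standing in for a document too long for one context window.
long_text = "lorem ipsum " * 50000

# Hypothetical budget below the 32768-token n_ctx reported in the logs.
chunks = chunk_text(long_text, max_tokens=30000, chars_per_token=CHARS_PER_TOKEN)
print(len(chunks))
```

Each chunk would then be substituted for {text} in the prompt template and run as a separate request, and the per-chunk summaries could be summarized again in a final pass. Splitting on fixed character offsets can cut sentences in half, so splitting on paragraph boundaries may give better results.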
My setup:
I am testing llama-b4384-bin-win-avx2-x64.
Text generation model: [Qwen2-7B-Instruct-GGUF]
Command I run in PowerShell: