How to use the --keep flag and how to handle prompts exceeding the maximum context size? #11239

avbelka · 2025-01-14T14:56:24Z

Hello,

I would like to understand how the --keep flag works, and how to pass a prompt for analysis when its token count exceeds the maximum context size.

My setup:

I am testing llama-b4384-bin-win-avx2-x64.

Text generation model:

[Qwen2-7B-Instruct-GGUF]

Command I run in PowerShell:

& "…\llama-cli.exe" -m "...\qwen2-7b-instruct-q4_k_m.gguf" --file "…\prompt.txt" -n 512 --keep -1 -c 100 --temp 0.5 -tb 16 --no-display-prompt

The prompt.txt file contains the following prompt, and its total length is 9232 characters, which equals 1583 tokens:

<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Based ONLY on the TEXT briefly summarize it.
TEXT: {text}
SUMMARY: <|im_end|>
<|im_start|>assistant

I deliberately set the -c parameter to 100, which is smaller than the number of tokens in the prompt, because the description for --keep states:

"Use -1 to retain all tokens from the initial prompt."

Logically, the entire prompt should somehow be retained and processed. However, in practice, I get the following error:

main: prompt is too long.
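
To make the mismatch concrete, here is a minimal sketch of the behavior described above. It rests on two assumptions that this post does not confirm: that llama-cli rejects a prompt that does not fit in the context window before generation starts, and that --keep only controls how many leading prompt tokens survive when the context later overflows during generation. The function names and the "discard half of the rest" rule are illustrative, not llama.cpp's API.

```python
def prompt_fits(n_prompt_tokens: int, n_ctx: int) -> bool:
    """Assumed up-front check: the whole prompt must fit in the context."""
    return n_prompt_tokens <= n_ctx


def shift_context(tokens: list, n_keep: int) -> list:
    """Assumed overflow handling during generation.

    The first n_keep tokens are always retained. Of the remaining tokens,
    the older half is discarded and the most recent ones are kept.
    """
    n_left = len(tokens) - n_keep
    n_discard = n_left // 2
    return tokens[:n_keep] + tokens[n_keep + n_discard:]


# Numbers from this post: a 1583-token prompt against -c 100.
print(prompt_fits(1583, 100))

# Toy overflow: 10 tokens in the window, the first 4 are protected.
print(shift_context(list(range(10)), n_keep=4))
```

Under these assumptions, --keep -1 would protect the whole prompt from being discarded later, but it would not let a 1583-token prompt into a 100-token window in the first place, which would be consistent with the error above.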

I have two questions regarding this:

1. How do I correctly use the --keep flag, and can it even be used outside of conversation mode?
2. If I set -c 0, the logs show that the model's n_ctx = 32768. Is there any way to process a prompt with more than 32768 tokens?
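
For the second question, one workaround often used with fixed-context models is to split the text into chunks that each fit the window, summarize each chunk separately, and then summarize the summaries. Whether llama-cli offers anything built in for this is not established in this post. The sketch below covers only the splitting step. It estimates token counts from the characters-per-token ratio measured on the prompt above (9232 characters for 1583 tokens), which is a rough assumption that will vary with the text and the tokenizer. The function name and the token budget are hypothetical.

```python
# Ratio measured on the prompt in this post (about 5.8 characters per token).
CHARS_PER_TOKEN = 9232 / 1583


def split_for_context(text: str, max_tokens: int) -> list[str]:
    """Split text on whitespace into chunks of roughly max_tokens tokens each.

    A single word longer than the budget is emitted as its own oversized chunk.
    """
    max_chars = int(max_tokens * CHARS_PER_TOKEN)
    if max_chars <= 0:
        raise ValueError("max_tokens is too small")

    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) > max_chars and current:
            # Adding this word would exceed the budget: close the chunk.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical budget: leave room in a 32768-token window for the
# instructions around the text and for the generated summary.
chunks = split_for_context("some very long document " * 1000, max_tokens=3000)
print(len(chunks))
```

Each chunk would then be substituted for {text} in the prompt template and run as a separate llama-cli call.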
