Skip to content

[QUESTION] Inquiry about DynamicGenerator: EOS not returned until max_new_tokens reached, despite stop_conditions #809

@keds-rnd

Description

@keds-rnd

OS

Linux

GPU Library

CUDA 12.x

Python version

3.10

Pytorch version

2.8.0+cu128

Model

gemm3 27b exl2

Describe the bug

Hello,

I'm a developer in South Korea using your framework, and I want to start by saying thank you for building such an excellent library like ExLlamaV2.

I am currently using a combination of ExLlamaV2 (ExLlamaV2DynamicJob) with FastAPI and Redis to handle multiple concurrent user requests.

I've observed an issue where, even when the model's response to a user query is shorter than max_new_tokens, the generation process seems to continue internally until it reaches max_new_tokens before finally reporting the End-of-Stream (EOS) status.

This persistence occurs even though I have explicitly set stop_conditions. (Currently, I'm mitigating this by forcibly closing the web connection when no data is received for a set period, to allow the next conversation to begin.)

I'm wondering if there's a recommended way to force the LLM to return the EOS status immediately after the output is logically complete, without having to wait until max_new_tokens has been fully generated.

Here is the code I use to create the job:

job = ExLlamaV2DynamicJob(
    input_ids=input_ids, max_new_tokens=max_new_tokens,
    stop_conditions=get_stop_conditions(PROMPT_FORMAT, tokenizer),
    gen_settings=ExLlamaV2Sampler.Settings(),
    filter_prefer_eos=True, identifier=job_id
)

The model I am using is Gemma 3.

Thank you for your time and assistance!

Best regards.

Reproduction steps

.

Expected behavior

.

Logs

No response

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions