
Can we re-enqueue a finished job? #731

Answered by turboderp
nvlas asked this question in Q&A

It would complicate the generator a lot to allow for suspending jobs indefinitely.

The caching solution it does use is very general-purpose, though. If you generate up to some stop condition (or until a job is canceled), then whatever was in the prompt+completion of that job is going to be cached. So the next time you start a job with that same sequence it will skip the prefill entirely, and really there's no performance penalty to speak of compared to "reviving" a completed job.
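To make that concrete, here is a toy sketch of the idea in plain Python. It is not the generator's actual code or API: `ToyPrefixCache` and `run_job` are hypothetical names, and the cache is modelled as a simple set of token prefixes rather than real keys/values. It only shows why a job started from a previous job's prompt+completion has nothing left to prefill.

```python
class ToyPrefixCache:
    """Hypothetical stand-in for a KV cache, keyed by token prefix."""

    def __init__(self):
        self._prefixes = set()

    def cached_prefix_len(self, tokens):
        # Length of the longest prefix of `tokens` already in the cache.
        n = len(tokens)
        while n > 0 and tuple(tokens[:n]) not in self._prefixes:
            n -= 1
        return n

    def store(self, tokens):
        # Remember every prefix of the sequence, as if its keys/values were kept.
        for n in range(1, len(tokens) + 1):
            self._prefixes.add(tuple(tokens[:n]))


def run_job(cache, prompt, completion):
    """Simulate one job: prefill only the uncached part of the prompt,
    then 'generate' the given completion and cache the whole sequence."""
    reused = cache.cached_prefix_len(prompt)
    prefilled = len(prompt) - reused
    full = list(prompt) + list(completion)
    cache.store(full)
    return prefilled, full


cache = ToyPrefixCache()

# First job: nothing is cached yet, so the whole prompt is prefilled.
first_prefill, sequence = run_job(cache, [1, 2, 3, 4, 5], completion=[6, 7, 8])

# "Revived" job: the new prompt is the old prompt+completion, all of it cached.
second_prefill, _ = run_job(cache, sequence, completion=[9, 10])

print(first_prefill, second_prefill)  # 5 0
```

In a real generator the cache is finite and entries can be evicted, so the reuse is only guaranteed while the earlier sequence is still resident.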

And this way all the concerns about how to manage the limited resources stay hidden behind an abstraction, while inference remains very efficient under the hood. Imagine some model that may output a <think> token whe…

Answer selected by nvlas