I noticed that if we enqueue a dynamic job again after it has finished, it resumes and starts generating again. Is this intentional?
The job won't be in a valid state after finishing, so it's coincidental that this works at all. Failing after ~200 tokens is unsurprising, since that's probably where it crosses a cache page boundary. If this were to be supported, it would have to be implemented the way you suggest anyway: create a new job from the result of the old job, copy the sampling settings etc. across, and enqueue the new job. So there wouldn't be any performance benefit to it. I could maybe see the utility of it as a convenience feature, but I'm not sure it's worth the effort to implement.
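To illustrate the suggested pattern, here's a minimal sketch of "continuing" a finished job by building a fresh one from its output. The `Job`, `SampleSettings`, and `continue_job` names are hypothetical stand-ins, not the library's actual API; the point is just that the new job's prompt is the old prompt plus the old completion, with the sampling settings carried across.

```python
from dataclasses import dataclass, field

@dataclass
class SampleSettings:
    # Hypothetical sampling settings carried from job to job.
    temperature: float = 0.8
    top_p: float = 0.95

@dataclass
class Job:
    input_ids: list                  # prompt tokens
    settings: SampleSettings
    max_new_tokens: int = 256
    completion: list = field(default_factory=list)  # tokens generated so far
    finished: bool = False

def continue_job(old: Job, max_new_tokens: int) -> Job:
    # New prompt = old prompt + everything the old job generated.
    # With prefix caching, that whole sequence is already in the cache,
    # so the new job's prefill is (almost) free.
    return Job(
        input_ids=old.input_ids + old.completion,
        settings=old.settings,       # copy sampling settings across
        max_new_tokens=max_new_tokens,
    )
```

With prefix caching in place, enqueuing the continuation costs roughly the same as reviving the old job would have.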
It would complicate the generator a lot to allow for suspending jobs indefinitely.
The caching solution it does use is very general-purpose, though. If you generate up to some stop condition (or until a job is canceled), then whatever was in the prompt+completion of that job is going to be cached. So the next time you start a job with that same sequence it will skip the prefill entirely, and really there's no performance penalty to speak of compared to "reviving" a completed job.
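As a toy model of that behavior, here's a sketch of prefix reuse, with hypothetical names (the real cache works on fixed-size pages of KV state, not a set of token tuples): a finished or canceled job leaves its prompt+completion in the cache, and the next job only has to prefill whatever extends past the longest cached prefix.

```python
class PrefixCache:
    """Toy model of KV-cache reuse: remembers which token sequences have
    been processed, and reports how much prefill a new job can skip."""

    def __init__(self):
        self.cached = set()  # cached token-prefixes, stored as tuples

    def store(self, tokens):
        # After a job finishes (or is canceled), every prefix of its
        # prompt+completion is effectively cached.
        for i in range(1, len(tokens) + 1):
            self.cached.add(tuple(tokens[:i]))

    def reusable_prefix(self, prompt):
        # Longest cached prefix of the new prompt = tokens whose
        # prefill can be skipped entirely.
        best = 0
        for i in range(1, len(prompt) + 1):
            if tuple(prompt[:i]) in self.cached:
                best = i
        return best
```

Starting a new job with the old job's exact prompt+completion as its prompt then skips the prefill entirely, which is why reviving a completed job would buy nothing performance-wise.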
And this way, all the concerns about managing limited resources are hidden behind an abstraction, while inference under the hood stays very efficient. Imagine some model that may output a
<think>
token whe…