Fix scaleUpChron check for queue time using max_queue_time_minutes (#6618)
The hud query `queued_jobs_aggregate` returns `min_queue_time_minutes`,
which is the shortest queue time among the queued jobs it counts for an
instance type. `max_queue_time_minutes`, in contrast, is the queue time
of the job that has been waiting the longest for that instance type.
Until now we have been filtering on `min_queue_time_minutes`, which
does not make much sense. It adds no extra checks or protections and can
introduce fatal failures. If the query's configuration diverges from the
lambda's, say 10 minutes versus the 30 minutes currently used, and new
jobs keep arriving, the minimum queue time stays low and scaleUpChron
will never run.
This is still a reasonable check (it does not do exactly what we wanted
it to do, but it is better than nothing), so I kept it and changed it to
validate that the longest-queued job for an instance type has waited at
least `SCALE_UP_MAX_QUEUE_TIME_MINUTES` (default 30). Overprovisioning
is still possible if hud fails.
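
A minimal sketch of the difference between the two filters, not the
actual lambda code. The row shape, the `runner_label` and
`num_queued_jobs` fields, and both function names are assumptions made
for illustration; only `min_queue_time_minutes`,
`max_queue_time_minutes` and `SCALE_UP_MAX_QUEUE_TIME_MINUTES` come from
this change.

```typescript
// Hypothetical shape of one row returned by `queued_jobs_aggregate`.
// Only the two *_queue_time_minutes fields are named in this commit.
interface QueuedJobsRow {
  runner_label: string; // assumed field name
  num_queued_jobs: number; // assumed field name
  min_queue_time_minutes: number;
  max_queue_time_minutes: number;
}

// Default taken from the commit message.
const SCALE_UP_MAX_QUEUE_TIME_MINUTES = 30;

// Old behaviour (sketch): filter on the shortest queue time.
function oldMinBasedCheck(
  row: QueuedJobsRow,
  threshold: number = SCALE_UP_MAX_QUEUE_TIME_MINUTES,
): boolean {
  return row.min_queue_time_minutes >= threshold;
}

// New behaviour (sketch): filter on the longest queue time.
function newMaxBasedCheck(
  row: QueuedJobsRow,
  threshold: number = SCALE_UP_MAX_QUEUE_TIME_MINUTES,
): boolean {
  return row.max_queue_time_minutes >= threshold;
}

// A steady stream of new jobs keeps the minimum low even though one
// job has been waiting for well over the threshold.
const busyRow: QueuedJobsRow = {
  runner_label: "example.runner",
  num_queued_jobs: 12,
  min_queue_time_minutes: 1,
  max_queue_time_minutes: 95,
};

const oldResult = oldMinBasedCheck(busyRow);
const newResult = newMaxBasedCheck(busyRow);
console.log({ oldResult, newResult });
```

With the example row, the min-based filter skips the instance type
while the max-based filter selects it for scale-up.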