Skip to content

Commit 367dacc

Browse files
authored
Fix scaleUpChron check for queue time using max_queue_time_minutes (#6618)
The parameter returned by the hud query `queued_jobs_aggregate` `min_queue_time_minutes` means for the returned number of queued jobs, what is the minimum of the jobs queue time. The parameter `min_queue_time_minutes` in contrast the the one with the maximum queued job waiting for a particular instance type. Currently we've been filtering for `min_queue_time_minutes`, what doesn't make a lot of sense. It does not add any additional checks/protections and can introduce fatal failures. In case the query have a divergent configuration from the lambda, say 10 minutes over 30 minutes used currently, and new jobs are always coming, the scaleUpChron will never run. So, as this is a fine check (it stills don't do exactly what we wanted it to do, but, better than nothing): I kept the check but validate if at least the longest queued job for an instance type is at least `SCALE_UP_MAX_QUEUE_TIME_MINUTES` (default to 30). This still would make it possible to overprovision in case of hud fails, but it is better than nothing.
1 parent 2ab7816 commit 367dacc

File tree

4 files changed

+6
-6
lines changed

4 files changed

+6
-6
lines changed

terraform-aws-github-runner/modules/runners/lambdas/runners/src/scale-runners/config.ts

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ export class Config {
3939
readonly scaleConfigOrg: string;
4040
readonly scaleConfigRepo: string;
4141
readonly scaleConfigRepoPath: string;
42-
readonly scaleUpMinQueueTimeMinutes: number;
42+
readonly scaleUpMaxQueueTimeMinutes: number;
4343
readonly scaleUpChronRecordQueueUrl: string | undefined;
4444
readonly secretsManagerSecretsId: string | undefined;
4545
readonly sSMParamCleanupAgeDays: number;
@@ -98,8 +98,8 @@ export class Config {
9898
this.retryScaleUpRecordJitterPct = Number(process.env.RETRY_SCALE_UP_RECORD_JITTER_PCT || '0');
9999
this.retryScaleUpRecordQueueUrl = process.env.RETRY_SCALE_UP_CHRON_RECORD_QUEUE_URL;
100100
this.scaleUpChronRecordQueueUrl = process.env.SCALE_UP_CHRON_HUD_QUERY_URL;
101-
this.scaleUpMinQueueTimeMinutes = process.env.SCALE_UP_MIN_QUEUE_TIME_MINUTES
102-
? Number(process.env.SCALE_UP_MIN_QUEUE_TIME_MINUTES)
101+
this.scaleUpMaxQueueTimeMinutes = process.env.SCALE_UP_MAX_QUEUE_TIME_MINUTES
102+
? Number(process.env.SCALE_UP_MAX_QUEUE_TIME_MINUTES)
103103
: 30;
104104
this.runnerGroupName = process.env.RUNNER_GROUP_NAME;
105105
this.runnersExtraLabels = process.env.RUNNER_EXTRA_LABELS;

terraform-aws-github-runner/modules/runners/lambdas/runners/src/scale-runners/scale-up-chron.test.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,7 @@ const runnerTypeInvalid = 'runner_type_invalid';
7979

8080
const baseCfg = {
8181
scaleConfigOrg: 'test_org1',
82-
scaleUpMinQueueTimeMinutes: 30,
82+
scaleUpMaxQueueTimeMinutes: 30,
8383
scaleUpChronRecordQueueUrl: 'url',
8484
} as unknown as Config;
8585

terraform-aws-github-runner/modules/runners/lambdas/runners/src/scale-runners/scale-up-chron.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ export async function scaleUpChron(metrics: ScaleUpChronMetrics): Promise<void>
3434
runner.max_queue_time_minutes,
3535
);
3636
return (
37-
runner.min_queue_time_minutes >= Config.Instance.scaleUpMinQueueTimeMinutes &&
37+
runner.max_queue_time_minutes >= Config.Instance.scaleUpMaxQueueTimeMinutes &&
3838
runner.org === Config.Instance.scaleConfigOrg
3939
);
4040
})

terraform-aws-github-runner/modules/runners/scale-up-chron.tf

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,7 @@ resource "aws_lambda_function" "scale_up_chron" {
6262
SCALE_CONFIG_REPO_PATH = var.scale_config_repo_path
6363
SECRETSMANAGER_SECRETS_ID = var.secretsmanager_secrets_id
6464
SCALE_UP_CHRON_HUD_QUERY_URL = var.retry_scale_up_chron_hud_query_url
65-
SCALE_UP_MIN_QUEUE_TIME_MINUTES = 30
65+
SCALE_UP_MAX_QUEUE_TIME_MINUTES = 30
6666

6767
AWS_REGIONS_TO_VPC_IDS = join(
6868
",",

0 commit comments

Comments
 (0)