
Implement orchestration management of lab runners #395

Closed
@ProjectsByJackHe

Description


Today, our lab uses Hyper-V machine checkpoints to revert each VM to a previously known good state.

This is how we achieve (semi-)stateless runners. The Hyper-V revert runs on a schedule, once every 3 hours.

At the very least, if a PR causes a crash/bugcheck, we can be confident the runner will recover at the next scheduled revert without manual intervention.

Still, waiting up to 3 hours is sub-optimal. We would like to revert before every run. That requires some sort of orchestration system so that concurrent jobs do not step on each other's toes. We may also need to accept trade-offs: added complexity, and the fact that not every job will get a fresh state when multiple jobs run concurrently.
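One possible shape for that orchestration, as a minimal sketch under stated assumptions: a single shared runner VM, a hypothetical `RevertGate` class, and a `revert` callback standing in for the actual Hyper-V checkpoint restore (not shown here). The gate reverts only when no other job is active, which makes the trade-off above concrete: a job that starts while another is still running skips the revert and shares whatever state the VM is in.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


class RevertGate:
    """Hypothetical coordinator for one shared lab VM (illustrative only).

    The VM is reverted to its known-good checkpoint only when no job is
    running. Jobs that start while others are active skip the revert and
    run against the current, possibly dirty, state.
    """

    def __init__(self, revert: Callable[[], None]) -> None:
        # Assumed callback that restores the VM checkpoint; the real
        # Hyper-V call is outside the scope of this sketch.
        self._revert = revert
        self._lock = threading.Lock()
        self._active = 0  # number of jobs currently using the VM

    def begin_job(self) -> bool:
        """Register a job; return True if it gets a freshly reverted VM."""
        with self._lock:
            fresh = self._active == 0
            if fresh:
                # The revert runs while the lock is held, so no other job
                # can start (or finish) in the middle of a restore.
                self._revert()
            self._active += 1
            return fresh

    def end_job(self) -> None:
        """Mark a job as finished."""
        with self._lock:
            if self._active == 0:
                raise RuntimeError("end_job called with no active jobs")
            self._active -= 1

    @contextmanager
    def job(self) -> Iterator[bool]:
        """Context manager wrapping begin_job/end_job for a single run."""
        fresh = self.begin_job()
        try:
            yield fresh
        finally:
            self.end_job()


if __name__ == "__main__":
    reverts = []
    gate = RevertGate(lambda: reverts.append("revert"))

    a = gate.begin_job()  # VM idle: revert, job gets fresh state
    b = gate.begin_job()  # job a still running: no revert, shared state
    gate.end_job()
    gate.end_job()
    with gate.job() as c:  # VM idle again: revert
        pass

    print(a, b, c, len(reverts))  # prints "True False True 2"
```

Under this sketch, a steady stream of overlapping jobs would keep the VM from ever reaching zero active jobs and so defer the revert indefinitely; some backstop, such as a periodic forced revert that waits for in-flight jobs to drain, would likely still be needed.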

Metadata

Status

Done

Milestone

No milestone


Development

No branches or pull requests
