Closed
Description
Today, our lab uses Hyper-V checkpoints to revert the VM to a known good state.
This is how we achieve (semi-)stateless runners. The Hyper-V revert runs on a schedule: once every 3 hours.
At the very least, if a PR causes a crash/bugcheck, we can be confident the system will heal without manual intervention.
Still, a 3-hour wait may be suboptimal. We would like to revert before every run. That requires some sort of orchestration system so that concurrent jobs do not step on each other's toes. We may also need to accept some trade-offs: added complexity, and the fact that not every job will start from a fresh state when multiple jobs run concurrently.
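One way to picture the orchestration described above is a small coordinator that reverts the VM only when no other job is currently using it. This is a minimal sketch, not a proposed implementation: `RevertCoordinator` and the `revert` callable are hypothetical names, and the sketch assumes the coordinator runs on the Hyper-V host (outside the VM being reverted) and that all jobs for a given VM go through a single process. In a real setup the `revert` callable might shell out to the Hyper-V `Restore-VMSnapshot` cmdlet, but that wiring is left out here.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


class RevertCoordinator:
    """Reverts the VM before a job only when no other job is running on it.

    Hypothetical helper for illustration; `revert` is whatever restores the
    checkpoint (assumed to be supplied by the caller).
    """

    def __init__(self, revert: Callable[[], None]) -> None:
        self._revert = revert
        self._lock = threading.Lock()
        self._active_jobs = 0

    @contextmanager
    def job(self) -> Iterator[bool]:
        """Wrap one job. Yields True if the VM was reverted for this job."""
        with self._lock:
            # Only revert when the VM is idle; reverting under a running
            # job would destroy that job's state.
            fresh = self._active_jobs == 0
            if fresh:
                self._revert()
            self._active_jobs += 1
        try:
            yield fresh
        finally:
            with self._lock:
                self._active_jobs -= 1


if __name__ == "__main__":
    # Stand-in for the real checkpoint restore.
    coordinator = RevertCoordinator(lambda: print("reverting VM to checkpoint"))

    with coordinator.job() as fresh:
        print("job 1 fresh state:", fresh)
        with coordinator.job() as overlapping_fresh:
            # Overlaps with job 1, so no revert happens here.
            print("job 2 fresh state:", overlapping_fresh)
```

The lock is held while the revert runs, so a job that arrives mid-revert waits until the VM is back instead of starting on a half-restored machine. The trade-off mentioned above shows up directly: a job that overlaps with another gets `False` and runs on whatever state the earlier job left behind.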