CA-410782: Add receive_memory_queues for VM_receive_memory operations #6470

Open

snwoods wants to merge 1 commit into master from private/stevenwo/CA-410782

Conversation

@snwoods (Contributor) commented on May 15, 2025

Migration spawns two operations which depend on each other, so we need to ensure there is always space for both of them to prevent a deadlock during localhost and two-way migrations. Adding VM_receive_memory to a new queue ensures that there will always be a worker available for the receive operation, so the paired send will never be blocked.

This will increase the total number of workers by worker-pool-size. Unlike parallel_queues workers, these workers will be doing actual work (VM_receive_memory), which could in theory increase the workload of a host if it is receiving VMs at the same time as other work, so this needs to be considered before merging this PR.
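
Roughly, the change gives VM_receive_memory its own queue group so its worker can never be starved by other queued operations. As a sketch of the dispatch side, assuming the new group is exposed as Redirector.receive_memory_queues alongside the existing default and parallel_queues (queues_of_operation and the exact routing here are illustrative, not necessarily the code in this PR):

(* Illustrative only: send VM_receive_memory operations to a dedicated
   queue group so the receive side always has a worker available,
   leaving all other operations on the default pool. *)
let queues_of_operation = function
  | VM_receive_memory _ -> Redirector.receive_memory_queues
  | _ -> Redirector.default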

@snwoods snwoods force-pushed the private/stevenwo/CA-410782 branch 2 times, most recently from 8b019f5 to 780578b on May 15, 2025 at 10:43
@snwoods (Contributor, Author) commented on May 19, 2025

Passing Ring3 BVT+BST with worker-pool-size=25 (217634) and worker-pool-size=1 (217631)

@snwoods snwoods marked this pull request as ready for review May 19, 2025 16:06
@edwintorok (Contributor) commented

> increase the workload of a host if it is receiving VMs at the same time as other work, so this needs to be considered before merging this PR

Can we use a different size for these workers than the rest? We have 16 vCPUs in Dom0 currently (although we should probably increase that to 32), so doing more than 16 migrations may not help if you are bottlenecked on CPU. We probably also need some CPU to run stunnel, so the ideal number might be lower than that and closer to 8.

I don't think this would increase load that much compared to the previous situation: you could already have run 25 migrations concurrently, as long as you didn't switch directions and one host was always sending while another was receiving.
(Well, before we raised that number from 16, the limit was only 16.)

However making this configurable separately could be done in a different PR.

@edwintorok (Contributor) left a comment

Looks good; this might need to wait for the other xenopsd PR to be merged first to avoid a conflict.

@snwoods (Contributor, Author) commented on May 20, 2025

> Can we use a different size for these workers than the rest?

I could add a limit to the size until we add a proper configurable limit in CP-308089. So e.g. where we currently do:

for _i = 1 to size do
  incr Redirector.default ;
  incr Redirector.parallel_queues
done

We could take the lower of max_migrate_queues (= 8) and size, and iterate over that for migrate_receive_queues?
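
A minimal sketch of that cap, assuming the new group is exposed as Redirector.receive_memory_queues and treating max_migrate_queues as a hard-coded constant until CP-308089 makes it properly configurable:

(* placeholder constant until CP-308089 adds a configurable limit *)
let max_migrate_queues = 8 in
for _i = 1 to size do
  incr Redirector.default ;
  incr Redirector.parallel_queues
done ;
(* only min(size, max_migrate_queues) workers serve the new queue group *)
for _i = 1 to min size max_migrate_queues do
  incr Redirector.receive_memory_queues
done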

A queue size of 1 would be enough to remove the deadlock issue, although of course, since all migrate_receives are now scheduled on this queue, making it that small would be a bottleneck.

@snwoods (Contributor, Author) commented on May 20, 2025

> I don't think this would increase load that much compared to the previous situation

Yes, but it would only increase the load for localhost and two-way migrations, where a host could now potentially be doing 25 sends and 25 receives, whereas before it would have been doing half that.

Commit message (179854e):

Migration spawns 2 operations which depend on each other so we need to
ensure there is always space for both of them to prevent a deadlock.
Adding VM_receive_memory to a new queue ensures that there will always
be a worker for the receive operation so the paired send will never be
blocked.

Signed-off-by: Steven Woods <[email protected]>
@snwoods snwoods force-pushed the private/stevenwo/CA-410782 branch from 780578b to 179854e on May 20, 2025 at 10:39