CA-410782: Add receive_memory_queues for VM_receive_memory operations #6470

Open

snwoods wants to merge 1 commit into master from private/stevenwo/CA-410782

Conversation

@snwoods (Contributor) commented on May 15, 2025

Migration spawns two operations which depend on each other, so we need to ensure there is always space for both of them to prevent a deadlock during localhost and two-way migrations. Adding VM_receive_memory to a new queue ensures that there will always be a worker available for the receive operation, so the paired send will never be blocked.

This will increase the total number of workers by worker-pool-size. Unlike parallel_queues workers, these workers will be doing actual work (VM_receive_memory), which could in theory increase the workload of a host if it is receiving VMs at the same time as other work, so this needs to be considered before merging this PR.
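
Roughly, the change gives VM_receive_memory its own queue group so its worker can never be starved by other queued operations. As a sketch of the dispatch side, assuming the new group is exposed as Redirector.receive_memory_queues alongside the existing default and parallel_queues (queues_of_operation and the exact routing here are illustrative, not necessarily the code in this PR):

(* Illustrative only: send VM_receive_memory operations to a dedicated
   queue group so the receive side always has a worker available,
   leaving all other operations on the default pool. *)
let queues_of_operation = function
  | VM_receive_memory _ -> Redirector.receive_memory_queues
  | _ -> Redirector.default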

@snwoods snwoods force-pushed the private/stevenwo/CA-410782 branch 2 times, most recently from 8b019f5 to 780578b on May 15, 2025 at 10:43
@snwoods (Contributor, Author) commented on May 19, 2025

Passing Ring3 BVT+BST with worker-pool-size=25 (217634) and worker-pool-size=1 (217631)

@snwoods snwoods marked this pull request as ready for review May 19, 2025 16:06
@edwintorok (Contributor) commented

> increase the workload of a host if it is receiving VMs at the same time as other work, so this needs to be considered before merging this PR

Can we use a different size for these workers than the rest? We have 16 vCPUs in Dom0 currently (although we should probably increase that to 32), so doing more than 16 migrations may not help if you are bottlenecked on CPU. We probably also need some CPU to run stunnel, so the ideal number might be lower than that and closer to 8.

I don't think this would increase load that much compared to the previous situation: you could already have run 25 migrations concurrently, as long as you didn't switch directions and one host was always sending while another was receiving.
(Well, before we raised that number from 16, the limit was only 16.)

However making this configurable separately could be done in a different PR.

@edwintorok (Contributor) left a comment

Looks good; this might need to wait for the other xenopsd PR to be merged first to avoid a conflict.

@snwoods (Contributor, Author) commented on May 20, 2025

> Can we use a different size for these workers than the rest?

I could add a limit to the size until we add a proper configurable limit in CP-308089. So e.g. where we currently do:

for _i = 1 to size do
  incr Redirector.default ;
  incr Redirector.parallel_queues
done

We could take the lower of max_migrate_queues (= 8) and size, and iterate over that for migrate_receive_queues?
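
A minimal sketch of that cap, assuming the new group is exposed as Redirector.receive_memory_queues and treating max_migrate_queues as a hard-coded constant until CP-308089 makes it properly configurable:

(* placeholder constant until CP-308089 adds a configurable limit *)
let max_migrate_queues = 8 in
for _i = 1 to size do
  incr Redirector.default ;
  incr Redirector.parallel_queues
done ;
(* only min(size, max_migrate_queues) workers serve the new queue group *)
for _i = 1 to min size max_migrate_queues do
  incr Redirector.receive_memory_queues
done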

A queue size of 1 would be enough to remove the deadlock issue, although of course, since all migrate_receives are now scheduled on this queue, making it that small would be a bottleneck.

@snwoods (Contributor, Author) commented on May 20, 2025

> I don't think this would increase load that much compared to the previous situation

Yes, but it would only increase the load for localhost and two-way migrations, where a host could now potentially be doing 25 sends and 25 receives, whereas before it would have been doing half that.

Commit message (179854e):

Migration spawns 2 operations which depend on each other so we need to
ensure there is always space for both of them to prevent a deadlock.
Adding VM_receive_memory to a new queue ensures that there will always
be a worker for the receive operation so the paired send will never be
blocked.

Signed-off-by: Steven Woods <[email protected]>
@snwoods snwoods force-pushed the private/stevenwo/CA-410782 branch from 780578b to 179854e on May 20, 2025 at 10:39