CA-409510: Make xenopsd nested Parallel atoms explicit #6469
base: master
6bdd89f
to
18d9bb4
Compare
@robhoes Is this the work you mentioned yesterday? Was this deadlock noticed in testing/production?
Each Parallel atom takes up a worker thread whilst its children do the actual work, so we have parallel_queues to prevent a deadlock. However, nested Parallel atoms take up an additional worker, meaning they can still cause a deadlock. This commit adds a new Nested_parallel atomic with matching nested_parallel_queues to remove the possibility of this deadlock.
This increases the total number of workers, but these workers are just to hold the Nested_parallel Atomics and will not be doing any actual work.
Signed-off-by: Steven Woods <[email protected]>
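The deadlock condition described in the commit message can be illustrated with a toy model. This is only a sketch: the `pool` record, its fields and `is_deadlocked` are hypothetical names invented for illustration, not xenopsd code, and the model ignores how xenopsd actually schedules atoms onto its queues.

```ocaml
(* Toy model only: the record and function names are hypothetical,
   not taken from xenopsd. *)
type pool = {
  workers : int;  (* total worker threads in the pool *)
  holders : int;  (* workers blocked holding a parallel atom *)
  pending : int;  (* child atoms queued on this same pool *)
}

(* The pool is stuck when every worker is a holder and children still wait. *)
let is_deadlocked p = p.holders >= p.workers && p.pending > 0

(* One shared pool: two nested parallel atoms occupy both workers. *)
let shared = { workers = 2; holders = 2; pending = 4 }

(* Holders moved to their own pool: the work pool keeps both workers free. *)
let work_pool = { workers = 2; holders = 0; pending = 4 }
let holder_pool = { workers = 2; holders = 2; pending = 0 }

let () =
  assert (is_deadlocked shared);
  assert (not (is_deadlocked work_pool));
  assert (not (is_deadlocked holder_pool))
```

In this model the extra holder workers never run children themselves, which matches the commit message's point that the additional workers only hold the Nested_parallel atoms.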
18d9bb4
to
cedf836
Compare
Yes, this has been in the product since I introduced the nested parallel > serial > parallel. We've detected it during testing under heavy workloads.
Can we accidentally create a nesting that is not safe? Do we need predicates that check we don't have such a situation?
Even with this change only 1 level of nesting is supported. Parallel/Serial/NestedParallel/Serial/NestedParallel would probably still deadlock (I don't think we have such a construct in the code currently).
We can check the constraints at the top level where we have the full view of the atom tree. Although that'd be a runtime check, not a compile time check.
Then toplevel you can have '
I'd suggest to try to write down the polymorphic types for just 1 or 2 atoms (+ serial + parallel + nested_parallel), and once we're happy with those, then we could try do an experiment, and replace it in the code and see what compile errors we get.
If we add a constraint that the toplevel atoms must be
See here how a state machine can be encoded into polymorphic variants for the basic idea: https://github.com/hammerlab/ketrew/blob/master/src/pure/target.ml#L169
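As a rough illustration of the polymorphic-variant idea in the comment above, here is a minimal sketch with one leaf atom plus serial, parallel and nested parallel. The type names (`inner`, `nested`, `top`), the `` `Work `` leaf and the `depth_*` helpers are all hypothetical and do not come from xenopsd; the real atom type has many more constructors and carries extra arguments.

```ocaml
(* Hypothetical sketch: one type per nesting level, so that the type
   checker rejects anything deeper than Parallel > Nested_parallel. *)

(* Inside a Nested_parallel: no further parallel atoms allowed. *)
type inner = [ `Work of string | `Serial of inner list ]

(* Inside a Parallel: Nested_parallel is allowed, Parallel is not. *)
type nested =
  [ `Work of string | `Serial of nested list | `Nested_parallel of inner list ]

(* Top level: Parallel is allowed, Nested_parallel is not. *)
type top =
  [ `Work of string | `Serial of top list | `Parallel of nested list ]

let max_of f xs = List.fold_left (fun acc x -> max acc (f x)) 0 xs

(* Count how many parallel-like atoms are stacked on the deepest path. *)
let rec depth_inner : inner -> int = function
  | `Work _ -> 0
  | `Serial xs -> max_of depth_inner xs

let rec depth_nested : nested -> int = function
  | `Work _ -> 0
  | `Serial xs -> max_of depth_nested xs
  | `Nested_parallel xs -> 1 + max_of depth_inner xs

let rec depth_top : top -> int = function
  | `Work _ -> 0
  | `Serial xs -> max_of depth_top xs
  | `Parallel xs -> 1 + max_of depth_nested xs

(* Accepted: Parallel > Serial > Nested_parallel. *)
let ok : top =
  `Parallel [ `Serial [ `Nested_parallel [ `Work "a"; `Work "b" ] ] ]

(* Rejected at compile time if uncommented, because [inner] has no
   `Nested_parallel constructor:
   let bad : top =
     `Parallel [ `Serial [ `Nested_parallel
       [ `Serial [ `Nested_parallel [ `Work "a" ] ] ] ] ]
*)

let () = assert (depth_top ok = 2)
```

The cost of this encoding is one type (and one copy of each traversal) per level, which is one reason to try it on just a couple of atoms first and see what compile errors the real code produces.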
I believe this is so far not addressing the problem of illegal nestings (which would represent an implementation defect).
Is this tied to an XSI? If so, we really should get this merged. If this is not strictly required to resolve the XSI I am happy to see more iterations.
This is a stopgap until we add compile-time constraints on the nesting, for example by using a polymorphic variant.
Signed-off-by: Steven Woods <[email protected]>
93c5b97
to
6258405
Compare
Correct, I've added a check that will log a warning if there are illegal nestings, and created ticket CP-308087 to add some proper constraints, e.g. using polymorphic variants.
All testing green for Ring3 BVT+BST and SRIscsiPerVMScalability (the sequence which found this defect).
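A runtime check of this kind might look roughly like the sketch below. It is not the code from this PR: the simplified `atomic` type, the `Work` leaf, the function names and the exact rule are assumptions made for illustration. The rule assumed here is that a Parallel must not sit inside any other parallel atom, and a Nested_parallel must not sit inside another Nested_parallel.

```ocaml
(* Hypothetical, simplified atom tree; the real xenopsd type has many
   more constructors and different arguments. *)
type atomic =
  | Work of string  (* stands in for any leaf operation *)
  | Serial of string * atomic list
  | Parallel of string * atomic list
  | Nested_parallel of string * atomic list

(* Walk the tree and return one message per illegal nesting found.
   [in_parallel] / [in_nested] record which parallel atoms enclose the
   current node. *)
let illegal_nestings atom =
  let rec go in_parallel in_nested atom =
    match atom with
    | Work _ -> []
    | Serial (_, atoms) -> List.concat_map (go in_parallel in_nested) atoms
    | Parallel (id, atoms) ->
        let here =
          if in_parallel || in_nested then
            [ "Parallel " ^ id ^ " inside another parallel atom" ]
          else []
        in
        here @ List.concat_map (go true in_nested) atoms
    | Nested_parallel (id, atoms) ->
        let here =
          if in_nested then
            [ "Nested_parallel " ^ id ^ " inside another Nested_parallel" ]
          else []
        in
        here @ List.concat_map (go in_parallel true) atoms
  in
  go false false atom

(* Log a warning per problem (to stderr here) and report whether the
   tree is acceptable; it does not stop the operation. *)
let check_nesting atom =
  let problems = illegal_nestings atom in
  List.iter
    (fun msg -> prerr_endline ("warning: illegal nesting: " ^ msg))
    problems ;
  problems = []
```

Because the check only warns, an illegal tree still runs; rejecting such trees outright is left to the compile-time constraints tracked in CP-308087.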