Standardised API for sharing thread pools #75
Comments
@repi If the information was relayed correctly, having crates implement a trait like this would solve your issues with crates that implement their own executor. Do you have any concerns that are not covered by the proposed trait? |
As the gamedev WG, it would be great to also discuss the needs specific to engines and games. In particular, how to handle priorities of certain tasks and potential pinning to specific threads, which would not be covered by the current |
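For illustration, the priority and pinning hints mentioned above could take a shape like this. This is a hypothetical sketch; none of these types or traits come from an existing crate:

```rust
// Hypothetical scheduling hints an engine-facing task API could accept.
// None of these types exist in any crate discussed here.
enum Priority {
    Low,    // e.g. asset streaming
    Normal,
    High,   // e.g. frame-critical jobs
}

struct TaskHints {
    priority: Priority,
    /// Pin the task to a specific OS thread index, if any.
    pinned_thread: Option<usize>,
}

trait HintedSpawner {
    /// Spawn a task together with the engine's scheduling hints.
    fn spawn_with_hints(&self, hints: TaskHints, task: Box<dyn FnOnce() + Send>);
}
```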
I have a rough proposal for an API. The idea is to provide an API that lets the user be in control of when each executor crate runs work, to provide some control around time budgets, and to let the user be in control of how executors are multiplexed onto OS threads.

```rust
use std::task::Waker;
use std::time::Duration;

/// This is implemented by tokio/rayon/async-std for their executors.
/// The user builds as many workers as desired, and places them onto threads as desired.
trait WorkerBuilder {
    /// Builds a worker that can notify the user of available work by using the provided Waker.
    fn build_worker(waker: Waker) -> Box<dyn Worker>;
}

/// Implemented by tokio/rayon/async-std. This is the executor itself, which polls futures or runs queued tasks.
trait Worker {
    /// Polls the worker, doing work if available.
    /// The time_budget argument indicates the caller's desire for the executor to finish within the duration.
    /// May return a Duration that indicates the worker's desire to be polled again when that duration expires.
    fn poll(&mut self, time_budget: Option<Duration>) -> Option<Duration>;
}
```
... create Workers and spawn worker_threads ...

Usage:

```rust
// Parker could be e.g. crossbeam_utils::sync::Parker; Worker and Duration as defined above.
fn worker_thread(parker: Parker, mut workers: Vec<Box<dyn Worker>>) {
    // Example per-worker budget; a real implementation would tune this.
    let worker_time_budget = Duration::from_millis(1);
    loop {
        // In more complex cases, you may want to prioritize work as Wodann said,
        // and only poll the Workers that are most important, for example frame job workers.
        for worker in workers.iter_mut() {
            // Should keep track of each worker's wakeup-timeout desire and wake as appropriate.
            worker.poll(Some(worker_time_budget));
        }
        // The Wakers provided to the Workers would unpark this Parker.
        parker.park();
    }
}
```
|
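One way to produce the Wakers this proposal hands to build_worker is to pair crossbeam's Parker with the waker-fn crate. This is a sketch assuming the crossbeam-utils and waker-fn crates; neither is mandated by the proposal itself:

```rust
use crossbeam_utils::sync::Parker;
use waker_fn::waker_fn;

fn main() {
    let parker = Parker::new();
    let unparker = parker.unparker().clone();

    // A Waker that unparks the worker thread whenever an executor
    // signals that it has work available.
    let waker = waker_fn(move || unparker.unpark());

    // This waker would be passed to WorkerBuilder::build_worker(..).
    waker.wake_by_ref();
    parker.park(); // returns immediately: the token was already set by the wake above
}
```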
After lots of good discussion in Discord and thinking about it a bit, I think this is a much more difficult problem than what raw-window-handle addresses. I also perceive some risk of ecosystem split, so I'd like to see something done, but I don't think a solution will come easily. It seems like even defining the problem in a way that everyone completely agrees with is difficult. Case in point: I was thinking of the problem differently than @kabergstrom. As I understand it, his proposal inserts extensibility at a different layer of the stack than what I had in mind. I think both approaches could be useful and are fairly orthogonal.

The problem as I had it in mind was that many crates send their work directly to a thread pool implementation. For example, specs/shred is strongly coupled to rayon: AFAIK there isn't a way to have the work sent to tokio or some other executor. @bitshifter mentioned PhysX has a solution for this, a CPU dispatcher interface that the host application can implement.

At first I was thinking we could recommend crates offer an API like this, but this could end up being quite a lot of work for the people maintaining them. Crates like rayon are really pleasant and easy to use, allowing code like the sketch after this comment.

I also think there is potentially a lot of diversity in what kinds of tasks a crate can produce. Tasks could be long/short-running, low/high priority, IO/CPU bound. Sometimes an end user will want the work generated by an upstream crate to be pinned to a particular thread. Sometimes it's important to allow tasks to stack up to create back pressure and slow down the amount of work an upstream crate is producing. Some tasks are fire-and-forget, and others block code that needs to run immediately after the work is done, possibly using a result from the task. Different games might even need to handle work coming from the same upstream crates differently. So even if upstream crates had a task delegation layer like PhysX, they'd probably have their own small differences, for good reason.

While a utility crate could probably be created to help upstream crates add a task delegation layer, I think it would be difficult to come up with a single interface that expresses every possible usage an upstream crate might need. The communication is actually bidirectional: the crate generating the work has to express what to do, and also be able to listen for a result.

As I mentioned before, this is different from @kabergstrom's approach. I don't think one is better than the other, and I could see both approaches being used at the same time. Whatever we do, I think it will need to be prototyped and experimented with, and the process won't be as quick and easy as it was for raw-window-handle. |
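For reference, the kind of rayon usage being praised above might look like this. This is a minimal sketch; the data and computation are invented for illustration:

```rust
use rayon::prelude::*;

fn main() {
    let positions: Vec<f32> = (0..1_000_000).map(|i| i as f32).collect();

    // rayon makes data parallelism this easy: par_iter() transparently
    // schedules the work onto its global thread pool.
    let sum: f32 = positions.par_iter().map(|p| p * 0.5).sum();
    println!("{}", sum);
}
```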
@aclysma I would see this as an internal detail that would not change the user-level API of any crate. For example, PhysX doesn't require you to implement their CPU dispatcher API; they provide a default implementation, and it doesn't change the high-level use of the library. I wouldn't expect this kind of interface to change rayon any more than their current |
Proposal

This proposes a first approach to pushing context information from the call site over to libraries. The proposal focuses on the library interface only, based on the following assumption: for the caller of a library function, it is sufficient to provide task-relevant data at this level of abstraction (e.g. a high-level library task won't spawn low-level library tasks). This allows splitting the issue of providing an API into two parts:
Practical Part

IMO the issue of defining a task API is similar to passing custom allocators down to libraries, which leads to point 1 being the same for both issues (task & allocator), while the 2nd is specific to the problem. To tackle the 1st point, the proposal would be to create suballocators and subexecutors (let's call them contexts) in the caller and pass these to the library.

Example

```rust
let low_task_executor = main_executor.low_priority();
entities.par_iter(&low_task_executor).for_each(|x| { .. });

let linear_allocator = main_allocator.get_linear_allocator(..);
renderer.set_allocator(linear_allocator);
```
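A rough sketch of what such context objects could look like. Everything here is hypothetical; none of these traits or methods come from an existing crate:

```rust
use std::time::Duration;

// Hypothetical context API matching the example above.
trait SubExecutor: Send + Sync {
    /// Run a task under this context's priority/budget settings.
    fn spawn(&self, task: Box<dyn FnOnce() + Send>);
}

trait MainExecutor {
    /// Derive a sub-executor whose tasks are tagged low priority.
    fn low_priority(&self) -> Box<dyn SubExecutor>;
    /// Derive a sub-executor that tries to stay within a per-frame time budget.
    fn with_budget(&self, budget: Duration) -> Box<dyn SubExecutor>;
}
```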
The Pros/Cons

Pros:

Cons:
|
Hi! I was pointed to this discussion and was wondering how the A little known fact about
If you compile
We'd be very interested in talking about the problem of libraries not abstracting over executors and not being prepared for the presence of multiple executors, and we want to spend time on design there. |
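For context, tokio already lets the embedder own thread configuration and spawn onto a runtime it controls. A minimal sketch, assuming tokio 1.x with the rt-multi-thread feature (the API at the time of this thread differed):

```rust
use tokio::runtime::Builder;

fn main() {
    // The host application, not tokio, decides how many OS threads exist.
    let runtime = Builder::new_multi_thread()
        .worker_threads(2)
        .thread_name("game-io")
        .enable_all()
        .build()
        .unwrap();

    // A Handle can be cloned and used to spawn work from anywhere.
    let handle = runtime.handle().clone();
    let task = handle.spawn(async { 21 * 2 });
    let answer = runtime.block_on(task).unwrap();
    assert_eq!(answer, 42);
}
```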
We already have several proposals that we want to prototype with, but as discussed in the wg meeting it'd be good to know the use cases that the prototype API should test:
If any use cases are missing, please list them. |
There's a new repo where the prototypes will be collected: |
Job systems in the wild, with focus on the executor part (excluding data dependencies, high-level scheduling over multiple frames, etc.), each with a short description:
(Ideally, the API should not hinder integration of profiling/debugging middleware like RAD Telemetry.) |
I found another example of what a thread pool API might look like in C++ land. Another piece of physics middleware, this time the FEMFX library from AMD: https://gpuopen.com/gaming-product/femfx/

The interface appears to be a bunch of function pointers: https://github.com/GPUOpen-Effects/FEMFX/blob/master/amd_femfx/inc/FEMFXTaskSystemInterface.h

You can see an implementation with compile-time support for UE4's task scheduler, Intel TBB, and TLTaskSystem, which appears to be FEMFX's own implementation of a task system (see https://github.com/GPUOpen-Effects/FEMFX/blob/master/samples/sample_task_system/TLTaskSystem.cpp).

I thought this was another good example demonstrating usage in a major AAA game engine, in addition to the PhysX interface I mentioned earlier. |
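As a loose Rust analogue of such a C-style function-pointer interface: the names below are invented and do not match FEMFX's actual API, which lives in FEMFXTaskSystemInterface.h:

```rust
use std::ffi::c_void;

// Invented names; a C-style callback table the host engine fills in,
// while the middleware only calls through it.
type TaskFn = extern "C" fn(data: *mut c_void);

struct TaskSystemCallbacks {
    /// Queue `task(data)` onto the host engine's scheduler.
    submit_task: fn(task: TaskFn, data: *mut c_void),
    /// Block until all submitted tasks have completed.
    wait_all: fn(),
}

extern "C" fn noop(_data: *mut c_void) {}

fn main() {
    let callbacks = TaskSystemCallbacks {
        submit_task: |task, data| task(data), // run inline, for illustration only
        wait_all: || {},
    };
    (callbacks.submit_task)(noop, std::ptr::null_mut());
    (callbacks.wait_all)();
}
```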
https://async.rs/blog/stop-worrying-about-blocking-the-new-async-std-runtime/ This might do too much stuff automatically for it to be considered acceptable by everyone, but it's interesting as a point of reference at least. |
The following blog post highlights a crate that might cover most of this use case: |
Executor trait interface: https://github.com/bastion-rs/agnostik |
In the working group meeting #67, @kabergstrom mentioned that several crates that use thread pools rely on the OS to handle time slicing (e.g. Rayon, Tokio) and as such are at risk of falling outside of the Rust game ecosystem. More concretely, a solution would let the user have control over multiplexing executor work onto OS threads.
To resolve this issue, they proposed designing a standardised API for sharing thread pools in the spirit of raw-window-handle.
There is a Reddit discussion in which we are gauging interest.