Asyncio-driven Rust futures #6

Open
ThibaultLemaire opened this issue Jan 24, 2021 · 40 comments

@ThibaultLemaire

Hi, so, following PyO3/pyo3#701 I've been taking a look at what you did here, and, first of all, I wanted to congratulate you on the effort you put into it and what you achieved here. Using channels for cross-runtime communication is a pretty good and simple idea.

On my end, I have been focusing entirely on executing async Rust code from (async) Python, which didn't give me the same perspective on the problem and greatly simplified my assumptions (but maybe not my implementation, we'll see).

So I ended up with a different solution: Wrapping the Rust Future into a Python Future-like object.

(Yes, Python - or more accurately, asyncio - confusingly also has a concept of Future, which is an awaitable but not a coroutine and is actually the low-level handle to thread results, for example. But you already know that since you use loop.create_future to communicate the result back to Python.)
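For illustration, the loop.create_future pattern mentioned above boils down to something like the following on the Python side. This is a minimal sketch, not pyo3-asyncio's actual code, with a worker thread standing in for a Rust runtime:

```python
import asyncio
import threading

async def awaiting_rust_result():
    # loop.create_future() gives us a low-level handle that the event
    # loop knows how to wait on without busy polling.
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def worker():
        # This thread stands in for a Rust runtime completing a task;
        # call_soon_threadsafe is the thread-safe way to resolve the future.
        loop.call_soon_threadsafe(fut.set_result, "result from Rust")

    threading.Thread(target=worker).start()
    return await fut

print(asyncio.run(awaiting_rust_result()))  # prints "result from Rust"
```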

What this lets me do is run async Rust code (albeit trivial for now) without a runtime. Or to be more accurate, to use asyncio as the runtime to drive Rust futures.

I believe this approach to be more flexible as a library writer could in turn spin up their runtime of choice (tokio, async-std, or even both...), spawn tasks into it, and directly return the JoinHandle to be awaited by Python. In the end I guess it would be equivalent to what you do here, except without global state.

I'm not sure I'm being very clear, so before I ask you if this approach would make sense for what you're doing or for PyO3, I think I'll just fork Async-PyO3-Examples and explain the reasoning step by step.

As a foretaste and to illustrate, I'm thinking of being able to do something like this (untested):

import asyncio
import a_rust_written_library

async def main():
  rt = a_rust_written_library.Runtime()
  result = await a_rust_written_library.io_operation(rt, "parameter")
  # do something with result

asyncio.run(main())

And on the Rust side:

use {
    pyo3::prelude::*,
    pyo3_futures::RustFuture, // Let's say it sits in a separate crate
};

#[pyclass]
struct Runtime {
    // Wrap e.g. the tokio runtime
}

#[pymethods]
impl Runtime {
    #[new]
    fn new() -> Self {
        // <snip>
    }
}

#[pyfunction]
fn io_operation(rt: &Runtime, param: String) -> RustFuture {
    RustFuture::from(async {
        println!("This actually runs on Python's main thread!");
        let task_a = rt.spawn(async {
            println!("Whereas this is running on the tokio runtime thread.");
            // Do something that requires the tokio runtime
        });
        let task_b = rt.spawn(async {
            // Do something else that requires the tokio runtime
        });
        task_a.await? + task_b.await?
    })
}

For reference you can take a look at my playground where I've been prototyping, but I apologise in advance for the mess that it is, and I wouldn't mind if you preferred to wait until I'm done writing some cleaner examples.

@awestlake87 @davidhewitt @ChillFish8 tagging you guys for the discussion

@davidhewitt

I'll start by saying that @awestlake87 and @ChillFish8 have thought a lot more about this than me, so please don't give my opinion much weight; they know much more about the design space.

This approach of being able to spawn a runtime-free future does sound nice. If we were to eventually integrate something into pyo3 core (which I would imagine we would need to in order to have async keyword for #[pyfunction] work out of the box), it'd be nice if it didn't create lots of complications with how users need to set up their runtimes.

Slightly greedy; I wonder if we can support multiple APIs in pyo3-asyncio while we're learning? My hope is that as Rust+Python+async matures, eventually we know what the right bits are to put into the main PyO3 lib, and what can continue to live as optional extras in supporting crates.

(I would probably only want to put the bare minimum upstream in PyO3 to make async #[pyfunction] work.)

@ChillFish8

The only issue I see with implementing futures directly like this is that natively we lack the ability to sanely interact with the generator API, and therefore the async API. We could implement Rust futures as a sort of macro that produces a Python iterator class, but that lacks the ability to suspend and resume the event loop's polling, so we're left constantly polling thousands of times a second, which can add a lot of unwanted overhead. That overhead is, I think, the biggest limitation, along with not being able to control suspending and resuming, due to the lack of ability to create Python generator objects built on top of yield. (Originally Pyre replicated generators using the pyproto implementations, but this caused a lot of CPU usage and latency.)
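The busy-polling concern can be illustrated with a toy sketch (not Pyre's actual code): an awaitable that does a bare yield gives the event loop no hint about when it will be ready, so the task is simply rescheduled on the next loop iteration, over and over, until it completes:

```python
import asyncio

class BusyPollAwaitable:
    """Toy awaitable that completes only after `n` polls, standing in for
    an iterator-style wrapper around a Rust future."""

    def __init__(self, n: int):
        self.remaining = n

    def __await__(self):
        while self.remaining > 0:
            self.remaining -= 1
            # A bare yield gives no wake-up hint, so the Task machinery
            # reschedules us immediately: this is the constant-polling
            # overhead described above.
            yield None
        return "done"

async def poll_demo():
    return await BusyPollAwaitable(1000)   # polled 1000 times back to back

print(asyncio.run(poll_demo()))  # prints "done"
```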

@awestlake87
Owner

I'm kinda lacking the background knowledge on the generator API to comment on the technical details for this approach, so I'll kinda lean on @ChillFish8 for some of those comments for now if that's alright.

In my projects I almost always need my futures to communicate with the runtime to perform IO / sleeps / etc, so I think I would pretty much always spawn it into the tokio or async-std runtime anyway. We could always make convenience functions that could handle that indirection, but I think we'd end up with a situation that's pretty similar to what we have already in this library.

Personally, I'd be a bit worried about having the runtime-less future by default since I think that could introduce some confusion for users. I'm imagining that more often than not a new user would initialize pyo3-asyncio and initialize a runtime like tokio, then get a panic from tokio saying the runtime has not been initialized yet if they forget to spawn it onto the tokio runtime.

I guess my question would be does this have some performance / ergonomics wins? I think what @ChillFish8 is talking about in his comment refers to running all Rust futures on the Python event loop rather than just a few select ones. I'd imagine most of the time, these futures would just be asleep and waiting on the JoinHandle to wake them up.

Some Additional Thoughts

Before I talked with @ChillFish8 on the thread that spawned this repo, I'd been thinking about what it would take to share a runtime with Python. My conclusion at that point was that it might be best to run Python coroutines on a Rust runtime instead. I figured this would be a lot of work, but it does look like someone already tried it a while back with pyo3-tokio. Since it's under the PyO3 org, @davidhewitt might have some insight on why they stopped working on it back in 2018. They mysteriously say "I don't think this project makes sense" and that's the end of it.

It's not present on the master branch yet, but if you're worried about flexibility for the runtime instantiation, you might check out the other-runtimes branch. This branch still relies on global state, but it does allow you to completely customize the runtime via the Runtime trait in generic.rs.

@davidhewitt we can open a new issue here or on PyO3 to talk about adding support for async #[pyfunction]. I have some thoughts on it that shouldn't need anything extra in pyo3-asyncio.

@ChillFish8

I think the best solution is the two-runtimes system: asyncio in the main thread and a Rust runtime in another, at least in order to make an ergonomic system.

It would be interesting to see why the pyo3-tokio system was dropped, but equally I doubt it would still be usable as a reference now, after the considerable changes to the Tokio API. It would certainly be useful to have a shared runtime that allowed hooks from both Python and Rust.

@ThibaultLemaire
Author

Hey, thank you all for the rapid feedback!

have async keyword for #[pyfunction] work out of the box

That's exactly what I had in mind when I started working on this.

implement rust futures as [...] a python iterator class [lacks] the ability to suspend the event loop polling and resuming (We're left with constantly polling thousands of times a second)

You are very right to be concerned about this, as this was the first issue with my early prototypes. Essentially, asyncio was left polling in a busy loop until the Rust future had completed. However, I have solved this by mimicking the behaviour of asyncio futures: it turns out that, as a future, you can ask the event loop to put your parent task back in the waiting queue indefinitely by yielding yourself. You can then ask to be rescheduled by invoking the callbacks that were passed to you. But I'll be submitting a PR to your examples page explaining all that with code examples.
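To make the mechanism concrete, here is a minimal pure-Python sketch of such a Future-like object (the names are hypothetical; the real implementation lives in Rust). Setting `_asyncio_future_blocking` and yielding self makes the parent Task park itself and register a wake-up callback instead of busy polling; resolving the future then reschedules the task through those callbacks:

```python
import asyncio
import threading

class FutureLike:
    """Sketch of a Future-like awaitable: it suspends its parent task by
    yielding itself, then wakes it by firing the registered callbacks."""

    def __init__(self, loop):
        self._loop = loop
        self._callbacks = []
        self._done = False
        self._result = None
        # Tasks duck-type on this attribute to recognise Future-like objects.
        self._asyncio_future_blocking = False

    # The minimal protocol the Task machinery relies on:
    def get_loop(self):
        return self._loop

    def done(self):
        return self._done

    def result(self):
        return self._result

    def add_done_callback(self, callback, *, context=None):
        if self._done:
            self._loop.call_soon(callback, self)
        else:
            self._callbacks.append(callback)

    def set_result(self, value):
        # Must run on the loop's thread (e.g. via call_soon_threadsafe).
        self._result = value
        self._done = True
        for callback in self._callbacks:
            self._loop.call_soon(callback, self)
        self._callbacks.clear()

    def __await__(self):
        if not self._done:
            self._asyncio_future_blocking = True
            # Yielding ourselves parks the parent task until set_result
            # fires the callbacks -- no busy polling in between.
            yield self
        return self._result

async def await_future_like():
    fut = FutureLike(asyncio.get_running_loop())
    # A worker thread stands in for the Rust side completing the future.
    threading.Thread(
        target=fut.get_loop().call_soon_threadsafe,
        args=(fut.set_result, 42),
    ).start()
    return await fut

print(asyncio.run(await_future_like()))  # prints 42
```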

I would pretty much always spawn [futures] into the tokio or async-std runtime anyway.

I came to the same conclusion when trying to run some real-world code like reqwest, but that's an issue with plain Rust anyway. You can try to run a reqwest future with futures::executor::block_on without any compiler warning or error and end up with a runtime panic because the future expected some tokio globals. I'm not saying that's good; I'm saying we shouldn't try to solve a problem whose scope goes beyond PyO3.

Which gives me a nice segue into

you [can] completely customize the runtime via the Runtime trait in generic.rs

And this is nice, but my idea is to let the runtime be the flexible thing that it currently is in async Rust. How would you solve the problem of using two different async libraries that require two different runtimes in pure Rust? Well, I can see a few ways: simply have two runtimes on two threads, spawn tasks, and have them exchange join handles. Use channels, like you guys did. Or try some compat glue like futures-compat... So, wouldn't it be nice if we could just treat asyncio as yet another runtime? (Albeit a very naughty one.)

But again, I think the discussion will be more fruitful once I'm done writing my examples, which I'll try and do this week, as my weekend will be taken by the Global Game Jam.

Anyway...

you might check out the other-runtimes branch

I have! In fact I skipped directly to it when looking at your code (well, technically, I jumped to the py-future branch which is based off of it). And reading your code gave me a new idea to solve the deadlock issue that I'm having right now with my approach! (it works OK for a few futures, but once you scale it up, Python starts polling futures which are waiting for the GIL to call the callbacks, and ... deadlock.)

@ChillFish8

I have! In fact I skipped directly to it when looking at your code (well, technically, I jumped to the py-future branch which is based off of it). And reading your code gave me a new idea to solve the deadlock issue that I'm having right now with my approach! (it works OK for a few futures, but once you scale it up, Python starts polling futures which are waiting for the GIL to call the callbacks, and ... deadlock.)

Is this using a Rust runtime?

@ThibaultLemaire
Author

Yes, but also when spawning my own threads

@ChillFish8

There is a very weird issue (and I have yet to find out why) where any multi-threaded runtime, e.g. Tokio's multi-threaded runtime, where things can get switched between threads, will deadlock the Python interpreter or PyO3's ability to either acquire the GIL or actually interact with it (I can't remember which of the two it was in testing) when run on the main Python thread. I have yet to be able to re-create this behaviour when spawning it in a child thread, though, and only when you're handling threads with or without the GIL, so why this happens I'm not entirely sure, but I've seen a couple of errors from my interpreter linking it to the interpreter deadlocking before it even initialised (weirdness). So generally I recommend that any sort of multi-threaded Rust runtime be spawned in a child thread.

@awestlake87
Owner

Just as an aside since we were talking about supporting async for #[pyfunction], I opened a PR on PyO3 to explain how I was thinking we could go about supporting it. It's very bare-bones and incomplete, but I wanted to get my ideas on it out there so we can have a conversation about it. Let me know what you guys think!

@ThibaultLemaire
Author

IMHO dumb standalone futures remain better for async #[pyfunction] support in PyO3.

I respect the maturity of your project and don't pretend to be on that level yet, but I'm positive that my approach works. Right now, I can drive just about any future to completion, even with multiple await points, and I can mix and match runtimes by passing around JoinHandles directly. The last difficulty I'm facing is deadlocks, which I'm working around by bailing out and hoping the GIL gets released before the next iteration of the loop, but I'm pretty sure I have a fix for that too.

So, I actually believe the two approaches complement each other and could work very well together, but if you're on a tight schedule, of course, I won't argue any further.

Btw, did you get a chance to take a look at my prototype code?

I should also mention that I have written an example implementation of an asyncio future along with explanations here that should help in understanding how my Rust code is designed.

@awestlake87
Owner

awestlake87 commented Jan 29, 2021

I think we've got a good first release candidate with the master branch and possibly the attributes branch, but that doesn't mean we can't support something like this in a future release. I doubt async #[pyfunction] is going in 0.13; I just opened a PR to demonstrate my approach.

I have looked at the example code, and I understand that it works and has the potential to be an avenue for a unified Python / Rust runtime. I don't want to give you the impression that I'm arguing from the perspective of maturity, since that's something we can fix with some time and effort. Also, it seems like you've addressed the concerns @ChillFish8 had around polling and resuming futures (although I don't know enough to comment on that specifically). But I think my main concern is whether or not these standalone futures as they are now would be useful to people in practice.

The asyncio-driven futures introduce a new flavor of future that is separate from the futures that should run on tokio or async-std. Essentially it's like using two Rust runtimes at once, which is not necessarily a problem, I just question whether or not users would expect it to be the default behaviour.

In my experience, you only want a single Rust runtime project-wide. That's not to say you couldn't have two, and I'm sure some people do this if they need some interoperability with a project that was built for a different runtime, but I think it's generally the exception, not the rule. That being said, I think this project would in fact support initializing two different Rust runtimes at once, so I don't think this is a problem with the current approach. Since we communicate between Rust and Python with channels, we shouldn't have any issues running futures from two different runtimes concurrently.

Right now, I can drive just about any future to completion, even with multiple await points.

That's true, but usually Futures need to perform some I/O, interact with timers, etc, and unfortunately these tend to be very runtime-specific features. That's why async-std and tokio don't share types like TcpStream or functions like sleep. In order for these asyncio futures to perform real-world async tasks, I would argue we would need to provide implementations for things like TcpStream or sleep that run on the asyncio runtime. Otherwise, you're kinda stuck spawning every future into tokio or async-std anyway like with reqwest.

Providing an implementation for features like TcpStream and sleep is doable! However, there are additional hurdles after these implementations exist which I would argue are considerable. Currently the Rust language has not provided generic abstractions for async executors (and may never provide them). tokio and async-std have pretty widespread adoption, and because of this, most async libraries like async-tungstenite or hyper end up supporting a piece of runtime-specific code to provide the glue they need to run their futures on top of async-std or tokio. For this reason, I think a new PyO3 asyncio runtime would have an ecosystem problem. This asyncio-specific glue code would be needed for a lot of popular libraries.

IMHO dumb standalone futures remain better for async #[pyfunction] support in PyO3.

I'm not sure I agree with this. For me it comes back to the argument about what people would expect to be the default behaviour. I think most people would expect to just go straight into their tokio or async-std code since most of the time there's only a single global runtime. Having async #[pyfunction] use a completely separate runtime by default seems to me like it would cause a lot of confusion for users that aren't super familiar with the library or the finer details of Rust async/await.

I don't want you to get the impression that I'm unwilling to move on these positions, but I think these are still valid concerns. Feel free to chime in if you don't think I'm being fair! As one of the authors of the current implementation I might be pretty biased.

In order to change my mind, I think I would need to see some concrete advantages of this approach:

  • Are there any real-world examples where people do not need a specific runtime for their future?
  • Are there any performance wins with this approach? Would they be on the hot-path or cold-path for most use-cases?
  • Does this provide better interop between Rust Futures and Python Libraries?
    • Sharing the asyncio executor could have some advantages that I haven't considered yet.

@ChillFish8

I mean, in general, Python's event loop will be fairly expensive relative to Rust regardless, as you have no way of properly yielding via Rust. However, you can likely have a simple set of std I/O tools and just use the low-level asyncio handles, like I did with Pyre, which uses asyncio to move the state of the server forward using a std TCP listener. Anything like asyncio.sleep, however, is likely going to be completely out of the question in a sane way, or at least not without simply wrapping the PyObject.
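The low-level-handles idea can be sketched with asyncio's add_reader API (a toy echo of the approach, not Pyre's code): register a plain non-blocking std socket with the event loop and advance the server state in a ready-callback, so the loop only wakes us when the file descriptor is actually readable:

```python
import asyncio
import socket

def start_server(loop: asyncio.AbstractEventLoop) -> socket.socket:
    """Register a plain non-blocking listener with the event loop and
    serve accepted connections from a readiness callback."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))   # any free port
    server.listen()
    server.setblocking(False)

    def on_readable():
        # Only called when the fd is readable: no busy polling.
        conn, _addr = server.accept()
        conn.sendall(b"hello")
        conn.close()

    loop.add_reader(server.fileno(), on_readable)
    return server

async def fetch_hello():
    loop = asyncio.get_running_loop()
    server = start_server(loop)
    port = server.getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    data = await reader.read()          # read until the server closes
    writer.close()
    await writer.wait_closed()
    loop.remove_reader(server.fileno())
    server.close()
    return data

print(asyncio.run(fetch_hello()))  # prints b'hello'
```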

@awestlake87
Owner

awestlake87 commented Jan 29, 2021

I saw the comments about performance in the warnings in https://github.com/ChillFish8/Async-PyO3-Examples, but I wasn't sure if they still applied. AFAIK we haven't done a comparison of the performance of pyo3-asyncio vs @ThibaultLemaire's prototype, so I was gonna reserve judgement on that until we get some concrete numbers. Performance isn't necessarily a concern if there are other advantages to having an asyncio runtime, but if it did happen to have some wins over the current impl, then that would be a point in its favor.

I mean in general python's eventloop will be fairly expensive to rust regardless as you have no way of properly yielding via rust

I think @ThibaultLemaire mentioned he had a solution to this, unless I'm misunderstanding:

You are very right to be concerned about this, as this was the first issue with my early prototypes. Essentially, asyncio was left polling in a busy loop until the Rust future had completed. However, I have solved this by mimicking the behaviour of asyncio futures: it turns out that, as a future, you can ask the event loop to put your parent task back in the waiting queue indefinitely by yielding yourself.

Is that referring to the same problem @ChillFish8?

@ChillFish8

You know, I have no idea. I am starting to get completely lost down the rabbit hole of this conversation 😅 Sorry for the delayed reply btw, I have been a tad busy lately.

@ThibaultLemaire
Author

Just some progress report: I believe I have finally found a satisfying* solution to avoid any deadlock with my code. (Link to the commit)

I have fully documented my approach here. (Although the double callback pattern that I'm using to work around deadlocks isn't needed for those examples and the explanations I wrote are somewhat misleading. So I'll probably rephrase or remove them entirely.)

So I think I'm ready to move on to either writing a proper crate or forking PyO3.

*i.e. that is not a dirty workaround

@davidhewitt

davidhewitt commented Feb 22, 2021

Awesome!

Before deciding on any approach to merge into PyO3 core I'd like to fully understand API consequences and performance of each approach. It may be easier to have that discussion if your implementation starts as a separate crate (or additional functionality added to pyo3-asyncio).

@ThibaultLemaire
Author

ThibaultLemaire commented Mar 8, 2021

I have started pyo3-futures and am open to suggestions (especially on naming(s)).

Although, I've been thinking: I liked my approach because it felt cleaner, more direct, but I've been unable to put my finger on why exactly, and I still don't have many practical advantages to it.

I mean, let's take my test case of mongodb. With the async-std version of the driver I don't even have to set up or care about the runtime, I just write async Rust, wrap it in my Future-like object, and it just works. But that's just pure luck of how async-std is implemented*: it automatically starts worker threads to send new tasks to (6 on my machine apparently).

So... is this all even worth the effort?

*or the mongodb driver, I actually don't know which is responsible for this behaviour.

@awestlake87
Owner

awestlake87 commented Mar 9, 2021

It's possible that the mongodb driver is working in sterling because of how the async-std runtime is written, but since the calls you're making in sterling likely think they're on an async-std worker thread, it might break once more features are added. Does sterling still work with the tokio-runtime feature enabled? I think it might panic when it reaches tokio code since tokio adds some thread-local runtime context to its worker threads. I think the only way around it is to spawn the future on the tokio event loop.

I would love to drop the runtime-specific stuff in this crate, but unfortunately I don't think that's possible with the current async/await design. Rust has one of the best and most flexible async/await designs out there, but writing runtime-agnostic libraries is still a problem. Here's an interesting article about a crate with a potential solution.

Although, I've been thinking: I liked my approach because it felt cleaner, more direct, but I've been unable to put my finger on why exactly, and I still don't have many practical advantages to it.

I think your approach makes better use of the pyo3 utilities surrounding #[pymethods] and #[pyfunction]. You have some structures and traits that might make it more convenient to write Python bindings to async code. I'm more familiar with async/await than I am with PyO3, so I was content to just leave it with some bare functions for conversions for now. However, I think it might be useful to bring some of those ergonomics to this library. #4 and #5 have some conversations about these topics if you're interested in weighing in on that.

@ThibaultLemaire
Author

Thank you for the interest, I shall keep working on my end then.

And thank you also for the interesting read, I am now convinced by the concept of "nurseries" (although not so much by Trio's implementation: There is still that ugly runtime sandwich that makes it a pain for us to port foreign async libraries to Python).
But seriously though, if async_executors gets popular enough, we might even be able to write an "asyncio runtime" to drive everything from a single Python thread. I might just suggest async_executors to mongodb.

Another idea I had for runtime flexibility -- while I'm at it -- is "composable runtimes": What if, for example, in addition to and as an alternative to tokio::runtime::Runtime::block_on, you had an async function that could "drive" the runtime? When awaited by another runtime, that future would in fact execute the internal futures of the runtime, and once it cannot progress any further, instead of putting the thread to sleep, it would surrender control to the parent runtime by returning Poll::Pending. It would continue in that fashion until all the tasks of that "child" runtime have finished, at which point it could return Poll::Ready.
(I am realising now that this concept is actually very close to Nathaniel J. Smith's "nurseries".)
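As a toy model of that idea (in Python rather than Rust, with plain generators standing in for the child runtime's tasks): a round-robin scheduler written as a coroutine, which polls its tasks and, instead of putting the thread to sleep, hands control back to the parent runtime -- roughly the moral equivalent of returning Poll::Pending:

```python
import asyncio
from collections import deque

async def drive(tasks):
    """A tiny child 'runtime' for plain generators, itself awaitable so a
    parent runtime (asyncio here) can drive it."""
    queue = deque(tasks)
    results = []
    while queue:
        task = queue.popleft()
        try:
            next(task)              # poll one child task
            queue.append(task)      # still pending: requeue it
        except StopIteration as stop:
            results.append(stop.value)   # child task completed
        # Nothing more to do right now: yield to the parent runtime
        # instead of blocking the thread (Poll::Pending, roughly).
        await asyncio.sleep(0)
    return results

def ticks(n):
    """Child 'task': pretends to need n polls before producing a value."""
    for _ in range(n):
        yield
    return n * 10

print(asyncio.run(drive([ticks(2), ticks(3)])))  # prints [20, 30]
```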

Does sterling still work with the tokio-runtime feature enabled?

No, you're right, it doesn't. In fact, as mongodb is configured by default to run on tokio, that's the first thing I tried and it didn't work. I almost gave up entirely at that point before giving it another go with async-std just out of curiosity. I didn't believe it when it worked out of the box. But now that you mention it, I might try to make it work with tokio again.

You have some structures and traits that might make it more convenient to write Python bindings to async code.

Speaking about the trait, I added it just to see if it looked more ergonomic and to challenge the orphan rule, but once more, I was the lesser wizard, and rustc won. I wanted to implement IntoPyCallbackOutput<*mut pyo3::ffi::PyObject> for T: Future<Output: IntoPyCallbackOutput<PyObject>> + Send + 'static, but that won't be possible without a PR to pyo3, so, not before some time. (I was hoping to be able to write code like the following, which, while not being full support for async fn, is still one step closer.)

#[pyfunction]
fn my_awaitable() -> impl Future<Output = u8> + Send + 'static {
    async { 42 }
}

As for the ergonomics of my trait, I'm not so convinced, and I'm not really using it as a trait, it's just an extension method for now. So I'm not sold on keeping it yet.

@awestlake87
Owner

And thank you also for the interesting read, I am now convinced by the concept of "nurseries" (although not so much by Trio's implementation: There is still that ugly runtime sandwich that makes it a pain for us to port foreign async libraries to Python).
But seriously though, if async_executors gets popular enough, we might even be able to write an "asyncio runtime" to drive everything from a single Python thread. I might just suggest async_executors to mongodb.

I think the concept of nurseries is almost indistinguishable from Rust's runtimes. I see nursery.start_soon(task) the same way I see tokio::spawn and async_std::task::spawn since they can dynamically create new concurrent tasks. I think the only differences are that nurseries are meant to be passed as arguments to functions rather than being available as global state, and that Rust's runtimes aren't required to join all their tasks when the runtime is dropped.
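That parallel can be sketched in asyncio terms (stdlib only; the function names are illustrative): create_task is the detached, tokio::spawn-style handle, while gathering all children before the scope exits approximates the nursery guarantee of joining everything:

```python
import asyncio

async def child(n):
    await asyncio.sleep(0)
    return n * 2

async def detached_style():
    # Spawn-style: the task runs independently; we keep the handle
    # (like a JoinHandle) and await it when we want the result.
    handle = asyncio.create_task(child(1))
    return await handle

async def nursery_style():
    # Nursery-style: all children are started and joined within the
    # scope; nothing outlives this function.
    return await asyncio.gather(child(1), child(2), child(3))

print(asyncio.run(detached_style()))  # prints 2
print(asyncio.run(nursery_style()))   # prints [2, 4, 6]
```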

I want something like async-executors to make its way into the Rust std so async libraries can have a core set of abstractions to build off of. Generic abstractions from futures-rs are already making their way in (I think the Stream trait could land any day now) and I think once GATs (Generic Associated Types) are more established we may get some sort of core abstractions for runtimes. If we get these, then tokio, async-std, and any other runtimes may become more interchangeable.

Another idea I had for runtime flexibility -- while I'm at it -- is "composable runtimes": What if, for example, in addition to and as an alternative to tokio::runtime::Runtime::block_on, you had an async function that could "drive" the runtime? When awaited by another runtime, that future would in fact execute the internal futures of the runtime, and once it cannot progress any further, instead of putting the thread to sleep, it would surrender control to the parent runtime by returning Poll::Pending. It would continue in that fashion until all the tasks of that "child" runtime have finished, at which point it could return Poll::Ready.
(I am realising now that this concept is actually very close to Nathaniel J. Smith's "nurseries".)

I believe this is already possible using spawn and JoinHandle. JoinHandle is that future that you wait on in the other runtime:

// untested, but should work even though the main fn is running on tokio
// and the sleep task is running on async-std
#[tokio::main]
async fn main() {
    let async_std_task = async_std::task::spawn(async move {
        async_std::task::sleep(std::time::Duration::from_secs(1)).await;
    });
    async_std_task.await;
}

The JoinHandle won't block tokio at all, it'll just return pending until the task is done.

@ThibaultLemaire
Author

The JoinHandle won't block tokio at all, it'll just return pending until the task is done.

Yes I'm aware of that, but you need the async_std runtime to run somewhere, right? In this case it's in another thread which is okay, but what if you wanted to have the two runtimes share the same thread? With the current tools, they would simply deadlock, but if you could run a runtime itself asynchronously, then you could achieve single-threaded multi-runtime. Which would be especially interesting in our Python context.

@awestlake87
Owner

awestlake87 commented Mar 11, 2021

I don't think single-threaded multi-runtime is possible since runtimes assume complete control over their threads (the task scheduler is a synchronous loop). An async runtime would have to run on some other runtime since async functions can't run by themselves, so I think that would kinda defeat the purpose.

You could potentially have a unified Python/Rust runtime either with your current approach or with Python bindings to run async Python on top of tokio or async-std, but that would still just be a single runtime that's capable of running both async Python and async Rust.

@ThibaultLemaire
Author

ThibaultLemaire commented Mar 12, 2021

Well, all I'm doing is speculating at this point, but I'm not convinced that would be impossible, nor useless. We'll see in the far future if I get to implementing some proof of concept.


Also, a random thought on structured concurrency that I'm jotting down here because I don't know where else to put it:

What makes the Rust async model so different is its use of a Waker to be passed to the future as a sort of callback to signal when it's ready to be polled again, essentially making it possible to separate the executor from the reactor, correct?

But if you have ever worked with the Future trait directly you have noticed that it doesn't exactly take a Waker, but a Context. This Context currently only contains a Waker but it's pretty obvious it was put there as an open end to add additional context that might be needed in the future (no pun intended).

What if this Context also contained a Spawner?

EDIT: Someone else thought of extending the Context for Structured Concurrency, but for cancellation*, not spawning (At least I think it's what they did with that unsafe black magic that I'm not understanding too much).

EDIT 2: *We're talking spawned child task cancellation here.

@awestlake87
Owner

What makes the Rust async model so different is its use of a Waker to be passed to the future as a sort of callback to signal when it's ready to be polled again, essentially making it possible to separate the executor from the reactor, correct?

I believe many async models support this kind of thing. Wakers are a pretty fundamental building block for async. Without them you're just left constantly polling the Future to find out when it's ready. Python Future could probably be considered a waker too.

There are a few things that make Rust's async model unique

  • Boxing futures is not required - you can poll a future in-place without having to box it, making it a zero-cost abstraction
  • Borrows can safely cross await boundaries due to Pin and lifetimes - many languages force the user to clone or Arc<Mutex<_>> their data to ensure memory safety.
  • Bare metal performance - the above features make Rust's async model more efficient than any other language I know of. C/C++ can approach that performance, but not safely and ergonomically.

The Future trait is not inherently tied to a runtime due to the Waker, but that's not necessarily unique. Python coroutines, for instance, can run on any event loop that implements AbstractEventLoop.
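As a tiny illustration of that point (a sketch, nothing PyO3-specific): a Python coroutine object can even be driven by hand with send(), without any event loop at all:

```python
# A coroutine object is just a state machine: send(None) advances it,
# and StopIteration carries the final value. No event loop involved.
async def add(a, b):
    return a + b

coro = add(1, 2)
try:
    coro.send(None)
except StopIteration as stop:
    print(stop.value)  # 3
```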

separate the executor from the reactor

I could be wrong on this, but I believe reactor, scheduler, executor, runtime, event loop, etc. are all referring to the same thing.

What if this Context also contained a Spawner?

I think the thing that prevents this from happening is that objects like Waker would have to be object-safe (Waker is composed of RawWaker, which is constructed with a vtable for dynamic dispatch). Since spawn is expected to support a generic return type, it's just not possible to have an object-safe Spawn trait. Context or Future would have to be generic across a Spawn type to support that, but since Future also has to be object-safe in order to be boxed, this is also not possible.

EDIT: Someone else thought of extending the Context for Structured Concurrency, but for cancellation, not spawning (At least I think it's what they did with that unsafe black magic that I'm not understanding too much).

In the task-scope API docs the author talks about the motivation behind task-scope. I was confused because Future already supports cancellation via Drop, but what task-scope does is cancel spawned tasks when the task scope is dropped, essentially making sure that spawned tasks are also cleaned up when a future is cancelled. I'm not sure they accomplish this by extending Context or Waker though. Where did you see that?

@ThibaultLemaire
Copy link
Author

I believe many async models support this kind of thing.
I could be wrong on this, but I believe reactor, scheduler, executor, runtime, event loop, etc. are all referring to the same thing.

Yes, thank you for rectifying, I was completely confused. Callback based wake up is definitely not specific to Rust, and "runtimes" have many names. (The fact that asyncio provides a mechanism for waking up tasks is the sole reason why my approach even works in the first place)

Since spawn is expected to support a generic return type, it's just not possible to have an object-safe Spawn trait

Ah you're right, although I reckon that might be circumvented with type erasure, much like how a Waker is not generic actually.

I was confused because Future already supports cancellation via Drop, but what task-scope does is cancel spawned tasks when the task scope is dropped

Correct, I have edited (again) my previous comment.

I'm not sure they accomplish this by extending Context or Waker though. Where did you see that?

I was just hopping through the links of the blog post you previously linked and happened upon tokio-rs/tokio#1879 (comment).

@awestlake87
Copy link
Owner

awestlake87 commented Mar 13, 2021

Ah you're right, although I reckon that might be circumvented with type erasure, much like how a Waker is not generic actually.

I was thinking about it a bit and I think I was wrong. You might be able to do something like this:

trait Spawn {
    fn spawn(&mut self, task: Pin<Box<dyn Future<Output = ()>>>);
}

impl Context<'_> {
    pub fn spawn<F, T>(&mut self, fut: F) -> JoinHandle<T>
    where
        F: Future<Output = T> + 'static,
    {
        let (join_tx, join_hdl) = JoinHandle::channel();
        self.spawner.spawn(Box::pin(async move {
            let output = fut.await;
            let _ = join_tx.send(output);
        }));

        join_hdl
    }
}

Since std::task::Context is a concrete type, it can have generic methods. Spawn objects can be object-safe provided they work with boxed futures. The boxed future returns (), but sends its arbitrarily typed result through a channel that the JoinHandle listens on.

It seems doable, but idk what the performance/runtime implications are for it since I've never spelunked that deep into the code of an executor like async-std or tokio. spawn usually boxes futures anyway, so that shouldn't be a problem by itself.

@ThibaultLemaire
Copy link
Author

ThibaultLemaire commented Mar 23, 2021

Some progress report:

  • I have found a nice middle ground in terms of ergonomics between plainly returning the wrapper type and full async fn support with a PyAsync<T> struct.
  • I've had to overhaul my approach to support asyncio.gather (and probably other asyncio helper functions that I have yet to test). I am now emulating not just the simple asyncio.Future, but the full asyncio.Task.
    In the end it makes more sense as well when I draw a parallel between the async models of the two languages like this:
    • Python: Task (root) -> Coroutine (branch) -> Future (leaf) with a Task being a Future as well so you can chain and compose them.
  • Rust: < runtime specific > (root) -> async block (branch) -> manually implemented Future (leaf). The missing root slot is where my wrapper should fit.
  • My stirling toy project can now read and write to a MongoDB server, but only with the most basic operations (find_one and insert_one). For more useful operations, I'll have to support Streams, which, as far as I can anticipate, won't map to AsyncGenerators exactly and will probably require some specific logic.

That last one is a pretty interesting challenge as I'd like to benchmark stirling against motor, the official async MongoDB driver for Python. Have you had to deal with streams yet?

PS: I also stole your use of once_cell to cache access to the asyncio module and methods
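PPS: a quick sanity check of the Python half of the parallel above, straight from the interpreter:

```python
import asyncio

# asyncio.Task subclasses asyncio.Future, so anything that accepts a
# Future (callbacks, gather, await) also accepts a Task. That subclass
# relationship is what makes the chaining and composing possible.
print(issubclass(asyncio.Task, asyncio.Future))  # True
```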

@awestlake87
Copy link
Owner

So I have dealt with streams a little bit, but I haven't thought much about moving those utilities into pyo3-asyncio. Basically, this is the way I handled them:

  • Rust to Python: use an asyncio.Queue to send messages.
  • Python to Rust: create #[pyclass] structs that wrap async_channel::Sender<T> or async_channel::Receiver<T>.

Unbounded channels are nicer for the sending side since they don't require conversions in/out of coroutines, but if you need a bounded channel (it's usually a better idea), pyo3-asyncio can translate.

The main reason I haven't thought too much about adding these utilities yet is because there wasn't really a demand for it at the time and there are a lot of different choices for the underlying channel. I'm not sure if the ones I've worked with are ideal.
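For the Rust-to-Python direction, the consuming side of that asyncio.Queue pattern might look like this sketch (a plain coroutine stands in for the Rust producer, and None is a hypothetical end-of-stream sentinel):

```python
import asyncio

async def producer(queue):
    # Stand-in for the Rust side, which would push items onto the queue
    # (e.g. via call_soon_threadsafe from the Rust runtime's thread).
    for i in range(3):
        await queue.put(i)
    await queue.put(None)  # sentinel: end of stream

async def consume():
    queue = asyncio.Queue(maxsize=8)  # bounded, usually the better idea
    asyncio.ensure_future(producer(queue))
    items = []
    while (item := await queue.get()) is not None:
        items.append(item)
    return items

print(asyncio.run(consume()))  # [0, 1, 2]
```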

PS: I also stole your use of once_cell to cache access to the asyncio module and methods

I stole that from ChillFish8 I think, lol.

@ChillFish8
Copy link

Yet another one of my splurges of testing Pyre paying off lol

@ThibaultLemaire
Copy link
Author

I am now emulating not just the simple asyncio.Future, but the full asyncio.Task

On second thought, a Task is probably not what you want most of the time:

import asyncio
import myrustmodule

async def foo():
  await myrustmodule.async_fn() # Even if the Rust Future goes straight to Ready without blocking, this is equivalent to
  # await asyncio.create_task(myrustmodule.async_fn())
  # So it has to go through 2 extra iterations of the loop before getting the result
  # (First you register the Rust task to be called on the next iteration of the loop,
  # then, when it's done it registers its callbacks to be called on the next iteration,
  # waking up the parent Task which can finally get the result)

But without it asyncio.gather thinks my futures are already started and hangs forever waiting for a result that will never come.

So, in fact, I should rather be implementing a Coroutine (or technically speaking, a sort of hybrid Coroutine/Future that behaves like a Coroutine most of the time but returns itself to behave like a Future when the underlying Rust Future returns Poll::Pending). But I've been struggling to write a type that is recognised as a Coroutine.

@davidhewitt would you happen to have any pointers on how I could trick asyncio.iscoroutine() into returning True for my #[pyclass]? In Python the recommended way to do that is to inherit the abstract collections.abc.Coroutine, which forces you to implement __await__, send, and throw, but I couldn't find a way to inherit arbitrary Python types with PyO3. Or I could monkeypatch either asyncio.coroutines.iscoroutine, asyncio.coroutines._iscoroutine_typecache, or asyncio.coroutines._COROUTINE_TYPES.

I know I'm already pretty deep down Python's internals with my custom Future, but monkeypatching is maybe just one step too far for my comfort.

@ChillFish8
Copy link

ChillFish8 commented Mar 25, 2021

You can trick iscoroutine by directly editing the type flags. See https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/asyncio/coroutines.py#L172 for a list of coroutine types, and https://github.com/python/cpython/blob/7aaeb2a3d682ecba125c33511e4b4796021d2f82/Lib/types.py#L249 for how the types are defined to make them 'coroutines'. But seeing that they haven't yet implemented this in C, I don't think this is possible without doing it directly in Python; either way it's certainly a hack.

@ChillFish8
Copy link

TL;DR: the instance's code flags need to contain 0x180, i.e. CO_COROUTINE | CO_ITERABLE_COROUTINE.

all ensure_future is essentially doing is extracting the awaitable object and modifying its flags:

co_flags = func.__code__.co_flags

# Check if 'func' is a coroutine function.
# (0x180 == CO_COROUTINE | CO_ITERABLE_COROUTINE)
if co_flags & 0x180:
    return func

# Check if 'func' is a generator function.
# (0x20 == CO_GENERATOR)
if co_flags & 0x20:
    # TODO: Implement this in C.
    co = func.__code__
    # 0x100 == CO_ITERABLE_COROUTINE
    func.__code__ = co.replace(co_flags=co.co_flags | 0x100)
    return func

@ThibaultLemaire
Copy link
Author

The issue is ensure_future does 3 different checks in this order:

  1. iscoroutine which it will then wrap in a Task (This is exactly what I want since the task will be scheduled to be executed by the loop)
  2. isfuture in which case it leaves it as is. That, I definitely do not want since my Rustine (it's what I've decided to call it) will simply never be started and hang forever.
  3. Then, and only then, will it try to evaluate isawaitable.

So, even if I could find a way to trick isawaitable with co_flags, it just won't get to that point, since isfuture would evaluate to True because of the hybrid nature of my Rustine. (It has the _asyncio_future_blocking attribute, so it's announcing itself to be "duck-type compatible" with an asyncio.Future.)

So I basically have two choices:

  1. Make my Rustine a subtype of one of the coroutine types so that isinstance(obj, _COROUTINE_TYPES) evaluates to True.
  2. Split it into two types, one that isawaitable and one that isfuture.

I'd rather go with 1. since 2. means coordinating at least two owners for the impl Future, so right now I'm abusing iscoroutine's cache. Needless to say, I'm not too happy with that.

Is there really no way to inherit a Python type? Or even just fake enough of it that my type gets picked up as a subtype by isinstance?
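For the record, the cache abuse looks roughly like this (it pokes asyncio internals, so it is version-dependent and very much a hack, exactly the kind I'm uncomfortable with):

```python
import asyncio
import asyncio.coroutines

class Rustine:
    # Stand-in for the #[pyclass]; only __await__ matters here.
    def __await__(self):
        return iter(())

# iscoroutine() consults a private type cache before its isinstance
# check, so seeding the cache makes it accept an arbitrary type.
# CPython-internal, subject to change between versions.
asyncio.coroutines._iscoroutine_typecache.add(Rustine)
print(asyncio.iscoroutine(Rustine()))  # True
```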

@davidhewitt
Copy link

Is there really no way to inherit a Python type?

It absolutely should be possible, but the problem is that there are weaknesses in the current PyO3 core which mean it can't (yet) be done with PyO3.

I started playing around with an experimental repository at davidhewitt/pytypes, but think that I need to upstream fixes into PyO3 before it's likely to be useful.

If you watch that repository then you'll get a notification when I put out the first release, or otherwise if you want to help push this forward feel free to start some discussion on that repo. At the moment I'm preparing documentation and a few other refinements with a view to releasing PyO3 0.14 soon, so I probably won't be trying to solve it myself for a month or so.

@ThibaultLemaire
Copy link
Author

ThibaultLemaire commented Apr 4, 2021

Cool! I'll stick to my monkey patching for the time being then, and switch to inheritance once pyo3-pytypes is ready, or give a hand if pyo3-futures reaches a satisfactory point first.

@ThibaultLemaire
Copy link
Author

For more useful operations, I'll have to support Streams, which, as far as I can anticipate, won't map to AsyncGenerators exactly and will probably require some specific logic.

Actually, I've just implemented the find method which returns an AsyncIterator over the documents of a collection based on a filter.

And it was surprisingly easy.

I was afraid StopAsyncIteration had to be raised from the __anext__ method (which would have meant looking ahead into the stream to determine if it's over or not before returning a future that could potentially be None), but thankfully StopAsyncIteration can also be raised from the returned future, so the mapping is really straightforward.

I've got support for Stream -> AsyncIterator!
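The shape of that mapping, sketched in pure Python (a list iterator stands in for the Rust Stream):

```python
import asyncio

class StreamWrapper:
    # Stand-in for the wrapper around a Rust Stream: __anext__ hands back
    # a future immediately, and StopAsyncIteration is raised from that
    # future when the stream is exhausted, so no look-ahead is needed.
    def __init__(self, items):
        self._items = iter(items)

    def __aiter__(self):
        return self

    def __anext__(self):
        fut = asyncio.get_running_loop().create_future()
        try:
            fut.set_result(next(self._items))
        except StopIteration:
            fut.set_exception(StopAsyncIteration())
        return fut

async def collect(stream):
    return [doc async for doc in stream]

print(asyncio.run(collect(StreamWrapper(["a", "b"]))))  # ['a', 'b']
```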

@ThibaultLemaire
Copy link
Author

ThibaultLemaire commented May 12, 2021

With the recent discussion on performance, I just wanted to jot down a note here:

We probably want to stay off the Python thread as much as possible

This is coming from a short exchange I had a while ago with @ajdavis (the original author of motor) where he shared with me that he measured a 35% increase in performance when he "switched Motor from an async core to a multi-threaded core with an async wrapper".

While I have yet to finish reading all the articles he kindly shared with me on the topic, all the clues I have so far point to that same conclusion.

So, because pyo3-asyncio is fundamentally multithreaded while pyo3-futures purposefully tried to stay single-threaded, the winner in a benchmark would most likely be pyo3-asyncio. (At least with a multi-core machine)

The only question remaining is whether we could swap out the channels for my Rustines, and whether that would run any faster.

@ChillFish8
Copy link

In my experience building Pyre with asyncio, it's generally best to keep to the main thread as much as possible. I ran an experimental setup that ran the server in a separate thread which then called back onto the main thread, and even though the code execution / server speed was about 25% faster, the overhead added by calling from other threads cut overall performance down to about 20% of what it currently is.

@ChillFish8
Copy link

There is a really fine line to getting performance between the two languages in an async context, and generally it involves calling Python as little as possible. The less you call Python and touch the asyncio event loop or coroutines, the faster the setup.

@ThibaultLemaire
Copy link
Author

Sorry folks, just a heads up that I will not be working on this any further for the foreseeable future.

If anyone wants to pick up where I left off, though, I'll be happy to help (all my code is Apache 2.0, but it may be lacking proper documentation, so I'm always available for questions).
