-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Asyncio-driven Rust futures #6
Comments
I'll start by saying that @awestlake87 and @ChillFish8 have thought a lot more about this than me, so please don't count my opinion much; they know much more about the design space. This approach of being able to spawn a runtime-free future does sound nice. If we were to eventually integrate something into pyo3 core (which I would imagine we would need to in order to have Slightly greedy; I wonder if we can support multiple APIs in (I would probably only want to put the bare minimum upstream in |
The only issue i see with implementing futures directly like this, is the fact that natively we lack the ability to sanely interact with the generator api and therefore the async api, we can implement rust futures as a sort of macro maybe that produces a python iterator class however this does lack the ability to suspend the event loop polling and resuming (We're left with constantly polling thousands of times a second) which can add alot of unwanted overhead, the overhead I think is the biggest limitation and not being able to control the suspending and resuming due to the lack of ability to create Python's generator objects built ontop of yeild (Originally Pyre made use of the generator replication using the pyproto implementations however this caused alot of CPU usage and latency) |
I'm kinda lacking the background knowledge on the generator API to comment on the technical details for this approach, so I'll kinda lean on @ChillFish8 for some of those comments for now if that's alright. In my projects I almost always need my futures to communicate with the runtime to perform IO / sleeps / etc, so I think I would pretty much always spawn it into the tokio or async-std runtime anyway. We could always make convenience functions that could handle that indirection, but I think we'd end up with a situation that's pretty similar to what we have already in this library. Personally, I'd be a bit worried about having the runtime-less future by default since I think that could introduce some confusion for users. I'm imagining that more often than not a new user would initialize pyo3-asyncio and initialize a runtime like tokio, then get a panic from tokio saying the runtime has not been initialized yet if they forget to spawn it onto the tokio runtime. I guess my question would be does this have some performance / ergonomics wins? I think what @ChillFish8 is talking about in his comment refers to running all Rust futures on the Python event loop rather than just a few select ones. I'd imagine most of the time, these futures would just be asleep and waiting on the Some Additional ThoughtsBefore I talked with @ChillFish8 on the thread that spawned this repo, I'd been thinking about what it would take to share a runtime with Python. My conclusion at that point was that it might be best to run Python coroutines on a Rust runtime instead. I figured this would be a lot of work, but it does look like someone already tried it awhile back with pyo3-tokio. Since it's under the PyO3 org @davidhewitt might have some insight on why they stopped working on it back in 2018. They mysteriously say "I don't think this project makes sense" and that's the end of it. It's not present on the master branch yet, but if you're worried about flexibility for the runtime instantiation, you might check out the @davidhewitt we can open a new issue here or on PyO3 to talk about adding support for |
I think the best solution is the two runtimes system of asyncio in the main thread and a rust runtime in another atleast in order to make a ergonomic system. Would be intersting to see why the |
Hey, thank you all for the rapid feedback!
That's exactly what I had in mind when I started working on this.
You are very right to be concerned about this as this was the first issue of my early prototypes. Essentially asncio was left polling on a busy loop until the rust future had completed. However I have solved this by mimicking the behaviour of asyncio futures: Turns out, as a future, you can ask the event loop to put your parent task back to the waiting queue forever by yielding yourself. You can then ask to be rescheduled by calling the callbacks that were passed to you then. But I'll be submitting a PR to your examples page explaining all that with code examples.
I got to the same conclusion when trying to run some real world code like Which gives me a nice segue into
And this is nice, but my idea is to let the runtime be the flexible thing that it currently is in async Rust. How would you solve the problem of using two different async libraries that require two different runtimes in pure Rust? Well I can see a few ways: simply have two runtimes on two threads, spawn tasks, and have them exchange join handles. Use channels, like you guys did. Or try some compat glue like But again, I think the discussion will be more fruitful once I'm done writing my examples, which I'll try and do this week, as my weekend will be taken by the Global Game Jam. Anyway...
I have! In fact I skipped directly to it when looking at your code (well, technically, I jumped to the |
Is this using a Rust runtime? |
Yes, but also when spawning my own threads |
There is (and i have yet to find out why) but a very weird issue where things like any mult-threaded runtime e.g. Tokio's multi threaded runtime where things can get inter-switched between threads; it will deadlock the Python interpreter or PyO3's ability to either aquire the GIL or actually interact with the GIL (I cant remember which of the two it was in testing) when in the main Python thread, I have yet to be able to re-create this behavour when spawning it in a child thread however and only for when you're handling threads with or without the GIL so why this happens im not entirely sure but ive seen a couple errors from my interpreter linking it to the interpreter deadlocking before it even initialised (weirdness). So generally i recommend any sort of Rust runtime handling which is multi threaded you want to spawn in a child thread. |
Just as an aside since we were talking about supporting |
IMHO dumb standalone futures remain better for async I respect the maturity of your project and don't pretend to be on that level yet, but I'm positive that my approach works. Right now, I can drive just about any future to completion, even with multiple So, I acutally believe the two approaches complement each other and could work very well together, but if you're on a tight schedule of course I won't argue any further. Btw, did you get a chance to take a look at my prototype code? I should also mention that I have written an example implementation of an asyncio future along with explanations here that should help in understanding how my Rust code is designed. |
I think we've got a good first release candidate with the I have looked at the example code, and I understand that it works and has potential to be an avenue for a unified Python / Rust runtime. I don't want to give you them impression that I'm arguing from the perspective of maturity, since that's something we can fix with some time and effort. Also, it seems like you've addressed the concerns @ChillFish8 had around polling and resuming futures (although I don't know enough to comment on that specifically). But I think my main concern is whether or not these standalone futures as they are now would be useful to people in practice. The In my experience, you only want a single Rust runtime project-wide. That's not to say you couldn't have two, and I'm sure some people do this if they need some interoperability with a project that was built for a different runtime, but I think it's generally the exception, not the rule. That being said, I think this project would in fact support initializing two different Rust runtimes at once, so I don't think this is a problem with the current approach. Since we communicate between Rust and Python with channels, we shouldn't have any issues running futures from two different runtimes concurrently.
That's true, but usually Futures need to perform some I/O, interact with timers, etc, and unfortunately these tend to be very runtime-specific features. That's why Providing an implementation for features like
I'm not sure I agree with this. For me it comes back to the argument about what people would expect to be the default behaviour. I think most people would expect to just go straight into their I don't want you to get the impression that I'm unwilling to move on these positions, but I think these are still valid concerns. Feel free to chime in if you don't think I'm being fair! As one of the authors of the current implementation I might be pretty biased. In order to change my mind, I think I would need to see some concrete advantages of this approach:
|
I mean in general python's eventloop will be fairly expensive to rust regardless as you have no way of properly yielding via rust, however you can likely have a simple set of std io tools and just use the low level asyncio handles, like i did with Pyre that uses asyncio to move the state of the server forward using a std Tcp listener. Anything like asyncio.sleep however is likely going to be conpletely out of the question in a sane way, or atleast not without simply wrapping the pyobject. |
I saw the comments about performance in the warnings in the https://github.com/ChillFish8/Async-PyO3-Examples, but I wasn't sure if they still applied. AFAIK we haven't done a comparison of the performance of
I think @ThibaultLemaire mentioned he had a solution to this, unless I'm misunderstanding:
Is that referring to the same problem @ChillFish8? |
You know i have no idea, I am starting to get completely lost down the rabbit hole of this conversation 😅 Sorry for the delayed reply btw i have been a tad busy lately. |
Just some progress report: I believe I have finally found a satisfying* solution to avoid any deadlock with my code. (Link to the commit) I have fully documented my approach here. (Although the double callback pattern that I'm using to work around deadlocks isn't needed for those examples and the explanations I wrote are somewhat misleading. So I'll probably rephrase or remove them entirely.) So I think I'm ready to move on to either writing a proper crate or forking PyO3. *i.e. that is not a dirty workaround |
Awesome! Before deciding on any approach to merge into PyO3 core I'd like to fully understand API consequences and performance of each approach. It may be easier to have that discussion if your implementation starts as a separate crate (or additional functionality added to |
I have started Although, I've been thinking: I liked my approach because it felt cleaner, more direct, but I've been unable to put my finger on why exactly, and I still don't have many practical advantages to it. I mean, let's take my test case of mongodb. With the So... is this all even worth the effort? *or the mongodb driver, I actually don't know which is responsible for this behaviour. |
It's possible that the mongodb driver is working in sterling because of how the I would love to drop the runtime-specific stuff in this crate, but unfortunately I don't think that's possible with the current async/await design. Rust has one of the best and most flexible async/await designs out there, but writing runtime-agnostic libraries is still a problem. Here's an interesting article about a crate with a potential solution.
I think your approach makes better use of the |
Thank you for the interest, I shall keep working on my end then. And thank you also for the interesting read, I am now convinced by the concept of "nurseries" (although not so much by Trio's implementation: There is still that ugly runtime sandwich that makes it a pain for us to port foreign async libraries to Python). An other idea I had for flexibility of runtime -- while I'm at it -- is "composable runtimes" : What if, for example, in addition and as an alternative to
No, you're right, it doesn't. In fact, as
Speaking about the trait, I added it just to see if it looked more ergonomic and to challenge the orphan rule, but once more, I was the lesser wizard, and rustc won. I wanted to implement #[pyfunction]
fn my_awaitable() -> impl Future<u8> + Send + 'static {
async { 42 }
} As for the ergonomics of my trait, I'm not so convinced, and I'm not really using it as a trait, it's just an extension method for now. So I'm not sold on keeping it yet. |
I think the concept of nurseries is almost indistinguishable from Rust's runtimes. I see I want something like
I believe this is already possible using // untested, but should work even though the main fn is running on tokio and sleep task is running on async_std
#[tokio::main]
async main() {
let async_std_task = async_std::task::spawn(async move {
async_std::task::sleep(std::time::Duration::from_secs(1)).await;
});
async_std_task.await;
} The |
Yes I'm aware of that, but you need the async_std runtime to run somewhere, right? In this case it's in another thread which is okay, but what if you wanted to have the two runtimes share the same thread? With the current tools, they would simply deadlock, but if you could run a runtime itself asynchronously, then you could achieve single-threaded multi-runtime. Which would be especially interesting in our Python context. |
I don't think single-threaded multi-runtime is possible since runtimes assume complete control over their threads (the task scheduler is a synchronous loop). An async runtime would have to run on some other runtime since async functions can't run by themselves, so I think that would kinda defeat the purpose. You could potentially have a unified Python/Rust runtime either with your current approach or with Python bindings to run async Python on top of |
Well, all I'm doing is speculating at this point, but I'm not convinced that would be impossible, nor useless. We'll see in the far future if I get to implementing some proof of concept. Also, random thought on structured concurrency that I'm jotting down here because I don't know where else: What makes the Rust async model so different is its use of a But if you have ever worked with the Future trait directly you have noticed that it doesn't exactly take a What if this EDIT: Someone else thought of extending the EDIT 2: *We're talking spawned child task cancellation here. |
I believe many async models support this kind of thing. Wakers are a pretty fundamental building block for async. Without them you're just left constantly polling the There are a few things that make Rust's async model unique
The
I could be wrong on this, but I believe reactor, scheduler, executor, runtime, event loop, etc. are all referring to the same thing.
I think the thing that prevents this from happening is that objects like
In the task-scope API docs the author talks about the motivation behind task-scope. I was confused because |
Yes, thank you for rectifying, I was completely confused. Callback based wake up is definitely not specific to Rust, and "runtimes" have many names. (The fact that
Ah you're right, although I reckon that might be circumvented with type erasure, much like how a
Correct, I have edited (again) my previous comment.
I was just hopping through the links of the blog post you previously linked and happened upon tokio-rs/tokio#1879 (comment). |
I was thinking about it a bit and I think I was wrong. You might be able to do something like this: trait Spawn {
fn spawn(&mut self, task: Box<dyn Future<Output = ()>>);
}
impl Context {
pub fn spawn<F, T>(fut: F) -> JoinHandle<T> where F: Future<Output = T> + 'static {
let (join_tx, join_hdl) = JoinHandle::channel();
self.spawner.spawn(async move {
let output = fut.await;
join_tx.send(output);
});
join_hdl
}
} Since It seems doable, but idk what the performance/runtime implications are for it since I've never spelunked that deep into the code of an executor like |
Some progress report:
That last one is a pretty interesting challenge as I'd like to benchmark PS: I also stole your use of |
So I have dealt with streams a little bit, but I haven't thought much about moving those utilities into Rust to Python - use Unbounded channels are nicer for the sending side since they don't require conversions in/out of coroutines, but if you need a bounded channel (it's usually a better idea), The main reason I haven't thought too much about adding these utilities yet is because there wasn't really a demand for it at the time and there are a lot of different choices for the underlying channel. I'm not sure if the ones I've worked with are ideal.
I stole that from ChillFish8 I think, lol. |
Yet another one of my spurges of testing Pyre paying off lol |
On second thought a import asyncio
import myrustmodule
async def foo():
await myrustmodule.async_fn() # Even if the Rust Future goes straight to Ready without blocking, this is equivalent to
# await asyncio.create_task(myrustmodule.async_fn())
# So it has to go through 2 extra iterations of the loop before getting the result
# (First you register the Rust task to be called on the next iteration of the loop,
# then, when it's done it registers its callbacks to be called on the next iteration,
# waking up the parent Task which can finally get the result) But without it So, in fact, I should rather be implementing a @davidhewitt would you happen to have any pointers on how I could trick I know I'm already pretty deep down Python's internals with my custom Future, but monkeypatching is maybe just one step too far for my comfort. |
You can trick |
TL;DR the instance's code flags need to contain the all ensure_future is doing is essentially extracting the await object and modifiying it's flags: co_flags = func.__code__.co_flags
# Check if 'func' is a coroutine function.
# (0x180 == CO_COROUTINE | CO_ITERABLE_COROUTINE)
if co_flags & 0x180:
return func
# Check if 'func' is a generator function.
# (0x20 == CO_GENERATOR)
if co_flags & 0x20:
# TODO: Implement this in C.
co = func.__code__
# 0x100 == CO_ITERABLE_COROUTINE
func.__code__ = co.replace(co_flags=co.co_flags | 0x100)
return func``` |
The issue is
So, even if I could find a way to trick So I basically have two choices:
I'd rather go with 1. since 2. means coordinating at least two owners for the Is there really no way to inherit a Python type? Or even just fake enough of it that my type gets picked up as a subtype by |
It absolutely should be possible, but the problem is that there's weaknesses in the current PyO3 core which mean it can't (yet) be done with PyO3. I started playing around with an experimental repository at davidhewitt/pytypes, but think that I need to upstream fixes into PyO3 before it's likely to be useful. If you watch that repository then you'll get a notification when I put out the first release, or otherwise if you want to help push this forward feel free to start some discussion on that repo. At the moment I'm preparing documentation and a few other refinements with a view to releasing PyO3 0.14 soon, so I probably won't be trying to solve it myself for a month or so. |
Cool! I'll stick to my monkey patching for the time being then, and switch to inheritance once |
Actually, I've just implemented the And it was surprisingly easy. I was afraid I've got support for |
With the recent discussion on performance, I just wanted to jot down a note here: We probably want to stay off the Python thread as much as possible This is coming from a short exchange I had a while ago with @ajdavis (the original author of While I have yet to finish reading all the articles he kindly shared with me on the topic, all the clues I have so far point to that same conclusion. So, because The only question remaining is if we could swap using channels for my Rustines and if they would run any faster. |
generally with my experience building Pyre with asyncio, is keeping to the main thread as much as possible. I ran an experimental setup that than the server in a seperate thread that than called back on the main thread and even though the code exectution / server speed was about 25% faster or so the overhead added by calling from other threads cut performance overall down to about 20% of what it was / currently is |
There is a really fine line getting performance between the two languages in a async context and generally it involves calling Pythonas little as possible. The less you call Python and touch the asyncio event loop or coroutines at all the faster the setup. |
Sorry folks, just a heads up that I will not be working on this any further for the foreseeable future. If anyone wants to pick up where I left though I'll be happy to help (all my code is Apache 2.0, but it may be lacking proper documentation, so I'm always available for questions). |
Hi, so, following PyO3/pyo3#701 I've been taking a look at what you did here, and, first of all, I wanted to congratulate you on the effort you put into it and what you achieved here. Using channels for cross-runtime communication is a pretty good and simple idea.
On my end, I have been focusing entirely on executing async Rust code from (async) Python, which didn't give me the same perspective on the problem and greatly simplified my assumptions (but maybe not my implementation, we'll see).
So I ended up with a different solution: Wrapping the Rust Future into a Python Future-like object.
(Yes, Python - or more accurately, asyncio - confusingly also has a concept of Future, which is an awaitable but not a coroutine and is actually the low-level handle to thread results, for example. But you already know that since you use
loop.create_future
to communicate the result back to Python.)What this lets me do is run async Rust code (albeit trivial for now) without a runtime. Or to be more accurate, to use asyncio as the runtime to drive Rust futures.
I believe this approach to be more flexible as a library writer could in turn spin up their runtime of choice (tokio, async-std, or even both...), spawn tasks into it, and directly return the JoinHandle to be awaited by Python. In the end I guess it would be equivalent to what you do here, except without global state.
I'm not sure I'm being very clear, so before I ask you if this approach would make sense for what you're doing or for PyO3, I think I'll just fork Async-PyO3-Examples and explain the reasoning step by step.
As a foretaste and to illustrate, I'm thinking of being able to do something like this (untested):
For reference you can take a look at my playgroud where I've been prototyping, but I apologise in advance for the mess that it is, and I wouldn't mind if you preferred to wait until I'm done writing some cleaner examples.
@awestlake87 @davidhewitt @ChillFish8 tagging you guys for the discussion
The text was updated successfully, but these errors were encountered: