asyncio with improved I/O performance #18816
Replies: 4 comments 4 replies
-
Test script which shows the impact of the above:

```python
# import asyncio
import asyncio_fast as asyncio
import time
import io
from micropython import const  # MicroPython-specific

MP_STREAM_POLL_RD = const(1)
MP_STREAM_POLL = const(3)
MP_STREAM_ERROR = const(-1)


class MillisecTimer(io.IOBase):  # Timer using the I/O mechanism
    def __init__(self):
        self.end = 0
        self.sreader = asyncio.StreamReader(self)

    def __iter__(self):  # MicroPython allows await here (compiled as yield from)
        await self.sreader.read(1)

    def __call__(self, ms):
        self.end = time.ticks_add(time.ticks_ms(), ms)
        return self

    def read(self, _):
        return b"a"

    def ioctl(self, req, arg):
        ret = MP_STREAM_ERROR
        if req == MP_STREAM_POLL:
            ret = 0
            if arg & MP_STREAM_POLL_RD:
                if time.ticks_diff(time.ticks_ms(), self.end) >= 0:
                    ret |= MP_STREAM_POLL_RD
        return ret


async def block(n):  # Blocking task
    while True:
        time.sleep_ms(5)
        await asyncio.sleep_ms(0)


async def timer_test(m):
    timer = MillisecTimer()
    tasks = []
    for n in range(10):
        tasks.append(asyncio.create_task(block(n)))
    for x in range(m):
        t = time.ticks_ms()
        await timer(100)  # Pause, measure the actual duration
        print(x, time.ticks_diff(time.ticks_ms(), t))


asyncio.run(timer_test(20))
```

On a Pyboard 1.1 with standard
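For readers without a Pyboard to hand, a rough CPython analogue (my own sketch, not part of the original post) reproduces the effect being measured: ten tasks that each block for 5 ms before yielding make a nominal 100 ms `asyncio.sleep` overrun, because the sleeper is only rescheduled after the blockers have run.

```python
import asyncio
import time


async def block():
    while True:
        time.sleep(0.005)       # 5 ms of blocking "work"
        await asyncio.sleep(0)  # then yield to the scheduler


async def timer_test(m):
    tasks = [asyncio.create_task(block()) for _ in range(10)]
    worst = 0.0
    for _ in range(m):
        t = time.monotonic()
        await asyncio.sleep(0.1)                  # ask for a 100 ms pause
        worst = max(worst, time.monotonic() - t)  # measure what we actually got
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return worst


worst = asyncio.run(timer_test(5))
print(f"worst observed 100 ms sleep: {worst * 1000:.1f} ms")
```

The exact overshoot depends on the host, but the worst case is always at least the requested 100 ms plus some multiple of the blockers' 5 ms slices.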
-
I've run into the exact latency issue this solves multiple times, and I think it's important to fix. In principle, async is an excellent abstraction for the state-management challenges that come with interrupt-driven program flow, but MicroPython's current ThreadSafeFlag isn't even close to satisfactory on performance. As a specific example where it wasn't adequate, I once had several different chips that each required a firmware blob upload over I2C, following their own snowflake upload logic. With multiplexing, this task should have been able to fully saturate both I2C buses on the rp2040 chip in question, but with the tens of milliseconds it always took from each 'I2C complete' IRQ to the subsequent resume-from-await on the corresponding flag, I was only ever able to reach around 40% bus utilization. The advantage of async here, being able to implement the datasheet upload routines exactly as written without manual interleaving, was still useful, but it's a seriously hamstrung feature at present.

I am cautious about the risk of changing existing behavior, though, in a way that could lead to task starvation in existing codebases. As far as ways of breaking old code go, that's definitely one of the worst. As a way to add this in a non-breaking fashion, what about the already-standardized

Perhaps an implementation more like this?

```python
def wait_io_event(self, dt):
    for s, ev in self.poller.ipoll(dt):
        sm = self.map[id(s)]
        # print('poll', s, sm, ev)
        due = ticks()
        if ev & select.POLLPRI:
            ev &= ~select.POLLPRI
            due = ticks_add(due, -1000)  # schedule as 1 s overdue
        if ev & ~select.POLLOUT and sm[0] is not None:
            # POLLIN or error
            _task_queue.push(sm[0], due)
            sm[0] = None
        if ev & ~select.POLLIN and sm[1] is not None:
            # POLLOUT or error
            _task_queue.push(sm[1], due)
            sm[1] = None
        if sm[0] is None and sm[1] is None:
            self._dequeue(s)
        elif sm[0] is None:
            self.poller.modify(s, select.POLLOUT)
        else:
            self.poller.modify(s, select.POLLIN)
```

Would also need to update how the IOQueue sets the eventmask when registering; could potentially even extend to add a
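For anyone unfamiliar with how `poller.modify()` narrows what a registered stream is polled for, here is a small demonstration of the same mechanism under CPython's `select.poll` (a sketch for illustration only: the `socketpair` and 100 ms timeouts are mine, and `select.poll` is unavailable on Windows):

```python
import select
import socket

a, b = socket.socketpair()
poller = select.poll()
poller.register(a, select.POLLIN | select.POLLOUT)

b.send(b"x")  # make `a` readable; a fresh socket is already writable
first = dict(poller.poll(100))[a.fileno()]

a.recv(1)  # reader serviced: narrow the mask to write-only
poller.modify(a, select.POLLOUT)
b.send(b"y")  # `a` is readable again, but POLLIN is no longer watched
second = dict(poller.poll(100))[a.fileno()]

print(bool(first & select.POLLIN), bool(second & select.POLLIN))  # True False
a.close()
b.close()
```

This is exactly what the sketch above does once one direction of a stream has been serviced: it keeps the stream registered but stops asking the poller about the direction nobody is waiting on.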
-
Just submitted a PR for another possible way to improve async latency:
-
I have added asyncio_alt to the asyncio repo. This offers fast I/O scheduling and low power operation, and is mip-installable. Fast I/O needs some attention to detail in application design. The low power option is for experimenters and requires careful application design for best results; it is platform dependent and is incompatible with STM32. Power savings depend on the performance of
-
This proposes a minor (2 LOC) change to `asyncio/core.py` to give I/O tasks higher priority.

Currently, if an I/O event occurs - such as a `ThreadSafeFlag` being set - the task waiting on the TSF is queued behind other pending tasks. If there are ten tasks, each of which blocks for 5 ms before yielding, latency is 50 ms. This change causes the waiting task to be placed at the head of the queue, reducing TSF latency (to <= 5 ms in this example). In applications using fast I/O there is potential for using smaller buffers.

The modified version passes the test suite with the exception of `asyncio_threadsafeflag`: in this case the sequence of output print statements is changed as a result of the reduced latency.

When an I/O task is assigned to the task queue it is now assigned a time-to-run which is overdue by 1 s. In `core.py`, the two changed lines are commented as overdue. This ensures that I/O tasks retain appropriate relative order, while being scheduled ahead of normal due and pending tasks.
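The effect of pushing a task with an overdue due-time can be pictured with an ordinary min-heap keyed by due time (a plain-Python sketch with hypothetical names, not MicroPython's actual task queue implementation):

```python
import heapq


class TaskQueue:
    """Min-heap of (due_ms, seq, task); seq keeps FIFO order for equal due times."""

    def __init__(self):
        self._heap = []
        self._seq = 0

    def push(self, task, due_ms):
        heapq.heappush(self._heap, (due_ms, self._seq, task))
        self._seq += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]


now = 10_000  # pretend current tick count, in ms
q = TaskQueue()
q.push("normal-1", now)        # ordinary task, due now
q.push("io-task", now - 1000)  # I/O-woken task, pushed as 1 s overdue
q.push("normal-2", now)        # another ordinary task, due now

order = [q.pop() for _ in range(3)]
print(order)  # ['io-task', 'normal-1', 'normal-2']
```

Because the I/O task's due time lies in the past, it sorts ahead of every task that is merely due now, yet two I/O tasks woken in succession still run in the order they were woken.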
Comments welcome!