uasyncio fast_io EventLoop option reduces I/O scheduling latency #287

Open · wants to merge 1 commit into master
Conversation

peterhinch (Contributor) commented Jun 20, 2018

This PR offers a solution to the issue discussed in MicroPython issue 2664 and provides a way of accessing the full capability of MicroPython PR 3836; namely the ability to support user-written stream I/O drivers with low latency.

An optional ioq_len=0 arg to get_event_loop() is added. The default behaviour is unchanged from the standard library: if I/O is pending while coros are on the run queue, the I/O task is appended to the run queue and is scheduled only after all pending tasks have yielded.

If ioq_len is > 0, pending I/O tasks will be scheduled in preference to tasks on the run queue, substantially reducing latency. This reduces the need for buffering of fast I/O devices and improves the response time of user-written I/O drivers.
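For example (a minimal sketch; the queue length of 16 is arbitrary, and any value > 0 enables the I/O queue):

    import uasyncio as asyncio

    # Default behaviour, identical to the current library:
    # loop = asyncio.get_event_loop()

    # fast_io mode: a nonzero ioq_len instantiates a dedicated I/O queue
    loop = asyncio.get_event_loop(ioq_len=16)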

A test program demonstrating this effect may be found here.

peterhinch (Contributor, Author) commented Jun 25, 2018

To add to this, the I/O latency of the current code is twice as long as might be expected. Assume there are N coros which issue await asyncio.sleep(0), plus one StreamReader. The StreamReader gets to run only after the N coros have each run twice. This may be seen with the test program referenced above (run test() against the current library).

The reason for this is as follows. Assume user coros, as in the test script, of the form:

    import uasyncio as asyncio
    import utime

    async def dummy(self):  # method of the test class
        while True:
            await asyncio.sleep(0)  # Yield to the scheduler
            utime.sleep_ms(10)  # Emulate time consuming user code

Assume an initial condition where the run queue contains N such coros. Each will be scheduled before the scheduler's .wait method gives I/O a chance to poll the device. But when each coro yields, it appends itself to the run queue. So, by the time .wait gets to run, .runq already contains the N coros awaiting scheduling. When .wait runs and finds the device ready, it appends the I/O read task to the queue.

So on the next iteration through the run queue, the N coros run a second time before the I/O read is re-scheduled. By the time this takes place, each coro has again yielded and been put back on the run queue. So after the I/O read has been scheduled, the run queue once more contains the N coros, recreating the assumed initial condition.
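A schematic trace may make this concrete (illustrative only: A and B stand for two dummy coros, R for the I/O read task, and the device is assumed always ready):

    # N = 2: runq holds coros A and B; R is the I/O read task.
    # runq = [A, B]      inner loop runs A and B; each re-appends itself
    # runq = [A, B]      .wait polls; device is ready, so R is appended
    # runq = [A, B, R]   inner loop runs A and B a second time, then R
    # runq = [A, B]      initial condition restored
    # R therefore runs once per 2N coro activations.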

This PR does not alter this default behaviour. But when fast_io is specified, I/O tasks alternate with other tasks on the run queue, reducing latency by a factor of 2N.

dpgeorge (Member) commented

From my understanding there are two main things going on here: 1) implementing priorities for coros (low and high); 2) pumping the IO poller faster to give high-priority IO a chance to schedule itself faster.

For part 1): would it be simpler to just push the callback to the head of the runq, rather than having a separate queue?

Also, it seems the way to specify that the coro has priority here is to make it yield IOReadDone/IOWriteDone (via the appropriate StreamReader/StreamWriter class). Might there not be a more general way to specify that a coro is higher priority, like registering itself as high-prio with the event loop?

peterhinch (Contributor, Author) commented

My initial plan was indeed to push the coro to the head of runq, but ucollections.deque doesn't have an appendleft method and I was wary of the performance implications of implementing one. The penalty of an extra queue seems small, especially as it's only instantiated if required. But there is another issue.

The inner loop runs only those tasks which were on runq at the start (tasks appended while the loop runs wait for the next pass). If .wait(0) added an arbitrary number of tasks to the head (left) of runq, keeping track of the tasks (and args) which need to be popped off could get involved: you don't know whether args are queued until you've established the type of the object popped. So you can't simply compare the length of the queue before and after .wait(0). I concluded that an I/O queue was a simpler (and possibly faster) solution.
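The bookkeeping problem arises because queue entries are heterogeneous. A paraphrase of the shape of uasyncio's scheduling code (simplified, not the exact source):

    def _gen():
        yield
    type_gen = type(_gen())  # stands in for uasyncio.core's generator type

    def call_soon(self, callback, *args):
        self.runq.append(callback)
        if not isinstance(callback, type_gen):
            self.runq.append(args)  # args become a second queue entry

Since a plain callback occupies two slots and a coro only one, the queue's change in length across .wait(0) doesn't tell you how many tasks were added.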

The forthcoming fix for the read/write class bug means that the StreamReader and StreamWriter bound coroutines must always yield IOReadDone or IOWriteDone. This is unrelated to priority: it is for the correct management of the poll flags for read/write devices, which is done in Python. The code can be seen in this branch, which combines this PR with the forthcoming proposal to fix the read/write bug. Yielding a "done" instance triggers the ._unregister() method, which handles the poll flags.
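A minimal sketch of the pattern being described (IORead, IOReadDone and ._unregister() are names from uasyncio and this PR; the body of read() here is illustrative, not the library code):

    class StreamReader:
        def read(self, n=-1):
            yield IORead(self.polls)       # register: wait until readable
            res = self.ios.read(n)         # device ready: non-blocking read
            yield IOReadDone(self.polls)   # loop calls ._unregister() to
                                           # clear the poll flags
            return res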

I/O coros register themselves as high priority via ._call_io, which places them on ioq or runq depending on whether an ioq has been instantiated.
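A sketch of that dispatch, assuming the attribute names used in this PR (.ioq is None when no I/O queue was requested; the details are illustrative):

    def _call_io(self, callback, *args):
        if self.ioq is not None:            # fast_io: dedicated I/O queue
            self.ioq.append(callback)
            if not isinstance(callback, type_gen):
                self.ioq.append(args)
        else:                               # default: normal scheduling path
            self.call_soon(callback, *args)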

I'm unsure what you're suggesting regarding a more general solution.
