This version is a "drop in" replacement for official uasyncio
. Existing
applications should run under it unchanged and with essentially identical
performance.
This version has the following features:
- I/O can optionally be handled at a higher priority than other coroutines (PR287).
- Tasks can yield with low priority, running when nothing else is pending.
- Callbacks can similarly be scheduled with low priority.
- A bug whereby bidirectional devices such as UARTs can fail to handle concurrent input and output is fixed.
- It is compatible with `rtc_time.py` for micro-power applications documented here.
- An assertion failure is produced if `create_task` or `run_until_complete` is called with a generator function (PR292). This traps a common coding error which otherwise results in silent failure.
- The presence of the `fast_io` version can be tested at runtime.
- The presence of an event loop instance can be tested at runtime.
- `run_until_complete(coro())` now returns the value returned by `coro()`, as per CPython (micropython-lib PR270). A short example is shown below.
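For example (a minimal sketch):

```python
import uasyncio as asyncio

async def coro():
    await asyncio.sleep_ms(100)
    return 42

loop = asyncio.get_event_loop()
result = loop.run_until_complete(coro())  # result == 42 under fast_io
```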
Note that priority device drivers are written using the officially supported
technique for writing stream I/O drivers. Code using such drivers will run
unchanged under the `fast_io` version. Using the fast I/O mechanism requires
adding just one line of code. This implies that if official `uasyncio` acquires
a means of prioritising I/O other than that in this version, application code
changes should be minimal.
The high priority mechanism formerly provided in `asyncio_priority.py` was a
workaround based on the view that stream I/O written in Python would remain
unsupported. Stream I/O is now supported, so `asyncio_priority.py` is obsolete
and should be deleted from your system. The facility for low priority coros
formerly provided by `asyncio_priority.py` is now implemented here.
1. Installation
 1.1 Benchmarks: benchmark and demo programs
2. Rationale
 2.1 Latency
 2.2 Timing accuracy
 2.3 Polling in uasyncio
3. The modified version
 3.1 Fast IO
 3.2 Low Priority
 3.3 Other Features
 3.4 Low priority yield
  3.4.1 Task Cancellation and Timeouts
 3.5 Low priority callbacks
4. ESP Platforms
5. Background
6. Performance
Install and test `uasyncio` on the target hardware. Replace `core.py` and
`__init__.py` with the files in the `fast_io` directory.
In MicroPython 1.9 `uasyncio` was implemented as a frozen module on the
ESP8266. To install this version it is necessary to build the firmware with the
above two files implemented as frozen bytecode. See
ESP Platforms for general comments on the
suitability of ESP platforms for systems requiring fast response.
It is possible to load modules in the filesystem in preference to frozen ones
by modifying `sys.path`. However the ESP8266 probably has too little RAM for
this to be useful.
The benchmarks directory contains files demonstrating the performance gains offered by prioritisation. They also offer illustrations of the use of these features. Documentation is in the code.
- `benchmarks/latency.py` Shows the effect on latency with and without low priority usage.
- `benchmarks/rate.py` Shows the frequency with which uasyncio schedules minimal coroutines (coros).
- `benchmarks/rate_esp.py` As above, for ESP32 and ESP8266.
- `benchmarks/rate_fastio.py` Measures the rate at which coros can be scheduled if the fast I/O mechanism is used but no I/O is pending.
- `benchmarks/call_lp.py` Demos low priority callbacks.
- `benchmarks/overdue.py` Demo of the maximum overdue feature.
- `benchmarks/priority_test.py` Cancellation of low priority coros.
- `fast_io/ms_timer.py` Provides higher precision timing than `sleep_ms()`.
- `fast_io/ms_timer_test.py` Test/demo program for the above.
- `fast_io/pin_cb.py` Demo of an I/O device driver which causes a pin state change to trigger a callback.
- `fast_io/pin_cb_test.py` Demo of the above.
With the exceptions of `call_lp`, `priority_test` and `rate_fastio`, benchmarks
can be run against both the official and `fast_io` versions of uasyncio.
MicroPython firmware now enables device drivers for stream devices to be
written in Python, via `uio.IOBase`. This mechanism can be applied to any
situation where a piece of hardware or an asynchronously set flag needs to be
polled. Such polling is efficient because it is handled in C using
`select.poll`, and because the coroutine accessing the device is descheduled
until polling succeeds.
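To illustrate the mechanism, the following is a minimal sketch, not code from
the library: the `FlagDevice` class and its flag are invented for illustration.
A driver subclasses `uio.IOBase` and implements `ioctl`, which the scheduler
polls via `select.poll`:

```python
import uio
import uasyncio as asyncio

# Standard MicroPython stream ioctl request and flag values
MP_STREAM_POLL = 3
MP_STREAM_POLL_RD = 1
MP_STREAM_ERROR = -1

class FlagDevice(uio.IOBase):  # Hypothetical device driver
    def __init__(self):
        self.flag = False  # Set True by a hardware ISR

    def ioctl(self, req, arg):
        # Polled by the scheduler: report "readable" only when the ISR
        # has set the flag.
        if req == MP_STREAM_POLL:
            return MP_STREAM_POLL_RD if (self.flag and arg & MP_STREAM_POLL_RD) else 0
        return MP_STREAM_ERROR

    def readline(self):
        self.flag = False  # Acknowledge the event
        return b'\n'

async def handle_device(dev):
    sreader = asyncio.StreamReader(dev)
    while True:
        await sreader.readline()  # Descheduled until ioctl reports ready
        # Read and process data here
```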
Unfortunately official `uasyncio` polls I/O with a relatively high degree of
latency.
Applications may need to poll a hardware device or a flag set by an interrupt service routine (ISR). An overrun may occur if the scheduling of the polling coroutine (coro) is subject to excessive latency. Fast devices with interrupt driven drivers (such as the UART) need to buffer incoming data during any latency period. Lower latency reduces the buffer size requirement.
Further, a coro issuing `await asyncio.sleep_ms(t)` may block for much longer
than `t`, depending on the number and design of other coros which are pending
execution. Delays can easily exceed the nominal value by an order of magnitude.
This variant mitigates this by providing a means of scheduling I/O at a higher priority than other coros: if an I/O queue is specified, I/O devices are polled on every iteration of the scheduler. This enables faster response to real time events and also enables higher precision millisecond-level delays to be realised.
The variant also enables coros to yield control in a way which prevents them from competing with coros which are ready for execution. Coros which have yielded in a low priority fashion will not be scheduled until all "normal" coros are waiting on a nonzero timeout. The benchmarks show that the improvement in the accuracy of time delays can exceed two orders of magnitude.
Coroutines in uasyncio which are pending execution are scheduled in a "fair" round-robin fashion. Consider these functions:
```python
async def foo():
    while True:
        yield
        # code which takes 4ms to complete

async def handle_isr():
    global isr_has_run
    while True:
        if isr_has_run:
            # read and process data
            isr_has_run = False
        yield
```
Assume a hardware interrupt handler sets the `isr_has_run` flag, and that we
have ten instances of `foo()` and one instance of `handle_isr()`. When
`handle_isr()` issues `yield`, its execution will pause for 40ms while each
instance of `foo()` is scheduled and performs one iteration. This may be
unacceptable: it may be necessary to poll and respond to the flag at a rate
sufficient to avoid overruns.
In this version `handle_isr()` would be rewritten as a stream device driver
which could be expected to run with latency of just over 4ms.
Alternatively this latency may be reduced by enabling the `foo()` instances to
yield in a low priority manner. In the case where all coros other than
`handle_isr()` are low priority, the latency is reduced to 300μs, roughly
double the inherent latency of uasyncio.
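For instance, `foo()` might be rewritten to yield with low priority (a sketch:
this requires an event loop instantiated with `lp_len > 0`, as described under
Low priority yield below):

```python
async def foo():
    while True:
        # Low priority yield: rescheduled only when all "normal" coros
        # are waiting on a nonzero delay
        await asyncio.after(0)
        # code which takes 4ms to complete
```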
The benchmark `latency.py` demonstrates this. Documentation is in the code; it can be run against both the official and `fast_io` versions. This measures scheduler latency. Maximum application latency, measured relative to the incidence of an asynchronous event, will be 300μs plus the worst-case delay between yields of any one competing task.
The official version of `uasyncio` has even higher levels of latency for I/O
scheduling. In the above case of ten coros using 4ms of CPU time between zero
delay yields, the latency of an I/O driver would be 80ms.
Consider these functions:
```python
async def foo():
    while True:
        await asyncio.sleep(0)
        # code which takes 4ms to complete

async def fast():
    while True:
        # Code omitted
        await asyncio.sleep_ms(15)
        # Code omitted
```
Again assume ten instances of `foo()` and one of `fast()`. When `fast()`
issues `await asyncio.sleep_ms(15)` it will not see a 15ms delay. During the
15ms period `foo()` instances will be scheduled. When the delay elapses,
`fast()` will compete with pending `foo()` instances.
This results in variable delays up to 55ms (10 tasks * 4ms + 15ms). A
`MillisecTimer` class is provided which uses stream I/O to achieve a relatively
high precision delay:
```python
import ms_timer  # fast_io/ms_timer.py

async def timer_test(n):
    timer = ms_timer.MillisecTimer()
    while True:
        await timer(30)  # More precise timing
        # Code
```
The test program `fast_io/ms_timer_test.py` illustrates three instances of a
coro with a 30ms nominal timer delay, competing with ten coros which yield with
a zero delay between hogging the CPU for 10ms. Using normal scheduling the 30ms
delay is actually 300ms. With fast I/O it is 30-34ms.
The asyncio library provides various mechanisms for polling a device or flag.
Aside from a polling loop these include awaitable classes and asynchronous
iterators. If an awaitable class's `__iter__()` method simply returns the state
of a piece of hardware, there is no performance gain over a simple polling
loop.
This is because uasyncio schedules tasks which yield with a zero delay, together with tasks which have become ready to run, in a "fair" round-robin fashion. This means that a task waiting on a zero delay will be rescheduled only after the scheduling of all other such tasks (including timed waits whose time has elapsed).
The `fast_io` version enables awaitable classes and asynchronous iterators to
run with lower latency by designing them to use the stream I/O mechanism. The
program `fast_io/ms_timer.py` provides an example.
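The technique can be sketched as follows. This is a simplified illustration of
the approach, not the actual source of `fast_io/ms_timer.py`: `ioctl()`
reports the "device" (a millisecond timer) ready once the period has elapsed,
so the waiting coro is rescheduled via the I/O queue:

```python
import uio
import utime
import uasyncio as asyncio

MP_STREAM_POLL = 3
MP_STREAM_POLL_RD = 1
MP_STREAM_ERROR = -1

class Timer(uio.IOBase):  # Simplified sketch of a stream-based ms timer
    def __init__(self):
        self.end = 0
        self.sreader = asyncio.StreamReader(self)

    def __call__(self, ms):
        self.end = utime.ticks_add(utime.ticks_ms(), ms)
        return self.sreader.readline()  # Awaitable, polled via the I/O queue

    def readline(self):
        return b'\n'

    def ioctl(self, req, arg):
        # Report "readable" once the requested period has elapsed
        if req == MP_STREAM_POLL:
            if arg & MP_STREAM_POLL_RD and utime.ticks_diff(utime.ticks_ms(), self.end) >= 0:
                return MP_STREAM_POLL_RD
            return 0
        return MP_STREAM_ERROR
```

Usage is as in the `timer_test` fragment above: instantiate once, then
`await timer(30)`.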
Practical cases exist where the `foo()` tasks are not time-critical: in such
cases the performance of time-critical tasks may be enhanced by enabling
`foo()` to submit for rescheduling in a way which does not compete with tasks
requiring a fast response. In essence "slow" operations tolerate longer latency
and longer time delays so that fast operations meet their performance targets.
Examples are:
- User interface code. A system with ten pushbuttons might have a coro running on each. A GUI touch detector coro needs to check a touch against a sequence of objects. Both may tolerate 100ms of latency before users notice any lag.
- Networking code: a latency of 100ms may be dwarfed by that of the network.
- Mathematical code: there are cases where time consuming calculations may take place which are tolerant of delays. Examples are statistical analysis, sensor fusion and astronomical calculations.
- Data logging.
The `fast_io` version adds `ioq_len=0` and `lp_len=0` arguments to
`get_event_loop`. These determine the lengths of the I/O and low priority
queues. The zero defaults cause the queues not to be instantiated, in which
case the scheduler operates as per the official version. If an I/O queue
length > 0 is provided, I/O performed by `StreamReader` and `StreamWriter`
objects is prioritised over other coros. If a low priority queue length > 0 is
specified, tasks have the option to yield in such a way as to minimise their
competition with other tasks. An example follows the argument list below.
Arguments to `get_event_loop()`:

- `runq_len=16` Length of the normal queue. Default 16 tasks.
- `waitq_len=16` Length of the wait queue.
- `ioq_len=0` Length of the I/O queue. Default: no queue is created.
- `lp_len=0` Length of the low priority queue. Default: no queue.
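For example, to instantiate both optional queues:

```python
import uasyncio as asyncio

# Nonzero lengths instantiate the I/O and low priority queues. With the
# zero defaults the scheduler behaves exactly as the official version.
loop = asyncio.get_event_loop(ioq_len=16, lp_len=16)
```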
Device drivers which are to be capable of running at high priority should be written to use stream I/O: see Writing streaming device drivers.
The `fast_io` version will schedule I/O whenever the `ioctl` reports a ready
status. This implies that devices which become ready very soon after being
serviced can hog execution. This is analogous to the case where an interrupt
service routine is called at an excessive frequency.
This behaviour may be desired where short bursts of fast data are handled. Otherwise drivers of such hardware should be designed to avoid hogging, using techniques like buffering or timing.
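As an illustration of the timing technique, a driver's `ioctl` might throttle
its ready status. This is a sketch: the `ThrottledDevice` class and its 50ms
interval are invented for illustration:

```python
import uio
import utime

MP_STREAM_POLL = 3
MP_STREAM_POLL_RD = 1
MP_STREAM_ERROR = -1

class ThrottledDevice(uio.IOBase):  # Hypothetical example
    def __init__(self, interval_ms=50):
        self.data_ready = False  # Set by hardware/ISR
        self.interval = interval_ms
        self.last = utime.ticks_ms()

    def ioctl(self, req, arg):
        # Report ready at most once per interval, so a fast device
        # cannot hog the scheduler.
        if req == MP_STREAM_POLL:
            if (arg & MP_STREAM_POLL_RD and self.data_ready
                    and utime.ticks_diff(utime.ticks_ms(), self.last) >= self.interval):
                self.last = utime.ticks_ms()
                return MP_STREAM_POLL_RD
            return 0
        return MP_STREAM_ERROR

    def readline(self):
        self.data_ready = False
        return b'\n'
```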
The low priority solution is based on the notion of "after": a time delay
which can be expected to be less precise than the standard asyncio calls.
The `fast_io` version adds the following awaitable instances:
- `after(t)` Low priority version of `sleep(t)`.
- `after_ms(t)` Low priority version of `sleep_ms(t)`.
It adds the following event loop methods:
- `loop.call_after(t, callback, *args)`
- `loop.call_after_ms(t, callback, *args)`
- `loop.max_overdue_ms(t=None)` This sets the maximum time a low priority task will wait before being scheduled. A value of 0 corresponds to no limit. The default arg `None` leaves the period unchanged. The method always returns the period value. If there is no limit and a competing task runs a loop with a zero delay yield, the low priority yield will be postponed indefinitely.
Variable:

- `version` Contains 'fast_io'. Enables the presence of this version to be determined at runtime.
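For example (using `getattr` so the test also works where the variable is absent):

```python
import uasyncio as asyncio

# Determine at runtime whether the fast_io version is installed
if getattr(asyncio, 'version', None) == 'fast_io':
    print('Running under the fast_io version.')
```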
Function:

- `got_event_loop()` No arg. Returns a `bool`: `True` if the event loop has been instantiated. Enables code using the event loop to raise an exception if the event loop was not instantiated:
```python
class Foo():
    def __init__(self):
        if asyncio.got_event_loop():
            loop = asyncio.get_event_loop()
            loop.create_task(self._run())
        else:
            raise OSError('Foo class requires an event loop instance')
```
This avoids subtle errors:
```python
import uasyncio as asyncio
bar = Bar()  # Constructor calls get_event_loop()
# and renders these args inoperative
loop = asyncio.get_event_loop(runq_len=40, waitq_len=40)
```
Consider this code fragment:
```python
import uasyncio as asyncio
loop = asyncio.get_event_loop(lp_len=16)

async def foo():
    while True:
        # Do something
        await asyncio.after(1.5)  # Wait a minimum of 1.5s
        # code
        await asyncio.after_ms(20)  # Wait a minimum of 20ms
```
These `await` statements cause the coro to suspend execution for the minimum
time specified. Low priority coros run in a mutually "fair" round-robin fashion.
By default the coro will only be rescheduled when all "normal" coros are waiting
on a nonzero time delay. A "normal" coro is one that has yielded by any other
means.
This behaviour can be overridden to limit the degree to which low priority tasks can become overdue. For the reasoning behind this, consider this code:
```python
import uasyncio as asyncio
loop = asyncio.get_event_loop(lp_len=16)

async def foo():
    while True:
        # Do something
        await asyncio.after(0)
```
By default a coro yielding in this way will be re-scheduled only when there are no "normal" coros ready for execution i.e. when all are waiting on a nonzero delay. The implication of having this degree of control is that if a coro issues:
```python
while True:
    await asyncio.sleep(0)
    # Do something which does not yield to the scheduler
```
low priority tasks will never be executed. Normal coros must sometimes wait on a non-zero delay to enable the low priority ones to be scheduled. This is analogous to running an infinite loop without yielding.
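For example, introducing a nonzero delay allows low priority tasks to be
scheduled:

```python
while True:
    await asyncio.sleep_ms(10)  # Nonzero delay: low priority tasks can now run
    # Do something which does not yield to the scheduler
```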
Alternatively, the degree to which low priority tasks can become overdue may be bounded by issuing:
```python
loop = asyncio.get_event_loop(lp_len=16)
loop.max_overdue_ms(1000)
```
In this instance a task which has yielded in a low priority manner will be rescheduled in the presence of pending "normal" tasks if they cause a low priority task to become overdue by more than 1s.
Tasks which yield in a low priority manner may be subject to timeouts or be cancelled in the same way as normal tasks. See Task cancellation and Coroutines with timeouts.
The following `EventLoop` methods enable callback functions to be scheduled
to run when all normal coros are waiting on a delay or when `max_overdue_ms`
has elapsed:
`call_after(delay, callback, *args)` Schedule a callback with low priority.
Positional args:

- `delay` Minimum delay in seconds. May be a float or integer.
- `callback` The callback to run.
- `*args` Optional comma-separated positional args for the callback.
The delay specifies a minimum period before the callback will run and may have a value of 0. The period may be extended depending on other high and low priority tasks which are pending execution.
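For example (a minimal sketch; the `report` callback is invented for
illustration):

```python
def report(*args):
    print('Low priority callback ran with', args)

# Runs report() no less than 2s from now, when the scheduler is otherwise idle
loop.call_after(2, report, 1, 2, 3)
```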
A simple demo of this is `benchmarks/call_lp.py`. Documentation is in the
code.
`call_after_ms(delay, callback, *args)` Call with low priority. Positional
args:

- `delay` Integer. Minimum delay in milliseconds before the callback runs.
- `callback` The callback to run.
- `*args` Optional positional args for the callback.
It should be noted that the response of the ESP8266 to hardware interrupts is remarkably slow. This also appears to apply to ESP32 platforms. Consider whether a response in the high hundreds of μs meets project requirements; also whether a priority mechanism is needed on hardware with such poor realtime performance.
This has been discussed in detail in issue 2989.
A further discussion on the subject of using the ioread mechanism to achieve fast scheduling took place in issue 2664.
Support was finally added here.
This version is designed to enable existing applications to run without change to code and to minimise the effect on raw scheduler performance in the case where the added functionality is unused.
The benchmark `rate.py` measures the rate at which tasks can be scheduled. It
was run (on a Pyboard V1.1) under official `uasyncio` V2, then under this
version. The benchmark `rate_fastio` is identical except that it instantiates
an I/O queue and a low priority queue. Results were as follows:
| Script      | uasyncio version | Period (100 coros) | Overhead |
|-------------|------------------|--------------------|----------|
| rate        | Official V2      | 156μs              | 0%       |
| rate        | fast_io          | 162μs              | 3.4%     |
| rate_fastio | fast_io          | 206μs              | 32%      |
If an I/O queue is instantiated, I/O is polled on every scheduler iteration
(that is its purpose). Consequently there is a significant overhead. In
practice the overhead will increase with the number of I/O devices being
polled and will be determined by the efficiency of their `ioctl` methods.