optimization for _spi_write_many #1

rolandFleddermann · 2020-12-05T12:24:34Z

In the spirit of improving the data transmission rate by combining the DC bit with multiple writes (https://forum.micropython.org/viewtopic.php?t=6824#p38843), I figured it might be worth optimizing this part of the code further. For me, the frame rate on an ESP8266 (@ 80 MHz) went up from about 10 FPS to about 32 FPS. Assuming the minimal line drawing code between frames does not add significant delays, that is a drop from about 100 ms to about 30 ms per full buffer write.
Somewhat frustratingly, this is still nowhere near the theoretical SPI transfer limit of about 2 ms per buffer write at 4 MHz SPI speed, but I guess that's something we'll have to live with to gain the luxury of being able to program a microcontroller in Python.
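For reference, the roughly 2 ms figure can be reproduced with a quick calculation. This is only a back-of-the-envelope sketch: it assumes the 864-byte buffer mentioned further down, 9 bits on the wire per byte (8 data bits plus the DC bit) and no gaps between words. The constant names are made up for illustration.

```python
# Rough lower bound for one full buffer write over 9-bit SPI.
# Assumptions: 864-byte frame buffer, 9 bits on the wire per data byte
# (DC bit + 8 data bits), 4 MHz clock, no idle time between words.
BUFFER_BYTES = 864
BITS_PER_WORD = 9
SPI_HZ = 4000000

bits_on_wire = BUFFER_BYTES * BITS_PER_WORD
transfer_ms = bits_on_wire * 1000 / SPI_HZ

print('bits per frame: %d' % bits_on_wire)
print('minimum transfer time: %.3f ms' % transfer_ms)
```

At about 30 ms per buffer write, most of the remaining time is therefore spent outside the raw SPI transfer itself.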
Unrolling the loop avoids assigning every element of the array (except the last) twice, along with the read/modify/write that goes with it. The DC bit is also always set now, which removes a branch; this is safe because _write_many is only ever used for data transfers. Should sending commands this way ever be required in the future, the fallback is to just call _spi_write_one for each byte.
This could of course be handled in the same manner as the unrolled loop, but I don't think it's worth optimizing seeing that it is not being used at all in the current code.
This version also loses the flexibility to _spi_send_many buffers whose length is not a multiple of 8, but again, that does not seem to be happening, so it is not particularly concerning for now.
I left in comments for testing a version which first converts the full buffer from 8 to 9 bits ("8to9") and then sends it in one go. That brings a further speed improvement (about 1 FPS extra), but I felt it was not worth the overhead of another full buffer (864 bytes) of memory.
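To illustrate the idea described above, here is a minimal sketch of an unrolled 8-to-9 packing step. It is not the code from this PR: the function name `pack_data_8to9` and its signature are made up for this example, and it assumes MSB-first transmission with the DC bit sent ahead of each data byte and always set to 1 (data). Eight 9-bit words add up to 72 bits, which is exactly 9 bytes, so every output byte can be written once without a read/modify/write.

```python
def pack_data_8to9(src, offset, out):
    """Pack 8 data bytes from src[offset:offset+8] into the 9-byte buffer out.

    Hypothetical helper: each input byte becomes a 9-bit word whose leading
    DC bit is 1 (data). The loop is fully unrolled so each output byte is
    assigned exactly once.
    """
    b0 = src[offset]
    b1 = src[offset + 1]
    b2 = src[offset + 2]
    b3 = src[offset + 3]
    b4 = src[offset + 4]
    b5 = src[offset + 5]
    b6 = src[offset + 6]
    b7 = src[offset + 7]
    # The constant in each line is the DC bit of the next 9-bit word.
    out[0] = 0x80 | (b0 >> 1)
    out[1] = ((b0 & 0x01) << 7) | 0x40 | (b1 >> 2)
    out[2] = ((b1 & 0x03) << 6) | 0x20 | (b2 >> 3)
    out[3] = ((b2 & 0x07) << 5) | 0x10 | (b3 >> 4)
    out[4] = ((b3 & 0x0F) << 4) | 0x08 | (b4 >> 5)
    out[5] = ((b4 & 0x1F) << 3) | 0x04 | (b5 >> 6)
    out[6] = ((b5 & 0x3F) << 2) | 0x02 | (b6 >> 7)
    out[7] = ((b6 & 0x7F) << 1) | 0x01
    out[8] = b7


# Example: pack a frame chunk by chunk, reusing one 9-byte scratch buffer.
frame = bytes(864)       # stand-in for the display frame buffer
chunk = bytearray(9)
for off in range(0, len(frame), 8):
    pack_data_8to9(frame, off, chunk)
    # a real driver would hand `chunk` to its SPI write call here
```

Reusing a single 9-byte scratch buffer keeps the extra memory small, whereas converting the whole frame first would need a second full-size buffer, which is the trade-off mentioned above.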

Benchmark code I used below:

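import time

# `lcd` is assumed to be an already initialised display driver instance.
start=time.time()
for o in range(8):
    lcd.fill(0)
    # fan of lines from the top-left corner to the right edge
    for i in range(0,68,4):
        lcd.line(0,0,96,i,1)
        lcd.show()
    # fan of lines from the top-left corner to the bottom edge
    for i in range(0,96,4):
        lcd.line(0,0,96-i,68,1)
        lcd.show()
end=time.time()
# 8 passes * (68/4 + 96/4) show() calls = (68+96)*2 = 328 frames in total
fps=(68+96)*2/(end-start)
lcd.fill(0)
lcd.text('FPS: %.2f'%fps,0,34)
print('FPS: %.2f'%fps)
lcd.show()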

mcauser · 2020-12-07T23:22:45Z

Thanks for the optimisation! I'll review it soon.
