optimization for _spi_write_many #1

Open
rolandFleddermann commented Dec 5, 2020

In the spirit of improving the data transmission rate by combining the DC bit with multiple writes (https://forum.micropython.org/viewtopic.php?t=6824#p38843), I figured it might be worth optimizing this part of the code further. For me, the frame rate on an ESP8266 (@ 80 MHz) went up from about 10 FPS to about 32 FPS. Assuming the minimal line-drawing code between frames adds no significant delay, that is a drop from about 100 ms to about 30 ms per full buffer write.
Somewhat frustratingly, this is still nowhere near the theoretical SPI transfer limit of about 2 ms per buffer write at 4 MHz SPI speed, but I guess that's something we'll have to live with for the luxury of programming a microcontroller in Python.
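The roughly 2 ms figure can be reproduced with a little arithmetic. This is only a sketch: it assumes every one of the 864 buffer bytes goes out as a 9-bit word (DC bit plus 8 data bits) and ignores any per-transfer overhead. The constant names are made up for illustration.

```python
# Back-of-the-envelope check of the numbers quoted above.
BUFFER_BYTES = 864     # full frame buffer size mentioned in this PR
BITS_PER_WORD = 9      # assumption: 1 DC bit + 8 data bits per byte
SPI_HZ = 4000000       # 4 MHz SPI clock

# Time to clock out one full buffer if the bus were the only limit.
ideal_ms = BUFFER_BYTES * BITS_PER_WORD / SPI_HZ * 1000

# Measured frame rates converted to time per frame.
before_ms = 1000 / 10  # about 10 FPS before the change
after_ms = 1000 / 32   # about 32 FPS after the change

print("ideal: %.2f ms, before: %.1f ms, after: %.1f ms"
      % (ideal_ms, before_ms, after_ms))
```

Under these assumptions the bus itself would need just under 2 ms per frame, so most of the remaining ~30 ms is spent in Python rather than on the wire.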
Unrolling the loop avoids assigning every element of the array (except the last) twice, along with the read/modify/write that goes with it. The DC bit is also always set, which removes the branch, since _write_many is only ever used for data transfers. Should a non-data variant ever be required in the future, the fallback is to call _spi_write_one for each byte.
That case could of course be handled in the same manner as the unrolled loop, but I don't think it's worth optimizing, seeing that it is not used at all in the current code.
This version also loses the flexibility to _spi_send_many buffers whose length is not a multiple of 8, but again, that does not seem to happen anywhere, so it is not particularly concerning for now.
I left in comments for testing a version that first converts the full buffer from 8 to 9 bits ("8to9") and then sends it. That brings a further speed improvement (roughly one more FPS), but I felt it was not worth the overhead of another full buffer (864 bytes) of memory.
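To illustrate the idea, here is a minimal sketch of the 8-to-9 packing with the DC bit hard-wired to 1 and the inner loop unrolled. It is not the code from this PR: the function name `pack_8to9` is hypothetical, and MSB-first bit order is an assumption.

```python
def pack_8to9(data):
    """Prefix each byte with a DC bit of 1 and pack the 9-bit words MSB first.

    Hypothetical helper: 8 input bytes (72 bits once each has its DC bit)
    become exactly 9 output bytes, so the length must be a multiple of 8.
    """
    if len(data) % 8:
        raise ValueError("length must be a multiple of 8")
    out = bytearray(len(data) // 8 * 9)
    j = 0
    for i in range(0, len(data), 8):
        b0, b1, b2, b3, b4, b5, b6, b7 = data[i:i + 8]
        # Each output byte is written exactly once: no read/modify/write,
        # and no branch on the DC bit because it is always 1 here.
        out[j] = 0x80 | (b0 >> 1)
        out[j + 1] = ((b0 & 0x01) << 7) | 0x40 | (b1 >> 2)
        out[j + 2] = ((b1 & 0x03) << 6) | 0x20 | (b2 >> 3)
        out[j + 3] = ((b2 & 0x07) << 5) | 0x10 | (b3 >> 4)
        out[j + 4] = ((b3 & 0x0F) << 4) | 0x08 | (b4 >> 5)
        out[j + 5] = ((b4 & 0x1F) << 3) | 0x04 | (b5 >> 6)
        out[j + 6] = ((b5 & 0x3F) << 2) | 0x02 | (b6 >> 7)
        out[j + 7] = ((b6 & 0x7F) << 1) | 0x01
        out[j + 8] = b7
        j += 9
    return out


# Packing a whole 864-byte frame needs a second, slightly larger buffer.
packed = pack_8to9(bytes(864))
print(len(packed))
```

Converting the whole frame up front, as in the usage lines, is what costs the extra buffer. Packing and sending 8 bytes at a time keeps memory use flat at the price of more calls into the SPI driver.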

Benchmark code I used below:

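import time

# Assumes `lcd` is an already initialised display driver instance.
start = time.time()
for o in range(8):
    lcd.fill(0)
    for i in range(0, 68, 4):
        lcd.line(0, 0, 96, i, 1)
        lcd.show()
    for i in range(0, 96, 4):
        lcd.line(0, 0, 96 - i, 68, 1)
        lcd.show()
end = time.time()

# 8 passes x (17 + 24) calls to show() = 328 frames = (68 + 96) * 2
frames = (68 + 96) * 2
fps = frames / (end - start)
lcd.fill(0)
lcd.text('FPS: %.2f' % fps, 0, 34)
print('FPS: %.2f' % fps)
lcd.show()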

mcauser (Owner) commented Dec 7, 2020

Thanks for the optimisation! I'll review it soon.
