optimization for _spi_write_many #1
Open
In the spirit of improving the data transmission rate by combining the DC bit with multiple writes (https://forum.micropython.org/viewtopic.php?t=6824#p38843), I figured it was worth optimizing this part of the code further. For me, the frame rate on an ESP8266 (at 80 MHz) went up from about 10 FPS to about 32 FPS. Assuming the minimal line-drawing code between frames adds no significant delay, that is a drop from about 100 ms to about 30 ms per full buffer write.
Somewhat frustratingly, this is still nowhere near the theoretical SPI transfer limit of about 2 ms per buffer write at a 4 MHz SPI clock, but I guess that's something we'll have to live with for the luxury of being able to program a microcontroller in Python.
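For reference, the numbers above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the 864-byte buffer mentioned further down, 9 bits on the wire per buffer byte (8 data bits plus the DC bit), and no gaps between bytes. The constant and function names are mine, not from the driver.

```python
# Rough transfer-time estimate; all names here are illustrative only.
BUFFER_BYTES = 864      # full frame buffer size quoted in this PR
BITS_PER_WORD = 9       # assumption: DC bit + 8 data bits per buffer byte
SPI_HZ = 4000000        # 4 MHz SPI clock


def ideal_transfer_ms(n_bytes, bits_per_word, spi_hz):
    """Time to clock out n_bytes with no per-byte overhead, in milliseconds."""
    return n_bytes * bits_per_word / spi_hz * 1000


def ms_per_frame(fps):
    """Convert a frame rate into the time spent on one frame, in milliseconds."""
    return 1000 / fps


print("ideal: %.2f ms" % ideal_transfer_ms(BUFFER_BYTES, BITS_PER_WORD, SPI_HZ))
print("before: %.1f ms, after: %.1f ms" % (ms_per_frame(10), ms_per_frame(32)))
```

So even after this change, the measured time per frame is still more than ten times the ideal wire time.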
Unrolling the loop avoids assigning every element of the array (except the last) twice and doing a read/modify/write on it. Also, the DC bit is now always set, which removes the branch, since _write_many is only ever used for data transfers. Should a write with the DC bit cleared ever be required in the future, the fallback is to just call _spi_write_one for each byte.
That case could of course be handled in the same manner as the unrolled loop, but I don't think it's worth optimizing, seeing that it is not used anywhere in the current code.
This version also loses the flexibility to handle _spi_send_many calls with buffers whose length is not a multiple of 8, but again, that does not seem to happen in the current code, so it is not particularly concerning for now.
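To illustrate the idea described above, here is a minimal sketch of an unrolled "8 bytes in, 9 bytes out" packing step with the DC bit hard-wired to 1. It is not the code in this PR: the names pack8_with_dc and pack_buffer are hypothetical, and it assumes each 9-bit word is sent MSB first with the DC bit ahead of the 8 data bits. Each output byte is written exactly once, with no branch on DC and no read/modify/write.

```python
def pack8_with_dc(src, offset, out):
    """Pack src[offset:offset+8] into the 9-byte buffer out, DC bit always 1."""
    b0 = src[offset]
    b1 = src[offset + 1]
    b2 = src[offset + 2]
    b3 = src[offset + 3]
    b4 = src[offset + 4]
    b5 = src[offset + 5]
    b6 = src[offset + 6]
    b7 = src[offset + 7]
    # The DC bit of word k lands at bit (7 - k) of output byte k, so each
    # output byte is: tail of the previous word | DC bit | head of this word.
    out[0] = 0x80 | (b0 >> 1)
    out[1] = ((b0 & 0x01) << 7) | 0x40 | (b1 >> 2)
    out[2] = ((b1 & 0x03) << 6) | 0x20 | (b2 >> 3)
    out[3] = ((b2 & 0x07) << 5) | 0x10 | (b3 >> 4)
    out[4] = ((b3 & 0x0F) << 4) | 0x08 | (b4 >> 5)
    out[5] = ((b4 & 0x1F) << 3) | 0x04 | (b5 >> 6)
    out[6] = ((b5 & 0x3F) << 2) | 0x02 | (b6 >> 7)
    out[7] = ((b6 & 0x7F) << 1) | 0x01
    out[8] = b7  # the last word's data bits fill the final byte exactly


def pack_buffer(buf):
    """Pack a whole buffer; its length must be a multiple of 8 bytes."""
    if len(buf) % 8:
        raise ValueError("buffer length must be a multiple of 8")
    out = bytearray(9)  # reused scratch buffer, one 8-byte group at a time
    chunks = []
    for offset in range(0, len(buf), 8):
        pack8_with_dc(buf, offset, out)
        chunks.append(bytes(out))  # a real driver would write out to SPI here
    return b"".join(chunks)
```

In a driver, the scratch buffer would be handed to the SPI write call once per group instead of being collected into a list; the list here only exists so the result can be inspected.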
I left in commented-out code for testing a version that first converts the full buffer from 8 to 9 bits per byte ("8to9") and then sends it. That brings a further speed improvement (roughly one extra FPS), but I felt it was not worth the overhead of another full buffer (864 bytes) of memory.
Benchmark code I used below: