Optimize rgb565 serialization #317
If CPU usage is this much of a concern, then the screen I am working on writing alternative software for may be a better option. All the drawing is done on the device end. See the main README.md, under the section called Fuldho.
Thanks for optimizing this code! I agree adding numpy is not a concern, as it is a well-known and well-maintained package.
Thanks, that's interesting. However, I'd rather use the screen I already have, and I like how straightforward the protocol is (simply pasting bitmaps).
I only have a rev A screen, so I couldn't test the others, but I see that at least rev B also uses RGB565, so the same kind of optimization could be done there. Rotating/reversing should also be easy to do with numpy transposing/flipping operations, I think (I'm not a numpy expert at all).

I did some further testing and noticed that my first post badly understates the performance gains, which are much bigger than I thought! In the current code, the serialization to RGB565 and the sending of data to the serial port are interleaved. To measure the gains of the serialization alone, I moved the current pure-Python serialization into a first step, and the line sending into a second step:

```python
import struct
import time

# Excerpt from the display's image-painting method: `self`, `image`,
# `image_width`, `image_height` and `logger` come from the surrounding code.

# Step 1: serialize the whole image to little-endian RGB565 "lines"
start = time.perf_counter()
pix = image.load()
line = bytes()
lines = []
for h in range(image_height):
    for w in range(image_width):
        R = pix[w, h][0] >> 3
        G = pix[w, h][1] >> 2
        B = pix[w, h][2] >> 3
        rgb = (R << 11) | (G << 5) | B
        line += struct.pack('<H', rgb)
        # Send image data by multiple of "display width" bytes
        if len(line) >= self.get_width() * 8:
            lines.append(line)
            line = bytes()
# Write last line if needed
if len(line) > 0:
    lines.append(line)
end = time.perf_counter()
logger.debug(f"serialization done (took {end-start:.3f} s)")

# Step 2: lock queue mutex, then queue all the requests for the image data
start = time.perf_counter()
with self.update_queue_mutex:
    for line in lines:
        self.SendLine(line)
end = time.perf_counter()
logger.debug(f"sending lines done (took {end-start:.3f} s)")
```

On my Raspberry Pi, here are the results of the pure-Python serialization for a fullscreen image:
And here's the numpy version:
So it's more than 100x faster! numpy seems to be crazy efficient for this kind of thing. Now the paint speed is basically limited by the speed of writing the data to the serial port, and I don't know why that is so slow on the Raspberry Pi...
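On the earlier idea of rotating/reversing with numpy: a minimal sketch, assuming the image is already an (H, W, C) array, could use `np.rot90` for quarter turns and `np.flip` for a 180° "reverse". Note that `np.roll` would only shift pixels circularly rather than reverse them. The function names are hypothetical, and which rotation a given screen orientation actually needs is an assumption to verify against the display.

```python
import numpy as np


def rotate_quarter_turns(rgb: np.ndarray, quarter_turns: int) -> np.ndarray:
    """Rotate an (H, W, C) image array counter-clockwise by quarter_turns * 90 degrees."""
    # np.rot90 rotates from the first axis towards the second axis given in `axes`,
    # i.e. counter-clockwise when axis 0 is rows and axis 1 is columns.
    rotated = np.rot90(rgb, k=quarter_turns % 4, axes=(0, 1))
    # rot90 returns a strided view; return a C-contiguous copy for serialization.
    return np.ascontiguousarray(rotated)


def reverse(rgb: np.ndarray) -> np.ndarray:
    """Turn an (H, W, C) image upside down (180 degrees) by flipping rows and columns."""
    return np.ascontiguousarray(np.flip(rgb, axis=(0, 1)))
```

Both functions only move pixels around, so they can be applied before RGB565 packing without changing the packing code itself.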
You have to keep in mind that most of the screens supported by this project use a very slow USB 1.1 interface. A whole-screen blit will take 2-3 seconds. But there's also a possibility that Python is still a noticeable bottleneck on your slower ARM board. Python is very slow, after all, especially at raw number crunching, as you are seeing here. Perhaps in time there will be a better solution available. I've been thinking of starting another project to handle USB screens like the Turing ones at some point once my current
Sure, USB 1.1 is slow, but it's still 12 Mbit/s, and the screen is recognized at that speed on both my Raspberry Pi and my desktop PC. Ignoring some USB overhead, at a full 12 Mbit/s, sending the
After the serialization to RGB565, there's not really any "number crunching": it's just writing bytes to a file descriptor, which shouldn't really be impacted by Python's speed (or rather slowness). But I did think the same thing as you! So to test that, I implemented a minimal Go program to paint a background, and I got the exact same results as with Python: ~1.2 s on my desktop PC, ~6 s on the RPi. Some further stracing showed that the individual write system calls themselves are indeed slower on the RPi than on the PC.
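To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It assumes a 320x480 panel for the rev A screen and 2 bytes per RGB565 pixel, and ignores USB protocol overhead; the helper names are hypothetical.

```python
# Back-of-the-envelope USB bandwidth estimate.
# Assumptions (not stated in the thread): a 320x480 panel, no USB protocol overhead.
WIDTH, HEIGHT = 320, 480
BYTES_PER_PIXEL = 2  # RGB565 packs each pixel into 16 bits
USB_FULL_SPEED_BPS = 12_000_000  # USB 1.1 "full speed" raw signalling rate


def frame_bytes(width: int, height: int) -> int:
    """Size of one full-screen RGB565 frame in bytes."""
    return width * height * BYTES_PER_PIXEL


def min_transfer_seconds(n_bytes: int, bits_per_second: float) -> float:
    """Lower bound on the time needed to push n_bytes at a given raw bit rate."""
    return n_bytes * 8 / bits_per_second


def effective_mbps(n_bytes: int, seconds: float) -> float:
    """Throughput actually achieved if n_bytes took `seconds` to send, in Mbit/s."""
    return n_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    size = frame_bytes(WIDTH, HEIGHT)
    print(f"frame: {size} bytes, at least {min_transfer_seconds(size, USB_FULL_SPEED_BPS):.3f} s")
    # Paint times reported in this thread for a full background.
    for label, seconds in (("desktop PC", 1.2), ("Raspberry Pi", 6.0)):
        print(f"{label}: ~{effective_mbps(size, seconds):.2f} Mbit/s effective")
```

Under those assumptions, one frame is 307,200 bytes, so even at the full 12 Mbit/s it needs about 0.2 s; the ~1.2 s and ~6 s paints reported above work out to roughly 2.0 and 0.4 Mbit/s of effective throughput.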
Yep, a "driver" backend in a fast language would be nice; I thought of doing the same.
I would most likely use C if I end up doing it. I also wanted to implement a "separation of concerns" of sorts so the whole program doesn't have to run as
OK so this message has nothing to do with As per the info in the Raspberry Pi forums [1] I simply added With that parameter, I have fullscreen paints in around 1.5 s, so it's just marginally slower than my desktop PC (1.2 s). Of course the Ethernet is now very slow by today's standards (~7 Mbit/s), but that's more than enough for my use case.
Thanks @hchargois I added this info to the Troubleshooting page https://github.com/mathoudebine/turing-smart-screen-python/wiki/Troubleshooting#raspberry-pi-zero--1--2--3-display-refresh-is-too-slow
For context: I'm trying to drive a rev A display from a (gen 1) Raspberry Pi. On a weak CPU like that, the sample program simple-program.py is extremely slow (~30 s to paint the background image) and the CPU usage is very high (constantly ~80%). So I did some profiling and found that the serialization to little-endian RGB565 was the main culprit for CPU usage. Using numpy yields a huge improvement.
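As a rough illustration of why numpy helps here, below is a minimal vectorized sketch of the same little-endian RGB565 packing performed by the pure-Python loop in this thread. It is not necessarily the code in this PR; the function names are hypothetical, and it assumes the image has already been converted to an (H, W, 3) uint8 array (with Pillow, `np.asarray(image.convert("RGB"))` gives such an array).

```python
import struct

import numpy as np


def rgb565_le_numpy(rgb: np.ndarray) -> bytes:
    """Pack an (H, W, 3) uint8 RGB array into little-endian RGB565 bytes, row by row."""
    # Widen to uint16 first so the shifts below cannot overflow 8 bits.
    r = rgb[..., 0].astype(np.uint16) >> 3  # keep the top 5 bits of red
    g = rgb[..., 1].astype(np.uint16) >> 2  # keep the top 6 bits of green
    b = rgb[..., 2].astype(np.uint16) >> 3  # keep the top 5 bits of blue
    rgb565 = (r << 11) | (g << 5) | b
    # '<u2' is an explicit little-endian 16-bit dtype, matching struct.pack('<H', ...).
    return rgb565.astype("<u2").tobytes()


def rgb565_le_python(rgb: np.ndarray) -> bytes:
    """Reference per-pixel loop with the same bit layout, for comparison."""
    out = bytearray()
    height, width, _ = rgb.shape
    for h in range(height):
        for w in range(width):
            r, g, b = (int(c) for c in rgb[h, w, :3])
            out += struct.pack("<H", ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3))
    return bytes(out)
```

The vectorized version replaces one Python-level indexing, shifting and `struct.pack` call per pixel with a handful of whole-array operations, which is consistent with the large speedups reported. The resulting bytes can still be sliced into display-width multiples before sending.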
I've added some timing debug messages to simple-program. Here's before:
And after:
Also, the CPU usage goes from ~80% to ~40%.
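The exact debug messages added to simple-program are not reproduced here. A hypothetical helper in the same style as the `logger.debug(f"... done (took ... s)")` lines in the serialization snippet could look like this:

```python
import logging
import time

logger = logging.getLogger(__name__)


def timed_call(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), log its wall-clock duration, return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    logger.debug(f"{label} done (took {elapsed:.3f} s)")
    return result, elapsed


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Hypothetical workload standing in for a serialization step.
    data, _ = timed_call("serialization", bytes, 320 * 480 * 2)
```

`time.perf_counter()` measures wall-clock time; swapping in `time.process_time()` would measure the process's CPU time instead, which is closer to the CPU usage figures quoted here.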
So it's still not fast, but a 5x speedup in showing the BG and a >3x speedup in refreshing the info make it much more usable.
On my desktop with a more powerful CPU, the timings only show a slight improvement (BG paint goes from 1.49s to 1.33s, refresh from 0.22s to 0.19s). However, there's a 2x difference in CPU usage, from ~30% to ~15%.
Sure, there's the downside of adding numpy as a dependency, but I don't think that's a big problem: lots of things depend on it, and Raspberry Pi OS even comes with it preinstalled.