@RunDevelopment RunDevelopment commented Oct 19, 2025

Closes #20

This PR adds a BC7 encoder to the library. I'm still working on it. I already implemented all modes, but it still needs more testing, refactoring, and benchmarking. I haven't looked at performance at all yet, and also I haven't tuned any of the parameters for refinement yet. As I said, wip. Edit: It's ready.

This PR adds a new BC7 encoder to the library. The encoder is CPU-based and supports multiple quality levels. The implementation aims to strike a balance between encoding speed and output quality. However, the BC7 encoder is still around 10x slower than the BC1 encoder. This is unfortunate, but not too bad. Encoding a 4k RGB texture with BC7 Normal quality takes around 4 seconds on my machine (using multithreading).

Performance

| Quality | BC1 RGB | BC7 RGB | BC7 RGBA |
| --- | --- | --- | --- |
| Fast | 0.78 ms | 10.18 ms | 5.74 ms |
| Normal | 2.20 ms | 22.64 ms | 22.33 ms |
| High | 11.58 ms | 76.79 ms | 34.99 ms |

(Timings for encoding a 128x128 f32 RGB/RGBA image with uniformly distributed random colors on a single thread of an Intel i7-8700K CPU.)
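To put the benchmark numbers into perspective, here is a small sketch that converts a total time into a per-block time and extrapolates it to a larger texture. It assumes that "4k" means 4096x4096 and that encoding time scales linearly with the number of 4x4 blocks, neither of which is stated above. The function names are made up for this example and are not part of the library.

```rust
/// Number of 4x4 blocks in an image (dimensions rounded up to a multiple of 4).
fn block_count(width: u32, height: u32) -> u64 {
    let blocks_x = (width as u64 + 3) / 4;
    let blocks_y = (height as u64 + 3) / 4;
    blocks_x * blocks_y
}

/// Average time per block in microseconds, given a total time in milliseconds.
fn micros_per_block(total_ms: f64, width: u32, height: u32) -> f64 {
    total_ms * 1000.0 / block_count(width, height) as f64
}

/// Extrapolated single-thread time in seconds for `target_size`,
/// ASSUMING the time scales linearly with the number of blocks.
fn extrapolate_secs(bench_ms: f64, bench_size: (u32, u32), target_size: (u32, u32)) -> f64 {
    let per_block_us = micros_per_block(bench_ms, bench_size.0, bench_size.1);
    per_block_us * block_count(target_size.0, target_size.1) as f64 / 1e6
}
```

With the BC7 RGB Normal timing of 22.64 ms for 128x128, this gives roughly 22 µs per block and about 23 seconds for a 4096x4096 image on a single thread. That is in the same ballpark as the ~4 seconds quoted above once the work is spread across several cores, although random colors are not necessarily representative of real textures.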

Obviously, BC7 encoding is significantly slower than BC1. It would be possible to lower the quality further to speed up encoding, but the quality drop would be quite significant. E.g. using only mode 6 for Fast (like DirectXTex does) would yield around 1~2 ms for BC7 RGB, but lose around 5 dB. Any trade-off between speed and quality has to be carefully considered. I already implemented lots of parameters to tweak the encoder, so future PRs could easily add more quality levels or tweak existing ones.

I also want to point out that RGBA encoding is generally faster than RGB encoding. This is simply because modes 0-3 cannot be used for blocks with alpha, which means that there are fewer modes to try. This is especially noticeable for the High and Fast quality levels, where partition scoring for modes 0 and 2 can be skipped entirely.
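The mode restriction can be sketched as a simple filter over a mode table. The subset counts and alpha support below follow the BC7 format itself; the `ModeInfo` struct and both functions are hypothetical names for this example and do not mirror the encoder's actual code, which may restrict the candidate modes further per quality level.

```rust
/// Static properties of a BC7 mode that matter for mode selection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ModeInfo {
    mode: u8,
    subsets: u8,
    has_alpha: bool,
}

/// The eight BC7 modes with their subset counts and alpha support.
const MODES: [ModeInfo; 8] = [
    ModeInfo { mode: 0, subsets: 3, has_alpha: false },
    ModeInfo { mode: 1, subsets: 2, has_alpha: false },
    ModeInfo { mode: 2, subsets: 3, has_alpha: false },
    ModeInfo { mode: 3, subsets: 2, has_alpha: false },
    ModeInfo { mode: 4, subsets: 1, has_alpha: true },
    ModeInfo { mode: 5, subsets: 1, has_alpha: true },
    ModeInfo { mode: 6, subsets: 1, has_alpha: true },
    ModeInfo { mode: 7, subsets: 2, has_alpha: true },
];

/// Modes worth trying for a block. Opaque blocks may use any mode,
/// while blocks with non-opaque alpha are limited to modes 4-7.
fn candidate_modes(block_has_alpha: bool) -> Vec<ModeInfo> {
    MODES
        .iter()
        .copied()
        .filter(|m| !block_has_alpha || m.has_alpha)
        .collect()
}

/// Whether any candidate mode uses 3 subsets, i.e. whether the
/// 3-subset partitions need to be scored at all for this block.
fn needs_three_subset_scoring(block_has_alpha: bool) -> bool {
    candidate_modes(block_has_alpha).iter().any(|m| m.subsets == 3)
}
```

For a block with alpha, only modes 4-7 remain and none of them uses 3 subsets, which is why the 3-subset scoring pass can be skipped.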

On that note, the BC7 encoder does not blindly try out every partition for every mode. Instead, it first scores all partitions using a line fit, and then tries the top K partitions. Since partitions are shared between multiple modes, they are only scored once per block per subset count. So the main reason why RGBA encoding is faster is that scoring for 3-subset partitions (modes 0 and 2) can be skipped. Scoring is unfortunately still quite expensive (though around 10x cheaper than blindly trying partitions). It accounts for around 4 ms on Fast, 8~9 ms on Normal, and 25~30 ms on High in the benchmark.
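A minimal sketch of this idea, under a few assumptions: the score of a partition is taken to be the sum, over its subsets, of the squared distances of the subset's colors to their best-fit line, and the top K are the K lowest scores. The PR text does not spell out the exact scoring formula, so this is one plausible reading and not the encoder's implementation. All names are hypothetical, and partitions are passed in as plain per-pixel subset indices instead of the real BC7 partition tables.

```rust
type Rgb = [f32; 3];

/// Largest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric method).
fn largest_eigenvalue(a: [[f64; 3]; 3]) -> f64 {
    let p1 = a[0][1] * a[0][1] + a[0][2] * a[0][2] + a[1][2] * a[1][2];
    if p1 == 0.0 {
        // Already diagonal: the eigenvalues are the diagonal entries.
        return a[0][0].max(a[1][1]).max(a[2][2]);
    }
    let q = (a[0][0] + a[1][1] + a[2][2]) / 3.0;
    let p2 = (a[0][0] - q).powi(2) + (a[1][1] - q).powi(2) + (a[2][2] - q).powi(2) + 2.0 * p1;
    let p = (p2 / 6.0).sqrt();
    // b = (a - q * I) / p
    let mut b = a;
    for i in 0..3 {
        b[i][i] -= q;
        for j in 0..3 {
            b[i][j] /= p;
        }
    }
    let det = b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]);
    let r = (det / 2.0).clamp(-1.0, 1.0);
    q + 2.0 * p * (r.acos() / 3.0).cos()
}

/// Sum of squared distances from `points` to their best-fit line.
/// This equals the trace of the scatter matrix minus its largest eigenvalue.
fn line_fit_error(points: &[Rgb]) -> f64 {
    if points.len() < 2 {
        return 0.0;
    }
    let n = points.len() as f64;
    let mut mean = [0.0f64; 3];
    for p in points {
        for c in 0..3 {
            mean[c] += p[c] as f64 / n;
        }
    }
    let mut scatter = [[0.0f64; 3]; 3];
    for p in points {
        let d = [p[0] as f64 - mean[0], p[1] as f64 - mean[1], p[2] as f64 - mean[2]];
        for i in 0..3 {
            for j in 0..3 {
                scatter[i][j] += d[i] * d[j];
            }
        }
    }
    let trace = scatter[0][0] + scatter[1][1] + scatter[2][2];
    (trace - largest_eigenvalue(scatter)).max(0.0)
}

/// Score of one partition: total line-fit error over all of its subsets (lower is better).
fn score_partition(block: &[Rgb; 16], partition: &[u8; 16], subsets: u8) -> f64 {
    let mut total = 0.0;
    for s in 0..subsets {
        let pts: Vec<Rgb> = (0..16).filter(|&i| partition[i] == s).map(|i| block[i]).collect();
        total += line_fit_error(&pts);
    }
    total
}

/// Indices of the `k` best-scoring partitions, best first.
fn top_k_partitions(block: &[Rgb; 16], partitions: &[[u8; 16]], subsets: u8, k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f64)> = partitions
        .iter()
        .enumerate()
        .map(|(i, p)| (i, score_partition(block, p, subsets)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Since the result only depends on the block and the subset count, one call per subset count is enough, and the returned list can be reused by every mode with that subset count.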

Just like for partitions, the encoder also uses heuristics to select rotations for modes 4 and 5 instead of blindly trying all of them. Same for p bits.
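For context, a rotation in modes 4 and 5 swaps the alpha channel with one of the color channels before encoding, so that the channel stored separately from the other three can be R, G, B, or A. The sketch below only shows what the four rotations do to a pixel. The selection heuristic itself is not described here, so it is left out. `Rotation` and `apply_rotation` are hypothetical names for this example.

```rust
/// The four channel rotations available in BC7 modes 4 and 5.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rotation {
    /// Rotation 0: channels are left as they are.
    None,
    /// Rotation 1: alpha and red are swapped.
    SwapAR,
    /// Rotation 2: alpha and green are swapped.
    SwapAG,
    /// Rotation 3: alpha and blue are swapped.
    SwapAB,
}

/// Applies a rotation to an RGBA pixel. Every rotation is its own inverse,
/// so the same function can be used to undo it after decoding.
fn apply_rotation(mut px: [u8; 4], rotation: Rotation) -> [u8; 4] {
    match rotation {
        Rotation::None => {}
        Rotation::SwapAR => px.swap(0, 3),
        Rotation::SwapAG => px.swap(1, 3),
        Rotation::SwapAB => px.swap(2, 3),
    }
    px
}
```

Blindly trying rotations would mean encoding the block once per rotation and keeping the best result, which is the cost the heuristic avoids.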

Quality

The quality of the BC7 encoder is quite good, especially for the Normal and High quality levels. Fast isn't bad either, but has to live with some compromises to achieve decent performance. Nonetheless, Fast is still significantly better than BC1 quality-wise.

| Image | BC1 High | BC7 Fast | BC7 Normal | BC7 High | BC7 Unreasonable |
| --- | --- | --- | --- | --- | --- |
| base.png | - | 34.71 dB | 36.76 dB | 39.47 dB | 39.96 dB |
| color-twirl.png | 45.59 dB | 51.99 dB | 52.81 dB | 54.11 dB | 54.61 dB |
| bricks-d.png | 31.34 dB | 39.25 dB | 39.79 dB | 40.69 dB | 40.93 dB |
| bricks-n.png | 27.85 dB | 33.23 dB | 33.15 dB | 33.78 dB | 33.82 dB |
| clovers-d.png | 34.25 dB | 40.04 dB | 41.49 dB | 42.15 dB | 42.39 dB |
| clovers-r.png | 32.82 dB | 44.32 dB | 44.37 dB | 45.21 dB | 46.55 dB |
| stone-d.png | 34.86 dB | 44.04 dB | 44.59 dB | 45.16 dB | 46.01 dB |
| grass.png | - | 35.30 dB | 36.17 dB | 36.30 dB | 36.32 dB |
| leaves.png | - | 33.53 dB | 35.42 dB | 35.59 dB | 35.64 dB |

(This compares the PSNR of the C metric (=combined error for R, G, and B) with the uniform error metric. I excluded BC1 scores for images with alpha channels, because BC1 only has binary alpha, which unfairly favors BC7.)
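For readers who want to relate these dB values to raw error, here is a sketch of a combined RGB PSNR. It assumes that the combined error is the mean squared error averaged over all three channels and that color values lie in the range 0 to 1 (peak value 1.0). The exact definition of the C metric is not given above, so treat this as an illustration and not as the measurement code. The function names are made up for this example.

```rust
/// Mean squared error over the R, G, and B channels of two equally sized images.
fn mse_rgb(a: &[[f32; 3]], b: &[[f32; 3]]) -> f64 {
    assert_eq!(a.len(), b.len(), "images must have the same number of pixels");
    assert!(!a.is_empty(), "images must not be empty");
    let mut sum = 0.0f64;
    for (p, q) in a.iter().zip(b) {
        for c in 0..3 {
            let d = (p[c] - q[c]) as f64;
            sum += d * d;
        }
    }
    sum / (3.0 * a.len() as f64)
}

/// PSNR in dB for values in 0..=1 (ASSUMED peak value of 1.0).
fn psnr_db(mse: f64) -> f64 {
    10.0 * (1.0 / mse).log10()
}

/// Factor by which the MSE grows when the PSNR drops by `db_gap` dB.
fn mse_ratio_for_db_gap(db_gap: f64) -> f64 {
    10f64.powf(db_gap / 10.0)
}
```

Under these assumptions, a 5 dB drop corresponds to roughly 3.2x the mean squared error, while a 0.3 dB difference is only about 7% more.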

As we can see, even BC7 Fast is vastly superior to BC1 High, and the quality of all BC7 levels is somewhat comparable.

BC7 Unreasonable uses maximum encoder settings and represents the highest possible quality the encoder can achieve with the current implementation. It is mostly useful as a reference point to see how much quality is lost at lower quality levels. The Unreasonable quality is around 20x to 30x slower than High, so it is not practical.

@RunDevelopment RunDevelopment marked this pull request as ready for review October 25, 2025 16:08
@RunDevelopment RunDevelopment mentioned this pull request Oct 26, 2025
@RunDevelopment
Collaborator Author

Okay, I think it's good enough now. I still have some ideas to make the quality even better, but those will come at the cost of more processing time (e.g. using size variations like in #90) and will have to be carefully considered.

Quality-wise, this encoder isn't too bad. It can hold its own against the likes of NVTT Exporter and AMD Compressonator. Just like in #90, I threw together a little dataset of 57 1024x1024 RGB color textures to measure the quality of these encoders. Here are the results, relative to NVTT Highest:

[image: quality results of the compared encoders, relative to NVTT Highest]

AMD Compressonator has a -Quality setting that takes a number between 0 and 1. The reason I stopped at 0.4 is time. Encoding a 1k image with -Quality=0.5 takes >60 seconds on my machine and highest quality takes even longer. I'm not going to wait that long. For reference, this library only takes 1 second per 1k image on High quality. (NVTT Highest is super slow too, taking 60 seconds per image.)

So this BC7 encoder can achieve quality only 0.3 dB worse than industry-standard encoders, while being an order of magnitude faster.

@RunDevelopment RunDevelopment merged commit c717299 into image-rs:main Nov 7, 2025
10 checks passed
@RunDevelopment RunDevelopment deleted the bc7 branch November 7, 2025 15:12