Skip to content

Conversation

victoryforce
Copy link
Collaborator

The original code used a clever trick to allow format conversion (3float -> 4float) in-place (it went sequentially(!) from the end of the data buffer to the beginning thus ensuring that the data would not be overwritten). Unfortunately, this made the code OpenMP-incompatible. The use of two buffers (input data and output for darktable) made it possible to parallelize the format conversion loops.

Also, the code contained a separate loop for reordering image rows (bottom-top to top-bottom) using a temporary row buffer. In effect, this meant copying the entire array of image data twice more. For large images it took quite a significant amount of time.

We managed to get rid of these rows reordering by copying the rows to the correct address directly in the format conversion loop.

As a result, for example, a PFM file converted from a 45 megapixel Nikon Z7 camera image on a computer with an AMD Ryzen 7 5800HS CPU was loaded with the original code in 0.45 seconds, after changes in this PR in 0.09 seconds (fivefold speed up!).

...we can do this by correcting the addressing in the resulting buffer filling
loops so that the image rows end up in the right place from the beginning
@TurboGit TurboGit added this to the 5.0 milestone Oct 18, 2024
@TurboGit TurboGit added priority: low core features work as expected, only secondary/optional features don't difficulty: average some changes across different parts of the code base scope: image processing correcting pixels scope: performance doing everything the same but faster release notes: pending labels Oct 18, 2024
Copy link
Member

@TurboGit TurboGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me, thanks!

@TurboGit TurboGit merged commit 9334965 into darktable-org:master Oct 19, 2024
6 checks passed
@victoryforce victoryforce deleted the accelerate-PFM-loading branch October 19, 2024 13:02
@victoryforce
Copy link
Collaborator Author

@TurboGit
Proposal for the release notes:

Significantly accelerated loading of PFM files.

If it is better to add technical details, then:

Significantly accelerated loading of PFM files due to loops parallelization and optimization that eliminated additional processing.

@TurboGit
Copy link
Member

A bit of details is fine by me. Thanks!

BTW, do you get notification about #16334, just wondering about the status of this.

@victoryforce
Copy link
Collaborator Author

BTW, do you get notification about #16334, just wondering about the status of this.

Yep got it, will get to commenting tonight.

Your proposal is very interesting, but it makes the code even more complicated when it is not clear how long we will still need libraw.

Actually, I wanted to understand the status of rawspeed and what is the reason that rawspeed is in such a state that it cannot completely replace libraw. But this is a big discussion topic that I didn't have the energy to start at the time.

@TurboGit
Copy link
Member

Your proposal is very interesting, but it makes the code even more complicated when it is not clear how long we will still need libraw.

I don't see libraw removed at any point in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

difficulty: average some changes across different parts of the code base priority: low core features work as expected, only secondary/optional features don't scope: image processing correcting pixels scope: performance doing everything the same but faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants