DWA: initialize linear/nonlinear tables at runtime #2174
cary-ilm merged 3 commits into AcademySoftwareFoundation:main
Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
DWA compression has two lookup tables for nonlinear value encoding, each 128KB in size. Initialize these tables once upon first use of DWA compression, instead of spending 256KB of binary size on them. The initialization itself takes 0.47ms (Mac M4 Max). OpenEXRCore-4_0.dylib size goes 628KB -> 370KB. The previously used precalculated table (dwaLookups.h) stays in the repository, but it has been moved into the tests-only folder, and the tests were changed to ensure that the runtime-initialized tables match the previous hardcoded table exactly. Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
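The "initialize once upon first use" pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: it assumes POSIX `pthread_once` is available (the actual utilities in internal_thread.h may work differently, especially on Windows), and the names `get_to_nonlinear_table`, `fill_table`, `placeholder_encode`, and `s_init_calls` are hypothetical. The encoding function is a stand-in, not the real DWA math.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per possible 16-bit value: 65536 * 2 bytes = 128KB. */
#define TABLE_ENTRIES 65536u

static uint16_t       s_to_nonlinear[TABLE_ENTRIES];
static pthread_once_t s_table_once = PTHREAD_ONCE_INIT;
static int            s_init_calls = 0; /* only here to show init runs once */

/* Hypothetical stand-in for the real encoding math (NOT the DWA formula). */
static uint16_t
placeholder_encode (uint16_t bits)
{
    return (uint16_t) (bits ^ (bits >> 1));
}

/* Fills the whole table; invoked at most once via pthread_once. */
static void
fill_table (void)
{
    for (uint32_t i = 0; i < TABLE_ENTRIES; ++i)
        s_to_nonlinear[i] = placeholder_encode ((uint16_t) i);
    ++s_init_calls;
}

/* Returns the table, building it on first use. Safe to call from
 * several threads: pthread_once guarantees fill_table has completed
 * before any caller returns. */
const uint16_t*
get_to_nonlinear_table (void)
{
    pthread_once (&s_table_once, fill_table);
    return s_to_nonlinear;
}
```

With this shape, the table lives in zero-initialized storage rather than in the binary's data section, and callers that never touch DWA never pay the fill cost. A test in the spirit of the one described above could `memcmp` the result against the old hardcoded array.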
The Python wheel build error seems to be related to an internet connection problem; it fails with:
@aras-p did you compare the performance of table lookups to computing values on the fly, as you did for B44? If the tables can be computed that fast, it does make me wonder if there's any benefit in using them at all.
@peterhillman I have not. Computing the values needed for DWA would be at least as expensive as in the B44 case (B44: just exp or log, depending on the table; DWA: pow, which tends to be more complex math). And DWA is really widely used (way more than B44, I guess), so I assumed that any runtime performance regression would not be acceptable.
Fair enough. I did wonder whether computing on the fly would be faster in some cases because it would free up the processor cache for more useful data, but as you say, that would have been more likely with B44.
cary-ilm left a comment:
We discussed in the steering committee meeting, looks good, thanks!
* Move thread-safe single initialization utilities into internal_thread.h

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>

* DWA: initialize linear/nonlinear tables at runtime

DWA compression has two lookup tables for nonlinear value encoding, each 128KB in size. Initialize these tables once upon first use of DWA compression, instead of spending 256KB of binary size on them. The initialization itself takes 0.47ms (Mac M4 Max). OpenEXRCore-4_0.dylib size goes 628KB -> 370KB. The previously used precalculated table (dwaLookups.h) stays in the repository, but it has been moved into the tests-only folder, and the tests were changed to ensure that the runtime-initialized tables match the previous hardcoded table exactly.

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>

---------

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
Co-authored-by: Cary Phillips <cary@ilm.com>
All other Core code uses the internal_coding.h functions half_to_float and float_to_half so that it compiles when IMath is not present at all; follow the same pattern here. The IMath dependency was accidentally introduced in AcademySoftwareFoundation#2174 and AcademySoftwareFoundation#2126. Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>