-
-
Notifications
You must be signed in to change notification settings - Fork 879
Add AVX2 versions of CombinedShannonEntropy #1848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Profiling does not proof that this version is actually faster.
} | ||
} | ||
|
||
if (Unsafe.Add(ref tmpRef, 1) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I have tried to put those repeating if statements into own method calls, but profiling has shown that this makes it actually slower even with Aggressive Inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't it be a loop? It looks incremental 0-7 to me.
However, my money says there's something clever that can be done with masking here to determine if each element != 0 and apply the diff as a single operation. I'm sure I've seen similar before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simd? You can check each element in a vector without if checks at all for tmpRef
and xRef
vectors.
You can also remove if checks with log2 precalculation for each case and simply multiplying that log2 vectors with comparison mask values. This may or may not be faster than if-checks - depends ong FastSLog2 implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original SSE2 version of this uses macros. I tried to keep it as similar as possible.
Because of that, I thought a loop might be also slower, but I will give it a try and test it.
FastSLog2
is in most cases (>90% just a lookup table), but still most of the time is spend here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess loop can further degrade branch predictor history. Masking is still possible but that would require some serious code rewrite for AVX branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly have no idea how webp works but as far as I understand checked value is actually pretty random, adding for-loop on top of it may screw up branch predictor with yet another stable (always true for 8 iterations) if-check - that's what's most likely causing performance drop.
All of extra if-checks can be removed via simd masks but it's a question whether it'd be faster than current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a previous SSE2 version which used masks, if you look at commit: ed8bd61, but this was not better then the current approach. Maybe, if we could create a AVX version of that, it could be better, but I am also not sure if it really can beat the current one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually easier to describe my proposal in code than in human language. I'll probably try to implement it after this gets merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@br3aker thanks, but my advice would be, to only do it, if you think it would be easy for you and not to much work. As I said before, I am really unsure how much we can gain from this.
Codecov Report
@@ Coverage Diff @@
## master #1848 +/- ##
======================================
Coverage 87% 87%
======================================
Files 937 937
Lines 48654 48727 +73
Branches 6067 6094 +27
======================================
+ Hits 42434 42507 +73
Misses 5209 5209
Partials 1011 1011
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Co-authored-by: Günther Foidl <gue@korporal.at>
Prerequisites
Description
This PR adds a AVX2 version of
CombinedShannonEntropy
, which is used during lossless webp encoding.Related to #1786
Profiling results:
master
PR
Test image was
Callipohora.png