-
-
Notifications
You must be signed in to change notification settings - Fork 879
Add AVX2 versions of CombinedShannonEntropy #1848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
bab85d4
Add SSE version of CombinedShannonEntropy
brianpopow cc430cc
Avoid bounds checks
brianpopow ed8bd61
Faster SSE2 version of ShanonEntropy
brianpopow b1df6a9
Revert "Faster SSE2 version of ShanonEntropy"
brianpopow 32b97f4
Add AVX2 version of CombinedShannonEntropy
brianpopow 0fc3ce7
Add CombinedShannonEntropy tests
brianpopow 93f06bb
Merge branch 'master' into bp/shanonsse
brianpopow 265be5f
Merge branch 'master' into bp/shanonsse
brianpopow 110ff3d
Avoid using Span<int> tmp
brianpopow 5403fbd
Add better version of ReduceSum for Vector 256
brianpopow f4fe9ba
Merge branch 'master' into bp/shanonsse
brianpopow 9c95389
Merge branch 'master' into bp/shanonsse
brianpopow File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I have tried to put those repeating if statements into own method calls, but profiling has shown that this makes it actually slower even with Aggressive Inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't it be a loop? It looks incremental 0-7 to me.
However, my money says there's something clever that can be done with masking here to determine if each element != 0 and apply the diff as a single operation. I'm sure I've seen similar before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simd? You can check each element in a vector without if checks at all for
tmpRef
andxRef
vectors.You can also remove if checks with log2 precalculation for each case and simply multiplying that log2 vectors with comparison mask values. This may or may not be faster than if-checks - depends ong FastSLog2 implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original SSE2 version of this uses macros. I tried to keep it as similar as possible.
Because of that, I thought a loop might be also slower, but I will give it a try and test it.
FastSLog2
is in most cases (>90% just a lookup table), but still most of the time is spend here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess loop can further degrade branch predictor history. Masking is still possible but that would require some serious code rewrite for AVX branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done profile tests now with loops several times and it makes it noticeable slower.
I am not really sure that it is worth trying to optimize it further, if it would take a huge amount of work to do so. As can be seen here:

The main time consumer is
FastSLog2
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly have no idea how webp works but as far as I understand checked value is actually pretty random, adding for-loop on top of it may screw up branch predictor with yet another stable (always true for 8 iterations) if-check - that's what's most likely causing performance drop.
All of extra if-checks can be removed via simd masks but it's a question whether it'd be faster than current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a previous SSE2 version which used masks, if you look at commit: ed8bd61, but this was not better then the current approach. Maybe, if we could create a AVX version of that, it could be better, but I am also not sure if it really can beat the current one.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually easier to describe my proposal in code than in human language. I'll probably try to implement it after this gets merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@br3aker thanks, but my advice would be, to only do it, if you think it would be easy for you and not to much work. As I said before, I am really unsure how much we can gain from this.