Skip to content

Add AVX2 versions of CombinedShannonEntropy #1848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Nov 27, 2021
Merged

Add AVX2 versions of CombinedShannonEntropy #1848

merged 12 commits into from
Nov 27, 2021

Conversation

brianpopow
Copy link
Collaborator

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This PR adds a AVX2 version of CombinedShannonEntropy, which is used during lossless webp encoding.

Related to #1786

Profiling results:

master

shannon_master

PR

shannon_avx

Test image was Callipohora.png

}
}

if (Unsafe.Add(ref tmpRef, 1) != 0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I have tried to put those repeating if statements into own method calls, but profiling has shown that this makes it actually slower even with Aggressive Inlining.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't it be a loop? It looks incremental 0-7 to me.

However, my money says there's something clever that can be done with masking here to determine if each element != 0 and apply the diff as a single operation. I'm sure I've seen similar before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simd? You can check each element in a vector without if checks at all for tmpRef and xRef vectors.

You can also remove if checks with log2 precalculation for each case and simply multiplying that log2 vectors with comparison mask values. This may or may not be faster than if-checks - depends ong FastSLog2 implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original SSE2 version of this uses macros. I tried to keep it as similar as possible.
Because of that, I thought a loop might be also slower, but I will give it a try and test it.

FastSLog2 is in most cases (>90% just a lookup table), but still most of the time is spend here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess loop can further degrade branch predictor history. Masking is still possible but that would require some serious code rewrite for AVX branch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done profile tests now with loops several times and it makes it noticeable slower.

I am not really sure that it is worth trying to optimize it further, if it would take a huge amount of work to do so. As can be seen here:
shannon_avx2

The main time consumer is FastSLog2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly have no idea how webp works but as far as I understand checked value is actually pretty random, adding for-loop on top of it may screw up branch predictor with yet another stable (always true for 8 iterations) if-check - that's what's most likely causing performance drop.

All of extra if-checks can be removed via simd masks but it's a question whether it'd be faster than current implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a previous SSE2 version which used masks, if you look at commit: ed8bd61, but this was not better then the current approach. Maybe, if we could create a AVX version of that, it could be better, but I am also not sure if it really can beat the current one.

Copy link
Contributor

@br3aker br3aker Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually easier to describe my proposal in code than in human language. I'll probably try to implement it after this gets merged.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@br3aker thanks, but my advice would be, to only do it, if you think it would be easy for you and not to much work. As I said before, I am really unsure how much we can gain from this.

@codecov
Copy link

codecov bot commented Nov 23, 2021

Codecov Report

Merging #1848 (9c95389) into master (b67a8db) will increase coverage by 0%.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1848   +/-   ##
======================================
  Coverage      87%     87%           
======================================
  Files         937     937           
  Lines       48654   48727   +73     
  Branches     6067    6094   +27     
======================================
+ Hits        42434   42507   +73     
  Misses       5209    5209           
  Partials     1011    1011           
Flag Coverage Δ
unittests 87% <100%> (+<1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ImageSharp/Common/Helpers/Numerics.cs 97% <100%> (+<1%) ⬆️
.../ImageSharp/Formats/Webp/Lossless/LosslessUtils.cs 90% <100%> (+1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b67a8db...9c95389. Read the comment docs.

@brianpopow brianpopow merged commit 013c29e into master Nov 27, 2021
@brianpopow brianpopow deleted the bp/shanonsse branch November 27, 2021 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants