Improved halide_popcount #7225
@@ -155,8 +155,9 @@ template<typename T>
 inline int halide_popcount(T a) {
     int bits_set = 0;
     while (a != 0) {
FWIW, ~all modern compilers should have popcount as an intrinsic -- see `popcount64` in Util.cpp; maybe we should just adapt that here.
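For illustration, a minimal sketch of what an intrinsic-based popcount could look like. This is not the `popcount64` from Util.cpp; the name `popcount64_sketch` is hypothetical, and the preprocessor checks are assumptions about which compilers and targets are in play (`__builtin_popcountll` on GCC/Clang, `__popcnt64` / `__popcnt` on MSVC for x64 / x86), with a plain loop as the last resort.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper, named for this sketch only.
inline int popcount64_sketch(uint64_t x) {
#if defined(_MSC_VER) && defined(_M_X64)
    // MSVC targeting x64: single 64-bit intrinsic.
    return static_cast<int>(__popcnt64(x));
#elif defined(_MSC_VER) && defined(_M_IX86)
    // MSVC targeting 32-bit x86: count each 32-bit half separately.
    return static_cast<int>(__popcnt(static_cast<uint32_t>(x >> 32)) +
                            __popcnt(static_cast<uint32_t>(x)));
#elif defined(__GNUC__) || defined(__clang__)
    // GCC and Clang builtin; the compiler picks the best instruction sequence.
    return __builtin_popcountll(x);
#else
    // Portable fallback: clear the lowest set bit until nothing is left.
    int bits_set = 0;
    while (x != 0) {
        x &= x - 1;
        ++bits_set;
    }
    return bits_set;
#endif
}
```

Narrower integer types can be widened to `uint64_t` before the call, provided signed values are first converted to their unsigned counterpart of the same width so that sign extension does not add bits.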
What's the story on this, ready to land?
@@ -152,15 +152,39 @@ inline float float_from_bits(uint32_t bits) {
 }

 template<typename T>
-inline int halide_popcount(T a) {
+inline int halide_popcount_fallback(T a) {
I would leave a link to the source of this algorithm for reference (I found this one, for example: https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan).
src/CodeGen_C.cpp
-        bits_set += a & 1;
-        a >>= 1;
+        bits_set += 1;
+        // NOTE(aelphy): remove least significant zeros and the first met one
I would just make it a regular comment (i.e. remove the `NOTE(aelphy)` part).
Where does this PR stand?
* Improved halide_popcount
* Reused popcount64 from Util.cpp in CodeGen_C
* Fixed comment for popcount
This is a slightly more efficient version of popcount: the loop runs once per set bit in the binary representation, rather than once for every bit position up to the highest set bit.
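As a rough sketch of the idea (not the exact code in this PR), the loop below clears the lowest set bit on each iteration with `a &= a - 1`, which is the technique the bithacks page linked in the review attributes to Kernighan. The function name `popcount_kernighan` is hypothetical, and the conversion to an unsigned type is an assumption added here so that signed inputs do not hit overflow on `a - 1`; `bool` is not supported because `std::make_unsigned` rejects it.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical name for this sketch; counts set bits in any integer type except bool.
template<typename T>
inline int popcount_kernighan(T a) {
    static_assert(std::is_integral<T>::value, "integer types only");
    // Work on the unsigned counterpart so that (u - 1) is well defined
    // even when the input is the most negative signed value.
    using U = typename std::make_unsigned<T>::type;
    U u = static_cast<U>(a);
    int bits_set = 0;
    while (u != 0) {
        // u - 1 flips the lowest set bit and every zero below it;
        // the AND therefore clears exactly that lowest set bit.
        u &= static_cast<U>(u - 1);
        ++bits_set;
    }
    return bits_set;
}
```

For an input such as `0x80000000` this takes a single iteration, whereas the shift-and-test loop takes 32.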