Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-29782: Use __builtin_clzl for bits_in_digit if available #594

Closed
wants to merge 1 commit into from

Conversation

niklasf
Copy link
Contributor

@niklasf niklasf commented Mar 10, 2017

No description provided.

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow these steps to rectify the issue:

  1. Sign the PSF contributor agreement. The "bugs.python.org username" requested by the form is the "Login name" field in "Your Details" at b.p.o
  2. Wait at least one US business day and then check the "Contributor form received entry under "Your Details" on bugs.python.org to see if your account has been marked as having signed the CLA (the delay is due to a person having to manually check your signed CLA)
  3. Reply here saying you have completed the above steps

Thanks again to your contribution and we look forward to looking at it!

@mention-bot
Copy link

@niklasf, thanks for your PR! By analyzing the history of the files in this pull request, we identified @mdickinson, @serhiy-storchaka, @tim-one, @brettcannon and @benjaminp to be potential reviewers.

@serhiy-storchaka serhiy-storchaka added the performance Performance or resource usage label Mar 10, 2017
Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments left on the bpo issue. In brief:

  • configuration decisions (deciding whether __builtin_clz is available and does the right thing) should be moved into the configuration machinery.
  • the code currently assumes that digit is the same as unsigned long. That's not always the case.
  • once the above issues are addressed, we still need to weigh any speed advantage against the costs in terms of maintainability, portability, readability.

@mdickinson
Copy link
Member

the code currently assumes that digit is the same as unsigned long

Ah, no. I see. This should work whenever digit is no larger than unsigned long, thanks to the implicit type conversion. However, that's probably wasteful in the majority of cases; on many (most?) machines, we should be able to use clz rather than clzl, and it may be faster. And while it's probably true that on all current platforms digit is no larger than unsigned long, that's not an assumption we should hardwire in unnecessarily.

@niklasf niklasf force-pushed the bpo-29782-clzl branch 2 times, most recently from 21c4a4b to bff329a Compare March 10, 2017 22:32
@niklasf
Copy link
Contributor Author

niklasf commented Mar 10, 2017

  • Making an extra version for 16bit ints does not seem to be worth it, so I settled for unsigned long.
  • Factored out so that mathmodule.c can also take advantage of the function.
  • I might need help with the configuration process. Checking __GNUC__ does not seem to be uncommon, but is this the place to do it? Note that __GNUC__ is also defined by clang.
    I created a new source/header pair which might not be the way to go. Should the header contents go to pyport.h as you suggested? Where to put the C fallback?

5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
};

int bit_length(unsigned long d) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename the function to _Py_bit_length().

Why the result is signed? Why not unsigned int?

* count_leading_zero_bits(x)
*/

#if defined(__GNUC__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the builtin available in old GCC versions? Any idea when the function was added?

Copy link
Contributor Author

@niklasf niklasf Mar 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcc 3.4. I made the guard more precise. It also works for clang.

Btw. while looking how this is usually done in Python I found a typo that would affect gcc 3.x: https://github.com/python/cpython/blob/master/Include/pymacro.h#L56 should say __GNU*C*_MINOR__. I'll make a seperate PR. #878.

#if defined(__GNUC__)
#define HAVE_BIT_LENGTH
static inline int bit_length(unsigned long d) {
return d ? (8 * sizeof(unsigned long) - __builtin_clzl(d)) : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to avoid the conditional code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think so. __builtin_clzl(0) is undefined or rather has different behavior on various processors.

}
#elif defined(_MSC_VER)
#define HAVE_BIT_LENGTH
#include <intrin.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the following pragma?
#pragma intrinsic(_BitScanReverse)

@niklasf
Copy link
Contributor Author

niklasf commented Mar 28, 2017

@Haypo Thanks for the review. I updated the PR based on your comments.

@vstinner
Copy link
Member

http://bugs.python.org/issue29782 has been closed as REJECTED.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants