Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-29882: Add _Py_popcount32() function #20518

Merged
merged 6 commits into from
Jun 8, 2020
Merged

bpo-29882: Add _Py_popcount32() function #20518

merged 6 commits into from
Jun 8, 2020

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented May 29, 2020

  • Rename pycore_byteswap.h to pycore_bitutils.h.
  • Move popcount_digit() to pycore_bitutils.h as _Py_popcount32().
  • _Py_popcount32() uses GCC and clang builtin function if available.
  • Add unit tests to _Py_popcount32().

https://bugs.python.org/issue29882

@vstinner
Copy link
Member Author

vstinner commented May 29, 2020

@mdickinson: here you have.

cc @pablogsal @1st1

@vstinner
Copy link
Member Author

On my x86-64 Fedora 32 with GCC 10.1.1, __builtin_popcount() becomes a call to __popcountdi2() function:

(gdb) disassemble __popcountdi2 
Dump of assembler code for function __popcountdi2:
   0x00007fffeaa997c0 <+0>:	endbr64 
   0x00007fffeaa997c4 <+4>:	movabs rdx,0x5555555555555555
   0x00007fffeaa997ce <+14>:	mov    rax,rdi
   0x00007fffeaa997d1 <+17>:	shr    rax,1
   0x00007fffeaa997d4 <+20>:	and    rax,rdx
   0x00007fffeaa997d7 <+23>:	movabs rdx,0x3333333333333333
   0x00007fffeaa997e1 <+33>:	sub    rdi,rax
   0x00007fffeaa997e4 <+36>:	mov    rax,rdi
   0x00007fffeaa997e7 <+39>:	shr    rdi,0x2
   0x00007fffeaa997eb <+43>:	and    rax,rdx
   0x00007fffeaa997ee <+46>:	and    rdi,rdx
   0x00007fffeaa997f1 <+49>:	add    rdi,rax
   0x00007fffeaa997f4 <+52>:	mov    rax,rdi
   0x00007fffeaa997f7 <+55>:	shr    rax,0x4
   0x00007fffeaa997fb <+59>:	add    rax,rdi
   0x00007fffeaa997fe <+62>:	movabs rdi,0xf0f0f0f0f0f0f0f
   0x00007fffeaa99808 <+72>:	and    rax,rdi
   0x00007fffeaa9980b <+75>:	movabs rdi,0x101010101010101
   0x00007fffeaa99815 <+85>:	imul   rax,rdi
   0x00007fffeaa99819 <+89>:	shr    rax,0x38
   0x00007fffeaa9981d <+93>:	ret    
End of assembler dump.

It looks like the SWAR popcount.

Using -march=native, I get the expected native x86 POPCNT instruction:

popcnt ecx,ecx

@vstinner
Copy link
Member Author

I wrote this PR to factorize duplicate function:

  • hamt_bitcount() in hamt.c
  • popcount_digit() in Objects/longobject.c

I also added __builtin_popcount() to address the first paragraph of hamt.c comment:

    /* We could use native popcount instruction but that would
       require to either add configure flags to enable SSE4.2
       support or to detect it dynamically.  Otherwise, we have
       a risk of CPython not working properly on older hardware.

       In practice, there's no observable difference in
       performance between using a popcount instruction or the
       following fallback code.

       The algorithm is copied from:
       https://graphics.stanford.edu/~seander/bithacks.html
    */

@vstinner
Copy link
Member Author

This PR is a follow-up of PR #771 which added int.bit_count() Python function and popcount_digit() C function.

@vstinner
Copy link
Member Author

vstinner commented Jun 2, 2020

@mdickinson: I updated the PR. Would you mind to review it?

@mdickinson mdickinson self-requested a review June 5, 2020 07:55
@mdickinson
Copy link
Member

Would you mind to review it?

Will do, but not before the weekend.

Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments; mostly nitpicks, but there are some bits I don't understand (like the SIZEOF_INT == 8 branch).

@vstinner
Copy link
Member Author

vstinner commented Jun 7, 2020

Oh, 45af786 is good example of why I'm mentioning one used function on an include :-)

vstinner added 3 commits June 7, 2020 23:19
* Rename pycore_byteswap.h to pycore_bitutils.h.
* Move popcount_digit() to pycore_bitutils.h as _Py_popcount32().
* _Py_popcount32() uses GCC and clang builtin function if available.
* Add unit tests to _Py_popcount32().
vstinner added 3 commits June 7, 2020 23:19
* Remove UINT32_C()
* Fix #ifdef to decide if builtin functions are used of not:
  use SIZEOF_INT >= 4. Always use __builtin_popcountl() otherwise.
* popcount_digit(): check PyLong_SHIFT rather than sizeof(digit).
* Fix also two comments
@vstinner
Copy link
Member Author

vstinner commented Jun 7, 2020

@mdickinson: PR updated to address your latest review.

PR rebased to fix a merge conflict with 45af786

Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinner vstinner merged commit c6b292c into python:master Jun 8, 2020
@vstinner vstinner deleted the bitutils branch June 8, 2020 14:30
@vstinner
Copy link
Member Author

vstinner commented Jun 8, 2020

Thanks @mdickinson for the reviews!

niklasf added a commit to niklasf/cpython that referenced this pull request Jun 8, 2020
In pythonGH-2866, _Py_Bit_Length() was added to pymath.h for lack of a better
location. pythonGH-20518 added a more appropriate header file for bit utilities. It
also shows how to properly use intrinsics. This allows reconsidering bpo-29782.

* Move the function to the new header.
* Changed return type to match __builtin_clzl and reviewed usage.
* Use intrinsics where available.
* Pick a (mostly theoretical) fallback implementation suitable for
  inlining.
vstinner pushed a commit that referenced this pull request Jun 15, 2020
In GH-2866, _Py_Bit_Length() was added to pymath.h for lack of a better
location. GH-20518 added a more appropriate header file for bit utilities. It
also shows how to properly use intrinsics. This allows reconsidering bpo-29782.

* Move the function to the new header.
* Changed return type to match __builtin_clzl() and reviewed usage.
* Use intrinsics where available.
* Pick a fallback implementation suitable for inlining.
arun-mani-j pushed a commit to arun-mani-j/cpython that referenced this pull request Jul 21, 2020
* Rename pycore_byteswap.h to pycore_bitutils.h.
* Move popcount_digit() to pycore_bitutils.h as _Py_popcount32().
* _Py_popcount32() uses GCC and clang builtin function if available.
* Add unit tests to _Py_popcount32().
arun-mani-j pushed a commit to arun-mani-j/cpython that referenced this pull request Jul 21, 2020
In pythonGH-2866, _Py_Bit_Length() was added to pymath.h for lack of a better
location. pythonGH-20518 added a more appropriate header file for bit utilities. It
also shows how to properly use intrinsics. This allows reconsidering bpo-29782.

* Move the function to the new header.
* Changed return type to match __builtin_clzl() and reviewed usage.
* Use intrinsics where available.
* Pick a fallback implementation suitable for inlining.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants