bpo-37812: Expand confusing CHECK_SMALL_INT so return is explicit. #15216

gnprice · 2019-08-11T23:30:30Z

It's not obvious when looking at a call site of this macro that it
affects the control flow. Move the return into the callers so
it's explicit there, and factor out the fiddly bit as IS_SMALL_INT.

As a bonus, a couple of other spots can also use IS_SMALL_INT with a
benefit to clarity.

https://bugs.python.org/issue37812

It's not obvious when looking at a call site of this macro that it affects the control flow. Move the `return` into the callers so it's explicit there, and factor out the fiddly bit as IS_SMALL_INT. As a bonus, a couple of other spots can also use IS_SMALL_INT with a benefit to clarity.

benjaminp · 2019-08-13T05:56:37Z

Objects/longobject.c

@@ -41,6 +41,9 @@ PyObject *_PyLong_One = NULL;
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
 */
 static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
+
+#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)


Why not make this a function?

I actually have a patch for that too 🙂 : gnprice@cfc91e6ca

I left that out of this commit in order to change the fewest possible aspects at once, and figured I'd send it as a separate followup provided this one is merged. But I'd also be happy to add it to this PR, if there's agreement that it's desirable; I think it'd fit well as part of the same change.

(Note the "before" side of that commit looks slightly different from the "after" of this one, as there are a couple of other commits in between on that development branch. But it'd be no trouble for me to rebase it to directly on top of this one in order to add to this PR.)

Okay, if making a function is incoming, this PR lgtm.

I'll let Raymond weight in.

Go ahead and make it a static inlined function.

This is a somewhat minor change, but let's add a blurb NEWS entry in case anyone cases about these details.

Sounds good! Thanks for the review.

rhettinger · 2019-08-22T05:06:24Z

Objects/longobject.c

@@ -41,6 +41,9 @@ PyObject *_PyLong_One = NULL;
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
 */
 static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
+
+#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)


Go ahead and make it a static inlined function.

bedevere-bot · 2019-08-22T05:06:34Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

gnprice · 2019-08-24T05:47:55Z

Thanks for the reviews!

I have made the requested changes; please review again.

bedevere-bot · 2019-08-24T05:47:58Z

Thanks for making the requested changes!

@rhettinger: please review the changes made to this pull request.

ghost · 2019-08-27T05:37:18Z

Using inlined function generates unnecessary machine code than using macro.
I built a 32-bit Release version with VC2017, below is the starting code of PyLong_FromLong function.

The ival was first converted to a long long type, this is unnecessary .

use inlined function:

_PyLong_FromLong PROC					; COMDAT

; 317  : {

	push	ebp
	mov	ebp, esp
	and	esp, -8					; fffffff8H
	push	ecx
	mov	ecx, DWORD PTR _ival$[ebp]

; 318  :     PyLongObject *v;
; 319  :     unsigned long abs_ival;
; 320  :     unsigned long t;  /* unsigned so >> doesn't propagate sign bit */
; 321  :     int ndigits = 0;
; 322  :     int sign;
; 323  : 
; 324  :     if (is_small_int(ival)) {

	mov	eax, ecx
	push	ebx
	push	esi
	cdq
	push	edi
	xor	edi, edi

; 48   :     return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;

	add	eax, 5
	adc	edx, edi
	test	edx, edx
	ja	SHORT $LN6@PyLong_Fro
	jb	SHORT $LN32@PyLong_Fro
	cmp	eax, 261				; 00000105H
	ja	SHORT $LN6@PyLong_Fro
$LN32@PyLong_Fro:

; 60   :     v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

	movsx	eax, cx
	shl	eax, 4
	add	eax, OFFSET _small_ints+80
; File d:\dev\cpython\include\object.h

; 459  :     op->ob_refcnt++;

	inc	DWORD PTR [eax]
; File d:\dev\cpython\objects\longobject.c

; 383  : }

	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	ret	0

use macro:

_PyLong_FromLong PROC					; COMDAT

; 318  : {

	push	ebp
	mov	ebp, esp
	mov	eax, DWORD PTR _ival$[ebp]
	push	edi

; 319  :     PyLongObject *v;
; 320  :     unsigned long abs_ival;
; 321  :     unsigned long t;  /* unsigned so >> doesn't propagate sign bit */
; 322  :     int ndigits = 0;

	xor	edi, edi

; 323  :     int sign;
; 324  : 
; 325  :     if (IS_SMALL_INT(ival)) {

	cmp	eax, -5					; fffffffbH
	jl	SHORT $LN6@PyLong_Fro
	cmp	eax, 257				; 00000101H
	jge	SHORT $LN6@PyLong_Fro

; 60   :     v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

	cwde
	shl	eax, 4
	add	eax, OFFSET _small_ints+80
	pop	edi
; File d:\dev\cpython\include\object.h

; 459  :     op->ob_refcnt++;

	inc	DWORD PTR [eax]
; File d:\dev\cpython\objects\longobject.c

; 384  : }

	pop	ebp
	ret	0

gnprice · 2019-08-27T07:07:24Z

I built a 32-bit Release version with VC2017, below is the starting code of PyLong_FromLong function.

Interesting, thanks!

It looks like several compilers produce code much like this when compiling for x86-32. Here's a Compiler Explorer example I just created:
https://godbolt.org/z/8qPNez
The pane on the left has source code that models the relevant bits of ours. The lower right has two different compilers compiling for x86-32.

On the other hand, as you can see in the same Compiler Explorer example (in the upper-right pane), VC2017 compiling for x86-64 produces essentially identical code for the macro version and the static-inline-function version. So does Clang; and in fact, because in my example both are present, GCC notices they're identical and makes the macro version just jump immediately to the function version 😄

I believe the longer code represents a missed optimization by the compilers; I don't think there's any semantic reason why (on any architecture) they have to emit anything different for the function version vs. the macro version.

In any case, stepping back: we're talking about a handful of additional instructions, of the kinds a CPU is very fast at (no random reads from memory, which can take thousands of cycles and which the interpreter by its nature is full of); also only in 32-bit builds, not 64-bit. It's interesting to see the code that's generated and study it... but convincing the compiler to generate one of these versions over another is very much a micro-optimization, the kind that generally isn't worth more than a very small delta in the simplicity of the code.

For more fun, I just went and added to the Compiler Explorer example a version with IS_SMALL_INT as a macro:
https://godbolt.org/z/8GVhVR

With IS_SMALL_INT, we get the micro-optimization: all of MSVC, GCC, and Clang optimize the code just as well as they do the CHECK_SMALL_INT version, for x86-32 as well as for x86-64. The IS_SMALL_INT version isn't much more complex than is_small_int; the main gnarliness was in CHECK_SMALL_INT.

So, switching from is_small_int to IS_SMALL_INT (as in the original version of this PR) would arguably be an OK trade. Though I still wouldn't be enthusiastic about doing so; in general I'd rather use regular C-level functions, with all their goodies like proper type-checking and visibility to debuggers and other tools, everywhere possible, and use a preprocessor macro only in cases where truly only a macro can do the job. (In this file, the two WRITE_DIGITS macros are good examples of that; there are many others in the codebase.)

ghost · 2019-08-27T12:38:41Z

If build a 32-bit version with gcc 9.2, it also generates unnecessary machine code (convert to long long) as above:
https://godbolt.org/z/qna2eG

Same with ARM gcc 8.2: https://godbolt.org/z/qycDzR

long long appears in the argument of inlined function, maybe we can't ignore it.

ghost · 2019-08-27T15:19:56Z

VC2017 compiling for x86-64 produces essentially identical code for the macro version and the static-inline-function version

In this case, the long -> long long converting is still there, but doesn't use too much machine code:
https://godbolt.org/z/-78POZ

GCC notices they're identical and makes the macro version just jump immediately to the function version

I know why, on 64-bit Windows, long is 4-byte. But on 64-bit GCC, long is 8-byte, so long and long long are the same thing.
https://stackoverflow.com/questions/384502/what-is-the-bit-size-of-long-on-64-bit-windows

gnprice · 2019-08-27T20:15:03Z

long long appears in the argument of inlined function, maybe we can't ignore it.

No, compilers have a lot more freedom than that. The code just needs to compute the right answer. The types in the source are part of defining what the right answer is, but there doesn't need to be anything at runtime that physically looks like them (in a case like this where they aren't part of any external interface.)

Here's a quick demo of that:
https://godbolt.org/z/1Bj9KG
It's a tiny simplified model of the is_small_int logic, compiled for x86-32 by both GCC and Clang. In this version, GCC still emits the extra few instructions to extend a 32-bit int to a 64-bit long long... but Clang sees that the behavior doesn't actually depend on the extra bits, and it optimizes down to very tight code that skips that and works only on 32-bit values. In fact the code it produces is exactly the same if you edit the source to replace long long with int.

ghost · 2019-08-28T02:01:11Z

No, compilers have a lot more freedom than that. The code just needs to compute the right answer. The types in the source are part of defining what the right answer is, but there doesn't need to be anything at runtime that physically looks like them (in a case like this where they aren't part of any external interface.)

If one day int128_t type and PyLong_FromInt128 function are used in the code, converting to long long generates wrong result.
So using a certain type long long is not a good idea.

I have no opinion on the use of inlined function or macro here.
There are no wrong results now, just a little performance loss.

gnprice · 2019-08-28T05:20:32Z

If one day int128_t type and PyLong_FromInt128 function are used in the code, converting to long long generates wrong result.
So using a certain type long long is not a good idea.

That's true (provided long long in the same environment was smaller than an int128_t). For that reason, it'd be better to write intmax_t instead of long long. I'd forgotten about intmax_t when I wrote this PR, but a related PR of Sergey's reminded me of it: #15192 (comment)

ghost · 2019-09-03T04:06:48Z

It seems that all the major compilers have generated unnecessary code.
It's an unnecessary waste of performance, although very tiny. I created issue38015 for this.

…is explicit. (pythonGH-15216)

the-knights-who-say-ni added the CLA signed label Aug 11, 2019

bedevere-bot added the awaiting review label Aug 11, 2019

rhettinger self-assigned this Aug 12, 2019

benjaminp reviewed Aug 13, 2019

View reviewed changes

rhettinger requested changes Aug 22, 2019

View reviewed changes

bedevere-bot added awaiting changes and removed awaiting review labels Aug 22, 2019

gnprice added 2 commits August 23, 2019 22:40

macro IS_SMALL_INT -> static-inline is_small_int

27f7555

Add NEWS entry.

9b210ed

bedevere-bot added awaiting change review and removed awaiting changes labels Aug 24, 2019

rhettinger merged commit 5e63ab0 into python:master Aug 24, 2019

bedevere-bot removed the awaiting change review label Aug 24, 2019

gnprice deleted the pr-smallint-explicit-return branch August 24, 2019 22:37

gnprice mentioned this pull request Sep 10, 2019

bpo-37802: Slightly improve perfomance of PyLong_FromSize_t() #15192

Merged

lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019

bpo-37812: Convert CHECK_SMALL_INT macro to a function so the return …

1f0ac4a

…is explicit. (pythonGH-15216)

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020

bpo-37812: Convert CHECK_SMALL_INT macro to a function so the return …

453b87d

…is explicit. (pythonGH-15216)

websurfer5 pushed a commit to websurfer5/cpython that referenced this pull request Jul 20, 2020

bpo-37812: Convert CHECK_SMALL_INT macro to a function so the return …

1e7ab70

…is explicit. (pythonGH-15216)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-37812: Expand confusing CHECK_SMALL_INT so return is explicit. #15216

bpo-37812: Expand confusing CHECK_SMALL_INT so return is explicit. #15216

gnprice commented Aug 11, 2019 •

edited by bedevere-bot

Loading

benjaminp Aug 13, 2019

gnprice Aug 13, 2019

benjaminp Aug 14, 2019

rhettinger Aug 22, 2019

rhettinger Aug 22, 2019

gnprice Aug 24, 2019

rhettinger Aug 22, 2019

bedevere-bot commented Aug 22, 2019

gnprice commented Aug 24, 2019

bedevere-bot commented Aug 24, 2019

ghost commented Aug 27, 2019

gnprice commented Aug 27, 2019

ghost commented Aug 27, 2019

ghost commented Aug 27, 2019

gnprice commented Aug 27, 2019

ghost commented Aug 28, 2019 •

edited by ghost

Loading

gnprice commented Aug 28, 2019

ghost commented Sep 3, 2019

bpo-37812: Expand confusing CHECK_SMALL_INT so return is explicit. #15216

bpo-37812: Expand confusing CHECK_SMALL_INT so return is explicit. #15216

Conversation

gnprice commented Aug 11, 2019 • edited by bedevere-bot Loading

benjaminp Aug 13, 2019

Choose a reason for hiding this comment

gnprice Aug 13, 2019

Choose a reason for hiding this comment

benjaminp Aug 14, 2019

Choose a reason for hiding this comment

rhettinger Aug 22, 2019

Choose a reason for hiding this comment

rhettinger Aug 22, 2019

Choose a reason for hiding this comment

gnprice Aug 24, 2019

Choose a reason for hiding this comment

rhettinger Aug 22, 2019

Choose a reason for hiding this comment

bedevere-bot commented Aug 22, 2019

gnprice commented Aug 24, 2019

bedevere-bot commented Aug 24, 2019

ghost commented Aug 27, 2019

gnprice commented Aug 27, 2019

ghost commented Aug 27, 2019

ghost commented Aug 27, 2019

gnprice commented Aug 27, 2019

ghost commented Aug 28, 2019 • edited by ghost Loading

gnprice commented Aug 28, 2019

ghost commented Sep 3, 2019

gnprice commented Aug 11, 2019 •

edited by bedevere-bot

Loading

ghost commented Aug 28, 2019 •

edited by ghost

Loading