Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft: Store integers in ob_size field of PyLongObjects #31595

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

lpereira
Copy link
Contributor

Changes the representation of long objects so that the ob_digit field isn't used for integers that would fit in the ob_size field. This reduces the amount of memory used by long objects, and increases the range of calculations that can be performed directly by the CPU, without requiring our big num implementation.

This is my go at implementing this optimization suggested by @markshannon.

The way this is done is by reserving one bit (the least significant bit) from the ob_size field as a flag that tells if we're using the inlined representation or not.

If the bit is set, the value is inlined; shifting right by one yields an integer that can be used directly. On 64-bit architectures, this means that values are effectively 62-bit, as one is still used for the signal. On 32-bit architectures, this is 30 bit, the same as the largest digit size we support currently.

If the bit is unset, the value is stored as a big num representation: the sign bit is still the sign, and the absolute value is the number of 15- or 30-bit digits in a PyLong object.

More details can be found in longintrepr.h.

This is not finished yet, and I'm posting the code as-is to gather feedback before going any further.

This implements _Py_bit_length() in function of either _Py_bit_length32()
or _Py_bit_length64().  This is needed for some of the work in inlining
integers that would fit in a PyVarObject's ob_size field.

Also cleans up the implementation slightly.
This uses the GCC/Clang builtin to check if either a 32-bit or a 64-bit
addition would overflow, and a fast software-only check everywhere else.
This uses the GCC/Clang builtin to check if either a 32-bit or a 64-bit
subtraction would underflow, and a fast software-only check everywhere else.
…rflow

This uses the GCC/Clang builtin to check if either a 32-bit or a 64-bit
multiplication would overflow, and a fast software-only check everywhere
else.
This tries to use GCC/Clang builtins when available, and falls back
to using the 32-bit popcount twice.
Changes the representation of long objects so that the ob_digit field
isn't used for integers that would fit in the ob_size field.  This
reduces the amount of memory used by long objects, and increases the
range of calculations that can be performed directly by the CPU, without
requiring our big num implementation.

The way this is done is by reserving one bit (the least significant bit)
from the ob_size field as a flag that tells if we're using the inlined
representation or not.

If the bit is set, the value is inlined; shifting right by one yields an
integer that can be used directly.  On 64-bit architectures, this means that
values are effectively 62-bit, as one is still used for the signal.  On
32-bit architectures, this is 30 bit, the same as the largest digit size we
support currently.

If the bit is unset, the value is stored as a big num representation: the
sign bit is still the sign, and the absolute value is the number of 15- or
30-bit digits in a PyLong object.

More details can be found in longintrepr.h.

This is not finished yet, and I'm posting the code as-is to gather feedback
before going any further.
@the-knights-who-say-ni

This comment was marked as outdated.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to compile and found a bunch of trivial errors. Mind fixing those?

Comment on lines +81 to +83
* These integers have a capacity of 62bits on 64-bit architectures:
one bit for the "is inlined" flag, and one sign bit. This is 30 bits on
32-bit architectures (for the same reasons).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be called a capacity of 63 (or 31) bits? When we use the full 64-bit word we don't say that the capacity is 63 bits plus sign, we usually just say "64-bit signed integers".

if (sizeof(long) == sizeof(uint64_t)) {
return _Py_popcount64(x);
}
_Py_UNREACHABLE();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual macro has no leading _.

return 0;
}
// __builtin_clzl() is undefined for x = 0.
Py_BUILT_ASSERT(sizeof(long) <= sizeof(uint32_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean >=?


static inline int _Py_bit_length(unsigned long x)
{
_Py_BUILD_ASSERT(sizeof(x) == sizeof(uint32_t) || sizeof(x) == sizeof(uint64_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No leading _.

_Py_UNREACHABLE();
}

static inline bool _Py_add_overflow32(int32_t a, int32_t b, int32_t *result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file needs #include <stdbool.h> somewhere near the top (say at line 19).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also our C style guide requires a line break between the return type and the function name.)

assert(IS_MEDIUM_VALUE(x));
return ((stwodigits)Py_SIZE(x)) * x->ob_digit[0];
/* 1 bit for the sign bit, 1 bit for the "is inlined or bignum" flag */
return (Py_size_t)size < 1UL<<(sizeof(size) * CHAR_BIT - 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no Py_size_t -- just use size_t.

Py_LOCAL_INLINE(bool)
as_inlined_int(const PyLongObject *x, Py_ssize_t *value)
{
*value = (Py_ssize_t)((Py_size_t)x->ob_size >> 1UL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, x->ob_size doesn't exist; use Py_SIZE(x).

Py_DECREF(v);
return (PyLongObject *)get_small_int((sdigit)ival);
return get_small_int(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need (PyLongObject *) cast.

* checks that value is larger than MAX_LONG_DIGITS. */
memcpy(result->ob_digit, src->ob_digit, sizeof(sdigit) * value);

return result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need (PyObject *) cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these little things will be ironed out whenever the compiler yells at me. I still haven't compiled this, so I'm sure I'll hear a lot whenever I'm ready to do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, just understand that I find it easier to review when there aren't little details that could be fixed by trying to compile and fixing what it finds. :-)

}
} else {
assert(!is_inlined);
*x_p = (PyLongObject *)_PyLongFromSignedSize(-value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyLong_FromSignedSize (_ after _PyLong) here and a few times below.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more. (I'm sending these as I come across them, I'm reviewing this in dribs and drabs between other activities. :-)

/* Long Integer Representation
---------------------------

There are two representations of long objects: the inlined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the optimized representation isn't "inlined" -- I would think that that term might be reserved for a version using tagged pointer values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change this to "small num" and "big num", or something like that, to make this clearer.


For inlined longs, their value can be obtained with this expression:

ob_size >> 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also document the macro one is suppsed to use. :-)

Comment on lines +97 to +98
* These numbers can also be normalized. In a normalized number,
ob_digit[abs(ob_size)-1] (the most significant digit) is never zero.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there also a stronger guarantee that when the object leaves longobject.c it is always normalized? (I.e. unnormalized longs only exist as intermediary results.)

Comment on lines +108 to +109
aware that ints abuse ob_size's sign bit and its least significant
bit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather say that they abuse the ob_size value. :-)

Comment on lines +213 to +215
int upper_bits = _Py_bit_length32((uint32_t)(x >> 32));
int lower_bits = _Py_bit_length32((uint32_t)x);
return upper_bits + lower_bits;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't look right? It should use the bit length of the upper bits, plus 32, if the upper bits are nonzero, else the bit length of the lower bits, or something like that.

(I wonder if there should be a mode where you don't use the GCC/clang/MSVC versions for any of these functions, to test that the fallback versions are correct.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Good catch. I was really tired when I wrote this code and ended up not testing/proving this is correct.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing a few comments I had pending. I will wait with more review activities until you've decided how to address the issues Mark brought up. (If you want me to review more, please point me to something specific and I'll take a look, of course.)

_Py_UNREACHABLE();
}

static inline bool _Py_add_overflow32(int32_t a, int32_t b, int32_t *result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also our C style guide requires a line break between the return type and the function name.)

Comment on lines +50 to +52
Py_ssize_t value;
(void)as_inlined_int(x, &value);
return value == 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this just check whether the size is zero? I.e. return Py_SIZE(x) == 0;

(Same for is_negative() below.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that working, because the LSB is used as a flag to distinguish between inlined numbers and big numbers.

What could be done, though, to avoid shifting, is doing a unsigned, lesser-than-or-equal-to comparison against 1ULL. If the size is either 1 (inlined number with LSB set) or 0 (big number with LSB unset), then the number is certainly zero.

Py_LOCAL_INLINE(Py_ssize_t)
encode_size(Py_ssize_t value)
{
assert(!fits_in_size_field(value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this assert reversed? Assuming the argument is the number of digits, it had better fit in the size field. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is used to encode the number of digits for big nums. It's not supposed to be used when we're encoding the value that will be used in ob_size for inlined ints (they have the LSB set).

Maybe the function has to have a better name to avoid this confusion.

{
Py_ssize_t value;

if (x && as_inlined_int(x, &value)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why check for x != NULL?

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity. If the CLA is not signed within 14 days, it will be closed. See also https://devguide.python.org/pullrequest/#licensing

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Mar 31, 2022
@ambv ambv closed this Apr 10, 2022
@ambv ambv reopened this Apr 10, 2022
@cpython-cla-bot
Copy link

Commit authors are required to sign the Contributor License Agreement.
CLA not signed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants