Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-33234 Improve list() pre-sizing for inputs with known lengths (no __length_hint__) #9846

Merged
merged 10 commits into from
Oct 28, 2018

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Oct 13, 2018

This is a version of #6493 that only preallocates the list when __len__ is available to reduce memory overallocation and possible performance hit on big iterables.

This are the L1 and L2 cache misses and related information:

THIS PR

❯ perf stat -r 200 -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./python -c "
for _ in range(1000):
    list([0]*10000)
"
 Performance counter stats for './python -c
for _ in range(1000):
    list([0]*10000)
' (200 runs):

         1,151,365      cache-references:u                                            ( +-  0.47% )
             5,686      cache-misses:u            #    0.494 % of all cache refs      ( +-  8.06% )
       370,237,437      cycles:u                                                      ( +-  0.05% )
       542,049,270      instructions:u            #    1.46  insn per cycle           ( +-  0.00% )
       134,146,925      branches:u                                                    ( +-  0.00% )
             8,031      faults:u                                                      ( +-  0.00% )
                 0      migrations:u

          0.122170 +- 0.000307 seconds time elapsed  ( +-  0.25% )

MASTER

❯ perf stat -r 200 -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./python -c "for _ in range(1000):
    list([0]*10000)
"
 Performance counter stats for './python -c
for _ in range(1000):
    list([0]*10000)
' (200 runs):

         1,275,440      cache-references:u                                            ( +-  0.61% )
            21,895      cache-misses:u            #    1.717 % of all cache refs      ( +-  4.69% )
       392,293,096      cycles:u                                                      ( +-  0.22% )
       543,254,939      instructions:u            #    1.38  insn per cycle           ( +-  0.00% )
       134,549,173      branches:u                                                    ( +-  0.00% )
            10,039      faults:u                                                      ( +-  0.00% )
                 0      migrations:u

          0.133171 +- 0.000463 seconds time elapsed  ( +-  0.35% )

https://bugs.python.org/issue33234

The list() constructor isn't taking full advantage of known input
lengths or length hints. This commit makes the constructor
pre-size and not over-allocate when the input size is known or
can be reasonably estimated.

A new test is added to test_list.py and a test needed to be modified
in test_descr.py as it assumed that there is only one call to
__length_hint__() in the list constructor.
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but the NEWS entry should be fixed.

@@ -0,0 +1,2 @@
The list constructor will pre-size and not over-allocate when the input size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length, not size.

@@ -0,0 +1,2 @@
The list constructor will pre-size and not over-allocate when the input size
is known or can be reasonably estimated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"or can be reasonably estimated" I don't understand this part.

@@ -2649,6 +2675,13 @@ list___init___impl(PyListObject *self, PyObject *iterable)
(void)_list_clear(self);
}
if (iterable != NULL) {
Py_ssize_t iter_len = PyObject_Length(iterable);
if (iter_len == -1){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: ...) { (add a space)

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vstinner
Copy link
Member

I suggest to wait the approval of a second core dev before merging this change.

@pablogsal
Copy link
Member Author

I would like @serhiy-storchaka comment and advice on this if possible.

@vstinner
Copy link
Member

@pablogsal: If you want me to merge your PR, I would be me confident if you write a benchmark for the worst case: a small container (ex: 10 items or less) without a known length. I expect that the additional function call (obj.len()) would slow down this case. I would be interested to see the overhead.

@pablogsal
Copy link
Member Author

pablogsal commented Oct 23, 2018

@vstinner

Experiment:

import perf

runner = perf.Runner()
runner.timeit("list(x)",
              stmt="list(x)",
              setup="x = iter(list(range(10)))")

result:

 ./python.exe -m perf  compare_to old.json new.json
Mean +- std dev: [old] 795 ns +- 25 ns -> [new] 2.94 us +- 0.07 us: 3.70x slower (+270%)

With the experiment:

import perf

runner = perf.Runner()
runner.timeit("list(x)",
              stmt="list(x)",
              setup="x = [0]*10000")

the result is

❯ ./python.exe -m perf  compare_to old_big.json new_big.json
Mean +- std dev: [old_big] 97.5 us +- 3.4 us -> [new_big] 81.4 us +- 3.7 us: 1.19x faster (-5%)

I would say that this result points to abandon this version of the change, given the bad performance for small iterables.

@pablogsal
Copy link
Member Author

pablogsal commented Oct 23, 2018

This is the first experiment using PyObject_LengthHint:

❯ ./python.exe -m perf  compare_to old.json new.json
Mean +- std dev: [old] 795 ns +- 25 ns -> [new] 882 ns +- 32 ns: 1.11x slower (+11%)

@vstinner
Copy link
Member

Mean +- std dev: [old] 795 ns +- 25 ns -> [new] 2.94 us +- 0.07 us: 3.70x slower (+270%)

This overhead likely comes from the raisin an exception + PyErr_Clear(). hasattr() has been optimized by adding a _PyObject_LookupAttr() which is getattr() without raising an exception if the attribute doesn't exist.

Probing an object size is a common need for optimization to preallocate a container. Python builtin types would benefit of such function.

What do you think of adding something like _PyObject_ProbeLength() which is like PyObject_Length() without raising an exception?

cc @methane @serhiy-storchaka

@@ -2649,6 +2675,13 @@ list___init___impl(PyListObject *self, PyObject *iterable)
(void)_list_clear(self);
}
if (iterable != NULL) {
Py_ssize_t iter_len = PyObject_Length(iterable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce or avoid the overhead of the worst case, you can avoid this call if _PyObject_HasLen() is false.

This change doesn't require to implement a new _PyObject_ProbeLength().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated run of first experiment using _PyObject_HasLen():

Mean +- std dev: [old] 795 ns +- 25 ns -> [new] 776 ns +- 25 ns: 1.02x faster (-6%)

@pablogsal
Copy link
Member Author

pablogsal commented Oct 23, 2018

Updated run of first experiment using _PyObject_HasLen():

❯ ./python.exe -m perf  compare_to old.json new.json
Mean +- std dev: [old] 795 ns +- 25 ns -> [new] 776 ns +- 25 ns: 1.02x faster (-6%)

@vstinner
Copy link
Member

Updated run of first experiment using _PyObject_HasLen():

If it's the "x = iter(list(range(10)))" worst case: it confirms what I expected, _PyObject_HasLen() makes the overhead on the worst case not significant. Good.

I guess that you kept the same speedup for the best case?

@@ -2649,6 +2675,15 @@ list___init___impl(PyListObject *self, PyObject *iterable)
(void)_list_clear(self);
}
if (iterable != NULL) {
if(_PyObject_HasLen(iterable)){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a space before {.

@@ -2649,6 +2675,15 @@ list___init___impl(PyListObject *self, PyObject *iterable)
(void)_list_clear(self);
}
if (iterable != NULL) {
if(_PyObject_HasLen(iterable)){
Py_ssize_t iter_len = PyObject_Length(iterable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyObject_Length is an alias of PyObject_Size (for backward compatibility). It is better to use PyObject_Size.

allocated = 0;
}
num_allocated_bytes = allocated * sizeof(PyObject *);
items = (PyObject **)PyMem_Malloc(num_allocated_bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not be easier to use PyMem_New?

@pablogsal
Copy link
Member Author

pablogsal commented Oct 23, 2018

Performance metrics with commit b678442:

❯ ./python -m perf compare_to ../cpython2/old.json new.json
Mean +- std dev: [old_big] 97.5 us +- 3.4 us -> [new_big] 88.6 us +- 5.5 us: 1.1x faster (-5%)

and perf information:

❯ perf stat -r 200 -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./python -c "
for _ in range(1000):
    list([0]*10000)
"
 Performance counter stats for './python -c
for _ in range(1000):
    list([0]*10000)
' (200 runs):

         1,151,607      cache-references:u                                            ( +-  0.51% )
             3,438      cache-misses:u            #    0.299 % of all cache refs      ( +- 14.03% )
       379,755,424      cycles:u                                                      ( +-  0.56% )
       543,025,748      instructions:u            #    1.43  insn per cycle           ( +-  0.00% )
       134,399,489      branches:u                                                    ( +-  0.00% )
             8,029      faults:u                                                      ( +-  0.00% )
                 0      migrations:u

          0.123801 +- 0.000686 seconds time elapsed  ( +-  0.55% )

Also, let's not forget the original objective of this patch:

PATCH

>>> import sys
>>> x = list([0]*1000000)
>>>
KeyboardInterrupt
>>> sys.getsizeof(x)
8000080

MASTER

>>> import sys
>>> x = list([0]*1000000)
>>>
KeyboardInterrupt
>>> sys.getsizeof(x)
9000120

This is a 12 % improvement in memory claimed by the list in these cases.

Copy link
Member

@methane methane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it's now even better with _PyObject_HasLen()!

if (iter_len == -1) {
if (PyErr_ExceptionMatches(PyExc_Exception)) {
PyErr_Clear();
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: PEP 7 requires

}
else {

if (_PyObject_HasLen(iterable) && self->ob_item == NULL) {
Py_ssize_t iter_len = PyObject_Size(iterable);
if (iter_len == -1) {
if (PyErr_ExceptionMatches(PyExc_Exception)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What tests will fail if not silence any exceptions at all?

Copy link
Member Author

@pablogsal pablogsal Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least the following:

test_generators failed
**********************************************************************
File "/home/pablogsal/github/cpython/Lib/test/test_generators.py", line ?, in test.test_generators.__test__.weakref
Failed example:
    list(p)
Exception raised:
    Traceback (most recent call last):
      File "/home/pablogsal/github/cpython/Lib/doctest.py", line 1329, in __run
        compileflags, 1), test.globs)
      File "<doctest test.test_generators.__test__.weakref[9]>", line 1, in <module>
        list(p)
    TypeError: object of type 'generator' has no len()
**********************************************************************
1 items had failures:
   1 of  10 in test.test_generators.__test__.weakref
***Test Failed*** 1 failures.


test_genexps failed
**********************************************************************
File "/home/pablogsal/github/cpython/Lib/test/test_genexps.py", line ?, in test.test_genexps.__test__.doctests
Failed example:
    list(p)
Exception raised:
    Traceback (most recent call last):
      File "/home/pablogsal/github/cpython/Lib/doctest.py", line 1329, in __run
        compileflags, 1), test.globs)
      File "<doctest test.test_genexps.__test__.doctests[75]>", line 1, in <module>
        list(p)
    TypeError: object of type 'generator' has no len()
**********************************************************************
1 items had failures:
   1 of  76 in test.test_genexps.__test__.doctests
***Test Failed*** 1 failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can _PyObject_HasLen() return true for generators?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it is not true generator, it is a weakref. What if silence just TypeError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit afraid if someone has similar tests/usage like these two cases and then they will receive new errors as raising all exceptions would be technically a backwards incompatible change. But if you think we should raise (maybe ignoring TypeError) I am happy to change the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PyObject_LengthHint().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0a6c8ba

if (PyErr_ExceptionMatches(PyExc_Exception)) {
PyErr_Clear();
}
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: for readability, I suggest to invert the if condition and exchange the true/false blocks, and move PyErr_Clear() after the if:

if (!not exception) { return -1; }
PyErr_Clear();

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

if (!PyErr_ExceptionMatches(PyExc_TypeError)) {
return -1;
}
PyErr_Clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pablogsal, @serhiy-storchaka: In a previous comment, I proposed to add an helper function to "probe" an object size: so move this code into a private helper function. Since the same code is used by PyObject_LengthHint(), it would now make sense, no?

See also iter_len() ... which is different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will be used just in two places and is not too complex. Adding yet one intermediate function will add a performance penalty for calling a function and checking its result, and will complicate the code. If this code will be used in more places, it can be refactored.

@pablogsal pablogsal merged commit 372d705 into python:master Oct 28, 2018
@pablogsal pablogsal deleted the bpo33234-2 branch October 28, 2018 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants