Skip to content

Specialization for accurate complex summation in sum()? #121149

Closed
@skirpichev

Description

@skirpichev

Feature or enhancement

Proposal:

Currently, sum() builtin lacks any specialization for complex numbers, yet it's usually faster than better pure-python alternatives.

benchmark sum() wrt pure-python version
# a.py
from random import random, seed
seed(1)
data = [complex(random(), random()) for _ in range(10)]
def msum(xs):
    it = iter(xs)
    res = next(it)
    for z in it:
        res += z
    return res
def sum2(xs):
    return complex(sum(_.real for _ in xs),
                   sum(_.imag for _ in xs))
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 963 nsec per loop
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'msum(data)'
200000 loops, best of 11: 1.31e+03 nsec per loop

Hardly using sum() component-wise is an option:

$ ./python -m timeit -r11 -unsec -s 'from a import data, sum2' 'sum2(data)'
50000 loops, best of 11: 8.56e+03 nsec per loop

Unfortunately, direct using this builtin in numeric code doesn't make sense, as results are (usually) inaccurate. It's not too hard to do summation component-wise with math.fsum(), but it's slow and there might be a better way.

In #100425 simple algorithm, using compensated summation, was implemented in sum() for floats. I propose (1) make specialization in sum() for complex numbers, and (2) reuse #100425 code to implement accurate summation of complexes.

(1) is simple and strightforward, yet it will give some measurable performance boost

benchmark sum() in the main wrt added specialization for complex
diff --git a/Python/bltinmodule.c b/Python/bltinmodule.c
index 6e50623caf..da0eed584a 100644
--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2691,6 +2691,59 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
             }
         }
     }
+
+    if (PyComplex_CheckExact(result)) {
+        Py_complex c_result = PyComplex_AsCComplex(result);
+        Py_SETREF(result, NULL);
+        while(result == NULL) {
+            item = PyIter_Next(iter);
+            if (item == NULL) {
+                Py_DECREF(iter);
+                if (PyErr_Occurred())
+                    return NULL;
+                return PyComplex_FromCComplex(c_result);
+            }
+            if (PyComplex_CheckExact(item)) {
+                Py_complex x = PyComplex_AsCComplex(item);
+                c_result.real += x.real;
+                c_result.imag += x.imag;
+                _Py_DECREF_SPECIALIZED(item, _PyFloat_ExactDealloc);
+                continue;
+            }
+            if (PyLong_Check(item)) {
+                long value;
+                int overflow;
+                value = PyLong_AsLongAndOverflow(item, &overflow);
+                if (!overflow) {
+                    c_result.real += (double)value;
+                    c_result.imag += 0.0;
+                    Py_DECREF(item);
+                    continue;
+                }
+            }
+            if (PyFloat_Check(item)) {
+                double value = PyFloat_AS_DOUBLE(item);
+                c_result.real += value;
+                c_result.imag += 0.0;
+                Py_DECREF(item);
+                continue;
+            }
+            result = PyComplex_FromCComplex(c_result);
+            if (result == NULL) {
+                Py_DECREF(item);
+                Py_DECREF(iter);
+                return NULL;
+            }
+            temp = PyNumber_Add(result, item);
+            Py_DECREF(result);
+            Py_DECREF(item);
+            result = temp;
+            if (result == NULL) {
+                Py_DECREF(iter);
+                return NULL;
+            }
+        }
+    }
 #endif
 
     for(;;) {

main:

$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 963 nsec per loop

with specialization:

$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 606 nsec per loop

(2) also seems to be a no-brain task: simple refactoring of PyFloat specialization should allows us use same core for PyComplex specialization.

If there are no objections against - I'll work on a patch.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.14bugs and security fixestype-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions