Description
Feature or enhancement
Proposal:
Currently, the sum() builtin lacks any specialization for complex numbers, yet it is usually still faster than pure-Python alternatives.
benchmark: sum() vs. pure-Python alternatives
# a.py
from random import random, seed

seed(1)
data = [complex(random(), random()) for _ in range(10)]

def msum(xs):
    it = iter(xs)
    res = next(it)
    for z in it:
        res += z
    return res

def sum2(xs):
    return complex(sum(_.real for _ in xs),
                   sum(_.imag for _ in xs))
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 963 nsec per loop
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'msum(data)'
200000 loops, best of 11: 1.31e+03 nsec per loop
Using sum() component-wise is hardly an option, either:
$ ./python -m timeit -r11 -unsec -s 'from a import data, sum2' 'sum2(data)'
50000 loops, best of 11: 8.56e+03 nsec per loop
Unfortunately, using this builtin directly in numeric code doesn't make sense, as the results are (usually) inaccurate. It's not too hard to do the summation component-wise with math.fsum(), but that is slow, and there might be a better way.
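For reference, a component-wise math.fsum() version reads roughly like this (just a sketch; csum_fsum is an illustrative name, not a proposed API):

from math import fsum

def csum_fsum(xs):
    # Correctly rounded per component, but materializes the data
    # and walks it twice.
    xs = list(xs)
    return complex(fsum(z.real for z in xs),
                   fsum(z.imag for z in xs))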
In #100425, a simple algorithm using compensated summation was implemented in sum() for floats. I propose to (1) add a specialization in sum() for complex numbers, and (2) reuse the #100425 code to implement accurate summation of complex numbers.
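For context, the algorithm from #100425 is (to my understanding) a Neumaier-style compensated summation; a minimal pure-Python sketch of it, with names of my choosing, looks like this:

def neumaier_sum(xs):
    # Running total plus a compensation term that accumulates the
    # low-order bits lost in each addition.
    total = 0.0
    comp = 0.0
    for x in xs:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x
        else:
            comp += (x - t) + total
        total = t
    return total + comp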
(1) is simple and straightforward, yet it already gives a measurable performance boost:
benchmark: sum() on the main branch vs. with the added specialization for complex
diff --git a/Python/bltinmodule.c b/Python/bltinmodule.c
index 6e50623caf..da0eed584a 100644
--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2691,6 +2691,59 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
             }
         }
     }
+
+    if (PyComplex_CheckExact(result)) {
+        Py_complex c_result = PyComplex_AsCComplex(result);
+        Py_SETREF(result, NULL);
+        while(result == NULL) {
+            item = PyIter_Next(iter);
+            if (item == NULL) {
+                Py_DECREF(iter);
+                if (PyErr_Occurred())
+                    return NULL;
+                return PyComplex_FromCComplex(c_result);
+            }
+            if (PyComplex_CheckExact(item)) {
+                Py_complex x = PyComplex_AsCComplex(item);
+                c_result.real += x.real;
+                c_result.imag += x.imag;
+                Py_DECREF(item);
+                continue;
+            }
+            if (PyLong_Check(item)) {
+                long value;
+                int overflow;
+                value = PyLong_AsLongAndOverflow(item, &overflow);
+                if (!overflow) {
+                    c_result.real += (double)value;
+                    c_result.imag += 0.0;
+                    Py_DECREF(item);
+                    continue;
+                }
+            }
+            if (PyFloat_Check(item)) {
+                double value = PyFloat_AS_DOUBLE(item);
+                c_result.real += value;
+                c_result.imag += 0.0;
+                Py_DECREF(item);
+                continue;
+            }
+            result = PyComplex_FromCComplex(c_result);
+            if (result == NULL) {
+                Py_DECREF(item);
+                Py_DECREF(iter);
+                return NULL;
+            }
+            temp = PyNumber_Add(result, item);
+            Py_DECREF(result);
+            Py_DECREF(item);
+            result = temp;
+            if (result == NULL) {
+                Py_DECREF(iter);
+                return NULL;
+            }
+        }
+    }
 #endif
     for(;;) {
on the main branch:
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 963 nsec per loop
with specialization:
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 606 nsec per loop
(2) also seems to be a straightforward task: a simple refactoring of the PyFloat specialization should allow us to use the same core for the PyComplex specialization.
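To make (2) concrete, here is a rough pure-Python picture of what the shared core would do: the same compensated update is applied independently to the real and imaginary parts (names are illustrative; the actual work would be refactoring the C code from #100425):

def compensated_add(total, comp, x):
    # One Neumaier update step; returns the new (total, compensation) pair.
    t = total + x
    if abs(total) >= abs(x):
        comp += (total - t) + x
    else:
        comp += (x - t) + total
    return t, comp

def csum(xs):
    re, re_c = 0.0, 0.0
    im, im_c = 0.0, 0.0
    for z in xs:
        re, re_c = compensated_add(re, re_c, z.real)
        im, im_c = compensated_add(im, im_c, z.imag)
    return complex(re + re_c, im + im_c)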
If there are no objections, I'll work on a patch.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response