Math hypot exactfloat fastpath #8949

Merged: 2 commits into python:master from math-hypot-exactfloat-fastpath, Aug 27, 2018


rhettinger
Contributor

Provide a fast path for the common case of exact float inputs. Saves the overhead of an external function call and of the x == -1.0 error check. Allows the inner loops to mostly use registers.
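The idea can be sketched outside of CPython. The following is a minimal, hypothetical model, not the code from this patch: `obj_t`, `kind_t`, `generic_as_double()` and `to_doubles()` are invented stand-ins for the interpreter's object layout and for a generic conversion routine that signals failure by returning -1.0 together with an error flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an interpreter's object model; not the CPython API. */
typedef enum { KIND_FLOAT, KIND_INT, KIND_STR } kind_t;

typedef struct {
    kind_t kind;   /* plays the role of the object's type pointer */
    double fval;   /* payload when kind == KIND_FLOAT */
    long   ival;   /* payload when kind == KIND_INT */
} obj_t;

/* Slow path: generic conversion living behind a function call.
   On failure it returns -1.0 and sets *err, so callers must check both. */
double generic_as_double(const obj_t *o, int *err)
{
    switch (o->kind) {
    case KIND_FLOAT: return o->fval;
    case KIND_INT:   return (double)o->ival;
    default:         *err = 1; return -1.0;
    }
}

/* Convert n items into out[]. Returns 0 on success, -1 on a conversion error. */
int to_doubles(const obj_t *items, size_t n, double *out)
{
    assert(items != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        double x;
        if (items[i].kind == KIND_FLOAT) {
            /* Fast path: one inline type comparison and one load.
               No call and no error check, even when the value is -1.0. */
            x = items[i].fval;
        } else {
            /* Slow path: external call followed by the -1.0 error check. */
            int err = 0;
            x = generic_as_double(&items[i], &err);
            if (x == -1.0 && err) {
                return -1;
            }
        }
        out[i] = x;
    }
    return 0;
}
```

For an input made only of exact floats, the loop body never leaves the fast branch, which is what lets a compiler keep the loop state in registers.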

Speeds up the overall function by approximately 25% (297 nsec down to 215 nsec in the timing below):

$ ------ baseline -------
$ py -m timeit -r7 -s 'from math import dist; p=tuple(map(float, range(20))); q=tuple(reversed(p))' 'dist(p, q)'
1000000 loops, best of 7: 297 nsec per loop

$ ------ patched -------
$ py -m timeit -r7 -s 'from math import dist; p=tuple(map(float, range(20))); q=tuple(reversed(p))' 'dist(p, q)'
1000000 loops, best of 7: 215 nsec per loop

Disassembly of math_hypot() using GCC 8.2 shows a very tight inner loop without unnecessary register spills and reloads and without external calls that have to save and restore registers:

L378:
xorl	%eax, %eax
andpd	lC5(%rip), %xmm0         # x = fabs(x);
ucomisd	%xmm0, %xmm0
movl	$1, %ecx
setne	%al
cmovp	%ecx, %eax
orl	%eax, %ebx               # found_nan |= Py_IS_NAN(x);
L377:
movsd	%xmm0, (%r12,%r15,8)     # coordinates[i] = x;
maxsd	%xmm1, %xmm0             # if (x > max) { max = x; }
addq	$1, %r15                 # i++
cmpq	%r15, %rbp               # i < n
movapd	%xmm0, %xmm1
jle	L418
L385:
movq	24(%r13,%r15,8), %rdi    # item = PyTuple_GET_ITEM(args, i);
cmpq	%r14, 8(%rdi)            # if (PyFloat_CheckExact(item))
jne	L375
movsd	16(%rdi), %xmm0          # x = PyFloat_AS_DOUBLE(item)
jmp	L378
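For readers who prefer C to assembly, the comments in the listing above correspond roughly to the loop below. This is a sketch reconstructed from those comments only; the function name `scan_abs_max()` and its signature are assumptions, and it takes raw doubles instead of Python objects so that it can be compiled on its own.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: store |x| for each input, track the largest value,
   and record whether any input was NaN. Returns the maximum. */
double scan_abs_max(const double *in, double *coordinates, size_t n, int *found_nan)
{
    assert(in != NULL && coordinates != NULL && found_nan != NULL);
    double max = 0.0;
    int nan_seen = 0;
    for (size_t i = 0; i < n; i++) {
        double x = fabs(in[i]);        /* x = fabs(x); */
        nan_seen |= (isnan(x) != 0);   /* found_nan |= Py_IS_NAN(x); */
        coordinates[i] = x;            /* coordinates[i] = x; */
        if (x > max) {                 /* NaN compares false, so max is unchanged */
            max = x;
        }
    }
    *found_nan = nan_seen;
    return max;
}
```

Everything in the loop body is plain arithmetic on locals, so nothing forces the compiler to save or restore registers around a call.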

Saves function call overhead and lets the inner loop run
in registers with no spills/reloads.
@rhettinger rhettinger added the performance, skip issue, and skip news labels Aug 27, 2018
@rhettinger rhettinger merged commit 74734f7 into python:master Aug 27, 2018
@rhettinger rhettinger deleted the math-hypot-exactfloat-fastpath branch August 27, 2018 00:38