Skip to content

CLN: use float64_t consistently instead of double, double_t #23583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Nov 11, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
remove unncessary cpdef
  • Loading branch information
jbrockmendel committed Nov 8, 2018
commit 03ae9cfb5e793cd2a7d1ec18514dd5750556b1aa
3 changes: 0 additions & 3 deletions pandas/_libs/algos.pxd
Original file line number Diff line number Diff line change
@@ -1,9 +1,6 @@
from util cimport numeric


cpdef numeric kth_smallest(numeric[:] a, Py_ssize_t k) nogil


cdef inline Py_ssize_t swap(numeric *a, numeric *b) nogil:
cdef:
numeric t
Expand Down
3 changes: 2 additions & 1 deletion pandas/_libs/algos.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -195,9 +195,10 @@ def groupsort_indexer(ndarray[int64_t] index, Py_ssize_t ngroups):
return result, counts


# TODO: redundant with groupby.kth_smallest_c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not actually

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. The typing is more specific in the groupby version, but the code itself is nearly identical

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to remove and you will see why it’s here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get rid of the comment, but am kind of surprised: you're usually Holy Crusader against redundant code, and the body of this function is very nearly copy/pasted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is i don’t think it’s easy to remove
you are more than welcome to try

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you're right. I removed the comment.

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef numeric kth_smallest(numeric[:] a, Py_ssize_t k) nogil:
def kth_smallest(numeric[:] a, Py_ssize_t k) -> numeric:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think will have a big perf slowdown

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its never used in cython and nogil isnt allowed for def functions. There is a with nogil block just below this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls test i once changed this (and tried to remove) and was all negative

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indistinguishable:

master:

In [3]: arr = np.arange(10000, dtype=np.int64)

In [4]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 165.65 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.42 µs per loop

In [5]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.3 µs per loop

In [6]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 4.21 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.2 µs per loop

PR:

In [3]: arr = np.arange(10000, dtype=np.int64)

In [4]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 12.06 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.71 µs per loop

In [5]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 9.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.6 µs per loop

In [6]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.95 µs per loop

Similar for other dtypes

cdef:
Py_ssize_t i, j, l, m, n = a.shape[0]
numeric x
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/groupby.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ cdef inline float64_t median_linear(float64_t* a, int n) nogil:
return result


# TODO: Is this redundant with algos.kth_smallest?
# TODO: Is this redundant with algos.kth_smallest
cdef inline float64_t kth_smallest_c(float64_t* a,
Py_ssize_t k,
Py_ssize_t n) nogil:
Expand Down