PERF/CLN: see what we can use from offsets.pyx #11214

Closed
jreback opened this issue Oct 1, 2015 · 7 comments
Labels
Clean Frequency DateOffsets Performance Memory or execution speed performance

Comments

@jreback
Contributor

jreback commented Oct 1, 2015

xref #11205

https://github.com/pydata/pandas/blob/master/pandas/src/offsets.pyx

is in the repo but is not included in the build, nor updated in 2+ years.

Looks like there might be some Cython code to speed up DateOffset apply in the non-vectorized cases.
Further, we might look to move some routines from tslib.pyx for these types of things (and obviously the file would need to be included in the build).

@jreback jreback added Performance Memory or execution speed performance Frequency DateOffsets Clean labels Oct 1, 2015
@jreback jreback added this to the 0.17.1 milestone Oct 1, 2015
@jreback
Contributor Author

jreback commented Oct 1, 2015

cc @chris-b1
cc @cancan101
cc @sinhrks

@sinhrks
Member

sinhrks commented Oct 1, 2015

This also looks like a nice place to put some Cythonized code from offsets.py and frequencies.py.

@chris-b1
Contributor

I didn't find a whole lot that was useful in the existing code. It uses a slightly different definition of an offset, basically stored as a day ordinal from what I can tell.

I did scratch out what the current DateOffset class might look like as a Cython extension type - https://github.com/chris-b1/pandas/tree/cythonize-offset - and it could improve performance quite a bit in the non-vectorized case (and remove the dual definition needed for the vectorized case). It could also generate ranges faster.

But it'd take quite a bit of work to actually port everything over in a feature-complete way, and we would have to worry about things like pickle compat. So maybe it makes more sense to define some targeted helper Cython functions and attach them to the existing classes or something?

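In [1]: import pandas as pd
   ...: # DateOffset2 exists only on the cythonize-offset branch linked above
   ...: from pandas.tseries.offsets import DateOffset, DateOffset2
   ...: old_do = DateOffset(months=1)
   ...: new_do = DateOffset2(months=1)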

In [2]: ts = pd.Timestamp('2014-1-1')

In [3]: %%timeit
   ...: for _ in xrange(10000):
   ...:     old_do.apply(ts)
10 loops, best of 3: 143 ms per loop

In [4]: %%timeit
   ...: for _ in xrange(10000):
   ...:     new_do.apply(ts)
100 loops, best of 3: 13.5 ms per loop

In [5]: dti = pd.date_range('1900-1-1', periods=10000)

In [6]: %timeit old_do.apply_index(dti)
1000 loops, best of 3: 1 ms per loop

In [7]: %timeit new_do.apply_index(dti)
1000 loops, best of 3: 1.02 ms per loop
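
To make the "targeted helper attached to the existing classes" idea concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: `shift_months` and `MonthShift` are hypothetical names, not pandas API and not code from the branch above. It only shows the shape such a split might take, with the month arithmetic isolated in one small function that could later be compiled with Cython while the class stays a regular Python class.

```python
import calendar
from datetime import date


def shift_months(d, months):
    """Hypothetical helper: move `d` by `months`, clipping the day to month end."""
    # Count months from year 0 so the arithmetic wraps cleanly across years.
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


class MonthShift:
    """Hypothetical stand-in for an existing pure-Python offset class."""

    def __init__(self, months=1):
        self.months = months

    def apply(self, d):
        # The class keeps its public shape; only the hot path is delegated.
        return shift_months(d, self.months)


offset = MonthShift(months=1)
print(offset.apply(date(2014, 1, 31)))   # 2014-02-28
print(offset.apply(date(2014, 12, 15)))  # 2015-01-15
```

Because only the helper would move to compiled code in this sketch, the class definition itself is untouched, which is presumably why this route would raise fewer pickle-compat questions than a full port.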

@jreback
Contributor Author

jreback commented Oct 14, 2015

@chris-b1 right, your helpers could be used when advancing more than one date, rather than doing it in a Python loop.

So I will repurpose this issue.

If you have a chance, can you put up a list of things that should be targeted? I will put up checkboxes.
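
As a rough illustration of "advancing more than one date" in a single call, the sketch below takes a whole batch of day ordinals and shifts each by a number of months. `shift_months_many` is a hypothetical name, not a pandas function, and the use of day ordinals is just a convenient integer representation for the example. In pure Python this is still a loop, so it shows only the call shape; the assumption is that a Cython version would run the same loop over typed integers.

```python
import calendar
from datetime import date


def shift_months_many(ordinals, months):
    """Hypothetical batch helper: shift each proleptic day ordinal by `months`."""
    out = []
    for ordinal in ordinals:
        d = date.fromordinal(ordinal)
        # Same month arithmetic for every element, done inside one call.
        total = d.year * 12 + (d.month - 1) + months
        year, month0 = divmod(total, 12)
        last_day = calendar.monthrange(year, month0 + 1)[1]
        out.append(date(year, month0 + 1, min(d.day, last_day)).toordinal())
    return out


start = date(2014, 1, 30).toordinal()
shifted = shift_months_many([start, start + 1, start + 2], 1)
print([date.fromordinal(o).isoformat() for o in shifted])
# ['2014-02-28', '2014-02-28', '2014-03-01']
```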

@jbrockmendel
Member

Pretty sure this can be closed; the referenced file is gone.

@jorisvandenbossche jorisvandenbossche modified the milestones: Interesting Issues, No action Nov 10, 2017
@jorisvandenbossche
Member

@jbrockmendel Can you point to the commit where it was removed/moved?

@jbrockmendel
Member

#17585
