Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

offsets pickle issues #17313

Closed
jbrockmendel opened this issue Aug 22, 2017 · 7 comments
Closed

offsets pickle issues #17313

jbrockmendel opened this issue Aug 22, 2017 · 7 comments
Labels
Compat pandas objects compatability with Numpy or Python functions

Comments

@jbrockmendel
Copy link
Member

I'm overhauling tseries.offsets and finding a handful of workarounds that are in place "for prior pickles". How important are these? This process would be appreciably simpler if I could ignore those.

Related:

class MonthOffset(SingleConstructorOffset):
    _adjust_dst = True

    @property
    def name(self):
        if self.isAnchored:
            return self.rule_code
        else:
            return "%s-%s" % (self.rule_code, _int_to_month[self.n])

isAnchored is a method, not a property, so this is probably intended to be if self.isAnchored().

@chris-b1
Copy link
Contributor

We won't break pickle compat, but do have a compatibility layer in read_pickle, so if something needs to change you can likely handle it there.

https://github.com/pandas-dev/pandas/blob/2bec750b21b8715e3f55e71a6c69f2abef54d08b/pandas/compat/pickle_compat.py

@gfyoung gfyoung added the Compat pandas objects compatability with Numpy or Python functions label Aug 22, 2017
@jbrockmendel
Copy link
Member Author

We won't break pickle compat, but do have a compatibility layer in read_pickle

I'm trying to move parts of DateOffset and Tick into cython, getting two pickle-related errors repeatedly:

pandas/compat/pickle_compat.py:22: in load_reduce
    stack[-1] = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: pandas._libs.tslibs.offsets._DateOffset.__new__(BusinessDay) is not safe, use object.__new__()
/usr/lib/python2.7/pickle.py:1223: in load_build
    setstate(state)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: Expected tuple, got dict

Googling and poking around the other pyx files suggest that implementing __reduce__ or __reduce_ex__ is the fix, but I haven't had much luck there. Have any experience with this?

@chris-b1
Copy link
Contributor

Sorry, I'm no help here (pickle) - I've actually started down this path before (cythonizing offsets) and pickle compat is part of the reason I never put a lot of effort into it. Also I was able to sufficiently speed up the vectorized offsets to the point where cythonizing doesn't add much, xref #11214

Possible can look at some of the early major refactoring for inspiration of how pickle compat was handled. https://github.com/pandas-dev/pandas/pull/3482/files

@jbrockmendel
Copy link
Member Author

Also I was able to sufficiently speed up the vectorized offsets to the point where cythonizing doesn't add much

I'll take a look at that. The performance pain point I'm trying to address is actually in DateOffset.__eq__ (which gets called by Period.__eq__). DateOffset.__eq__ calls DateOffset._params, which is super-slow. I've got a PR to cache it, but that's a half-measure.

@jreback
Copy link
Contributor

jreback commented Aug 23, 2017

we support pickles back to 0.13. Certainly could move that up in this version. but again that should be a completely independent PR.

@jreback jreback closed this as completed Aug 23, 2017
@jbrockmendel
Copy link
Member Author

OK. The particular test that I couldn't get working was specific to tests.tseries.test_offsets.TestCustomBusinessDay.test_pickle_compat_0_14_1, i.e. 0.14.1. The exception message looked like it might have something to do with new cython rules. Anyway, that won't be an issue again until several steps down the sequence started in #17318.

@max-sixty
Copy link
Contributor

@jbrockmendel there's definitely some cleaning up of pickling to be done - I had a go at moving PeriodIndex to use only __getstate__ (which I believe is possible although I think @jreback was more skeptical). I didn't manage to finish the job - if you'd take the torch I think there's some good cleanup to be done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

No branches or pull requests

5 participants