Inplace returns self? #1893

Closed
jseabold opened this issue Sep 11, 2012 · 52 comments

@jseabold
Contributor

Is there any reason that whenever I'm doing inplace=True I always get self returned? Obviously not a huge deal, but this is kind of wart-y IMO. I wouldn't expect inplace to return anything.

@changhiskhan
Contributor

Good idea, actually. It would make the user more explicit about invoking side effects.

@wesm
Member

wesm commented Sep 13, 2012

I go back and forth on it. At some point I'd decided it was better to have a consistent API w.r.t. return values (vs. None in the inplace=True case), i.e. whether or not you pass inplace=True, you can always count on getting a reference back to the modified object.

@jseabold
Contributor Author

Sure, but this is different from the in-place operations in Python and NumPy. When I was first using pandas, I was really thrown by the default of making copies everywhere and returning an object, especially when Python and NumPy led me to expect in-place operation (e.g., sort, assignment to a view). I was often bitten by not catching this (kind of like mpl returning things you don't really need). Then I discovered the inplace keyword. Great, because I almost always want to do things in-place and avoid all the extra typing of assignment, though now I have to use the keyword everywhere. It just seems unnecessary to return the object when I explicitly asked for inplace and already have a reference to it. It's just noise at the interpreter when working interactively.
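
A minimal sketch of the contrast being described, assuming the sort APIs of the time (the pandas line is shown as a comment because its behaviour is exactly what this issue is about):

import numpy as np

lst = [3, 1, 2]
print(lst.sort())    # None -- list.sort() mutates in place and returns nothing

arr = np.array([3, 1, 2])
print(arr.sort())    # None -- ndarray.sort() likewise mutates in place and returns None

# whereas, at the time of this thread, a pandas call such as
#   df.sort_index(inplace=True)
# still returned the mutated df itself rather than None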

@jseabold
Contributor Author

Just as a follow-up: when writing notebooks I have to put semicolons after every line where I do in-place operations so the returned self doesn't get echoed.
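
For example (the suppression only matters in IPython/notebooks, where the value of the last expression is echoed):

import pandas as pd

df = pd.DataFrame({'a': [1.0, None, 3.0]})
df.fillna(0, inplace=True);   # trailing semicolon: the echoed return value is suppressed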

@michaelaye
Contributor

Wes, your argument confuses me. Do you consider the inplace option a special case or not? Also, I just learned that IPython keeps unassigned objects alive for the output history (the _ thingie). Is this true for notebooks as well, and could this be an argument for really not returning an object when doing things in place?

@wesm
Member

wesm commented Dec 3, 2012

I think the proposal on the table is to always return None when using inplace=True. Moving this to 0.10.

@ghost

ghost commented Dec 3, 2012

Note that not returning self breaks the fluent interface: a.opA().opB().opC()

@jseabold
Contributor Author

jseabold commented Dec 3, 2012

Well, inplace is optional, so the easy solution is not to use inplace when you want to chain another operation on the returned object.
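
In other words, something along these lines (method names are only illustrative):

import pandas as pd

df = pd.DataFrame({'a': [1.0, None, 1.0]})
result = df.dropna().drop_duplicates().sort_index()   # chain on the returned copies
df.dropna(inplace=True)                               # or mutate in place, with no chaining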

@wesm
Member

wesm commented Dec 3, 2012

Another place where Python's eager evaluation can be a weakness

@ghost

ghost commented Dec 3, 2012

Those are two orthogonal considerations; one (arguably) should not force the other.

@jseabold
Contributor Author

jseabold commented Dec 3, 2012

Fair enough, but pandas is still an outlier in this respect compared to many of the methods in NumPy and Python itself. Admittedly, my argument for a (default) inplace not returning self is that I want to save myself typing and improve the readability of output at the interpreter for teaching, presenting, or demonstrating. I don't often have serious concerns about memory use or the readability of scripts.

What's the alternative? Having options where inplace can be 'true', 'false', or 'return'?

@wesm
Member

wesm commented Dec 3, 2012

Or you could provide some kind of chainable, inplace interface. I'm thinking a lot lately about building a DSL layer around pandas so you could do things like:

frame do {
    .dropna axis=1
    ab_diff = a - b       
} group by key1 key2  {
    max(ab_diff)
    std(a)
}

and have that be as fast and memory-efficient as possible. Then you could easily chain "in-place" operations and get what you expect.
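
For comparison, a rough sketch of those semantics with ordinary chained pandas calls (this uses the much later .assign API, so it is only an approximation of the idea):

import pandas as pd

frame = pd.DataFrame({'a': [1.0, 2.0, 4.0], 'b': [0.5, 1.5, 1.0],
                      'key1': ['x', 'x', 'y'], 'key2': ['u', 'v', 'u']})

result = (frame.dropna(axis=1)
               .assign(ab_diff=lambda f: f['a'] - f['b'])
               .groupby(['key1', 'key2'])
               .agg({'ab_diff': 'max', 'a': 'std'}))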

@michaelaye
Contributor

I fail to see the 'orthogonality' (maybe because 'orthogonal' is linguistically overrated, IMHO). Your claim that these design questions are independent (i.e. 'orthogonal') supports the use of something like obj.opA().opB(inplace=True).opC(). I don't even want to start thinking about what I just did there and which of all the objects flying around now has what content. The cleanest interface for me is: when I do inplace, it affects my original object; if not, it is safe.

@ghost

ghost commented Dec 3, 2012

  • Actually, the result returned from the code you quoted would contain the expected result of those operations.
  • Since you didn't use inplace in the first operation, all the interim results should be GC'd anyway.
  • Actually, this could be a useful idiom:

obj.expensive_op(inplace=True).ViewOp1()
obj.ViewOp2()

which is not that bad, IMHO.

  • IPython's history behaviour is a tooling issue, and can probably be disabled when needed.
  • It seems to me that the reason you don't want to think about what you did there is mostly because it's clear to you that you might be misusing the API.

@michaelaye
Contributor

Seems to me that the reason you don't want to think about what you did there is mostly because it's clear to you that you might be misusing the API.

Sure. But I understood you to be suggesting that this kind of usage should be allowed? I'm sorry if I still misunderstand... Or are you actually saying that, even though it's powerful and harder to understand, it should still be allowed and possible?

@ghost

ghost commented Dec 3, 2012

I think:

  1. Powerful AND harder to understand is not a bad thing. I agree that I would not consider that
    common usage, just acceptable usage.
  2. Agreed that inplace=True with None as the return value makes the semantics clearer.
  3. I do not agree that 2) justifies giving up the convenience of fluent interfaces, which of course
    are useful when all your ops are inplace=True as well.
  4. Chaining inplace=True and regular operations is specialized enough that I suspect
    someone who would use it would not be confused by it, but would actually be a power-user
    maxing out.
  5. Unless the first op is inplace=True, you won't even clobber anything your code already
    has a reference to.
  6. I believe the issues raised are real concerns, but they can be solved by other means without
    giving up the fluent interface.

@michaelaye
Contributor

About GC:
If I were to do a _ = obj.op(inplace=True) just to avoid the printout messing up my notebook, the GC can't reclaim it. Is there an alternative to that? Ah, but that's not a new object anyway, so this is okay, right?

@ghost

ghost commented Dec 3, 2012

Another interesting case:

obj.clone().MuchMemoryOp(inplace=True).MMO2(inplace=True).MMO3(inplace=True)

That looks useful to me if I'd like to keep obj and save memory.
In fact, I wonder why I can't find the df.clone() method.
edit: there it is -- df.copy()

@michaelaye
Contributor

You could 'abuse' df.astype() to force getting a new copy.
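
A sketch of that trick (df.copy() is the straightforward route; the astype call below simply requests the existing dtypes back and relies on astype returning a new object):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
df2 = df.astype(df.dtypes.to_dict())   # same dtypes, but a fresh object comes back
assert df2 is not df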

@ghost

ghost commented Dec 3, 2012

jseabold suggested ending your lines with semicolons; I guess they took that from Mathematica.
That's not too obtrusive, I think.
It doesn't work with the qtconsole, though, which I usually use.

@ghost

ghost commented Dec 3, 2012

Regarding GC, Out is much more of a problem in that regard. I would have expected

get_ipython().history_length=0

to do the trick, but it doesn't.
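
As far as I know, the knob that actually matters here is the output cache rather than the input history; roughly:

# in a running IPython session, flush the Out cache:
%reset -f out

# or disable output caching entirely via ipython_config.py:
# c.InteractiveShell.cache_size = 0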

@michaelaye
Contributor

I tried to say that above, but my memory failed me: it is the Out cache that is the problem. Wes warns about that in his book as well. So, is that another argument for returning None? Because one would not want to give up the history just to avoid caching huge objects, right?

@ghost

ghost commented Dec 3, 2012

I think that's better addressed in IPython, though, and I find the input history far more useful for productivity than the output history, so maybe giving just that up is acceptable if you work with such huge objects frequently. Whether that's currently possible, I don't know.

wesm added a commit that referenced this issue Dec 8, 2012
@wesm
Member

wesm commented Dec 8, 2012

Alright, I made a game-time decision to do this -- I think it will make people more mindful about mutation.

@wesm wesm closed this as completed Dec 8, 2012
@jseabold
Contributor Author

FYI, @njsmith just posted this to the NumPy mailing list re: the fluent interface. Actively discouraged by Guido.

http://mail.python.org/pipermail/python-dev/2003-October/038855.html

@wesm
Member

wesm commented Jan 21, 2013

I agree it's poor form. I'm going to add a deprecation warning in 0.10.1

@ghost

ghost commented Jan 21, 2013

We're agreed on back-compat being crucial.

IMO, adding a warning that will trigger every time, even when the usage is fine, is annoying,
and it would have to stay in place at least until after 0.11.
I suggested a way to issue a warning only when there is actual misuse.

You opened an issue a while back concerning a warning given whenever you loaded
pandas in IPython (due to some mpl/pkg_resources deal). I sympathized.

@wesm
Member

wesm commented Jan 21, 2013

Only trigger at the first usage

@wesm
Member

wesm commented Jan 21, 2013

I can do the AST hack to make it less annoying

@jseabold
Contributor Author

@y-p Oh, OK, sure. That's up to y'all. Seems like overkill to me, but whatever. The last time we tried to creatively trigger deprecation warnings in statsmodels, it ended up being much more trouble than it was worth.

@ghost

ghost commented Jan 21, 2013

It's not terrible either way, but "Warning: that MIGHT have been a bug" seems clumsy...

@wesm
Member

wesm commented Jan 21, 2013

I added deprecation warnings throughout. Users will forgive us: better than breaking their code. We'll remove them in a few months, in 0.11.
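
(The warnings are the standard library kind, so they are roughly of this form -- a paraphrase, not the actual pandas code or message:)

import warnings

def fillna(self, value, inplace=False):
    # paraphrased pattern: warn whenever inplace=True is used, since the
    # return value is slated to become None in a future release
    if inplace:
        warnings.warn("fillna with inplace=True will return None "
                      "in a future version of pandas", FutureWarning)
    ...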

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jan 23, 2013
Version 0.10.1

* tag 'v0.10.1': (195 commits)
  RLS: set released to true
  RLS: Version 0.10.1
  TST: skip problematic xlrd test
  Merging in MySQL support pandas-dev#2482
  Revert "Merging in MySQL support pandas-dev#2482"
  BUG: don't let np.prod overflow int64
  RLS: note changed return type in DatetimeIndex.unique
  RLS: more what's new for 0.10.1
  RLS: some what's new for 0.10.1
  API: restore inplace=TRue returns self, add FutureWarnings. re pandas-dev#1893
  Merging in MySQL support pandas-dev#2482
  BUG: fix python 3 dtype issue
  DOC: fix what's new 0.10 doc bug re pandas-dev#2651
  BUG: fix C parser thread safety. verify gil release close pandas-dev#2608
  BUG: usecols bug with implicit first index column. close pandas-dev#2654
  BUG: plotting bug when base is nonzero pandas-dev#2571
  BUG: period resampling bug when all values fall into a single bin. close pandas-dev#2070
  BUG: fix memory error in sortlevel when many multiindex levels. close pandas-dev#2684
  STY: CRLF
  BUG: perf_HEAD reports wrong vbench name when an exception is raised
  ...
@darindillon

Guess I'm late to this thread, but having the warning appear ALL THE TIME (even though I already fixed the code, so it's not relevant) is causing false-positive errors with the infrastructure that monitors our jobs and emails us when errors/warnings are thrown: #2841. After reading this thread, I understand why you did what you did, but can we at least get some kind of global option to disable the warning?

@ghost

ghost commented Feb 11, 2013

The Python warnings module lets you do that out of the box.
http://docs.python.org/2/library/warnings.html#warnings.filterwarnings
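
For example, module-wide (this also hides any other FutureWarning raised from inside pandas):

import warnings

warnings.filterwarnings('ignore', category=FutureWarning, module='pandas')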

@darindillon

Yes, but that turns off ALL FutureWarnings, even those from other libraries, which are relevant to us. (Or, alternatively, you can wrap it around individual statements, but then you have to repeat that everywhere you call .set_index.) We don't want to disable all warnings, only the ones that are not relevant to us.
I was thinking of something like:

pandas.warnings.disable("1893") #where 1893 is the number of this particular pandas warning

@ghost

ghost commented Feb 11, 2013

Read the docs. You can filter by module and line number.

@darindillon

Well, aren't there a bunch of different places where you can use inplace=True (set_index, drop_duplicates, fillna, etc.)? Each of those is a different line of code, so wouldn't we have to repeat the filter a dozen times, once for each of those lines, even though they're really all conceptually the same warning?

@ghost

ghost commented Feb 11, 2013

No, you wouldn't; please see the link.

@darindillon

I did read the docs. How would you recommend doing the following? What am I missing?

import warnings
# (df is an existing DataFrame)

warnings.filterwarnings('ignore', module='pandas', lineno=2762) #filter out set_index warning
df.set_index('col', inplace=True) #good, no warning
df.sort('someCol', inplace=True) #oops, now we get a warning on line 3192
warnings.filterwarnings('ignore', module='pandas', lineno=3192) #filter out sort warning
df['someCol'].fillna(0, inplace=True) #oops, now a warning on line 2479
warnings.filterwarnings('ignore', module='pandas', lineno=2479) #filter out fillna warning
df.drop_duplicates(inplace=True) #oops, now a warning on line 3034
#etc., etc., for every function that takes inplace=True

I see no way to filter those warnings except one by one. Do you? (Yes, you could filter out all FutureWarnings from all of pandas, but then that hides other unrelated warnings that we do need to fix. We want to filter out only the inplace=True warnings.)

@ghost

ghost commented Feb 11, 2013

If module-wide suppression is not an option, I have no better way without touching the code.
Assuming you can touch it, it might be quickest to just revert the single commit that did this.

@njsmith

njsmith commented Feb 11, 2013

The usual solution is to specify a regex that matches the pandas warning message text. (I assume this is pretty constant even across different triggering locations.)
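
Something along these lines (the exact wording of the 0.10.1 message is assumed here to mention "inplace"):

import warnings

warnings.filterwarnings('ignore', message='.*inplace.*', category=FutureWarning)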

@darindillon

Specifying a regex on the error message is fragile too -- it'll break if the next release changes the wording of the message. The other Python convention, filtering on line number, is fragile too, because the next release will add/remove code and change the line numbers.

I'm just saying: why not give pandas a more robust way of disabling warnings, maybe something like the C++ #pragma warning directive (http://msdn.microsoft.com/en-us/library/2c8f766e%28v=vs.80%29.aspx)? Something like:

pandas.warnings.disable("InPlaceReturnsObjectIsBeingDeprecated")
pandas.warnings.disable("SomeOtherSpecificWarning")

Or, alternatively, instead of a name, maybe that takes a number specifying the warning. That gives maximum flexibility, robustness, and safety when upgrading to the next version.

I've split that off to a different issue (#2842) so we can end the debate on this thread and resume it there.

@njsmith

njsmith commented Feb 11, 2013

In practice, the chance of the next release changing the wording of the message is nil (partly because it would break such scripts, also because this warning will be removed entirely within a release or two). I see your point and perhaps it's worth filing a second issue over, but for now you should probably grit your teeth and filter on the message :-)

@ghost

ghost commented Feb 22, 2013

@wesm, now that 0.11 is coming, can b904029 be reverted? It might cut down the incoming complaints.

@wesm
Member

wesm commented Mar 28, 2013

Yeah. There is a big test case where we can change all of the checks to assert that the return value is None, and then go through and fix things.
