-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inplace returns self? #1893
Comments
Good idea actually. Useful to make the user more explicit about invoking side-effects. |
I go back and forth on it. At some point I'd decided it was better to have a consistent API w.r.t. return values (vs. |
Sure, but this is different than the in-place operations in python and in numpy. When I was first using pandas, I was really thrown by the default, make copies everywhere and return an object, especially when python and numpy lead me to expect in-place operation (e.g., sort, assigment to a view). I was bit by not catching this often (kind of like mpl returns things that you don't really need). Then I discovered the inplace keyword. Great, because I almost always want to do things in-place and avoid all the extra typing of assignment, though I have to use the keyword now everywhere. Just seems unnecessary to return the object since I explicitly asked for inplace and I already have the reference to it. Just noise at the interpreter when working interactively. |
Just as a follow-up, when writing notebooks I have to put semi-colons after every line where I do in-place operations so it doesn't barf the returned self. |
Wes, your argument confuses me. Do you consider the |
I think the proposal on the table is to always return |
Note that not returning |
Well, inplace is optional, so the easy solution is don't use inplace if you want to in turn do an operation on the returned object. |
Another place where Python's eager evaluation can be a weakness |
those are two orthogonal considerations. one (arguably) should not force the other. |
Fair enough, but pandas is still an outlier in this respect compared to many of the methods in numpy and python itself. Admittedly, my argument for a (default) inplace not returning self is because I want to save myself typing and improve readability of output at the interpreter for teaching, presenting, or demonstrating. I don't often have serious concerns about memory use and readability of scripts. What's the alternative? Having options where inplace can be 'true', 'false', or 'return'? |
Or you could provide some kind of chainable, inplace interface. I'm thinking a lot lately about building a DSL layer around pandas so you could do things like:
and have that be as fast and memory-efficient as possible. And then you could easily chain "in place operations" and get what you expect |
I fail to see the 'orthogonality' (maybe because 'orthogonal' is linguistically overrated, IMHO). Your claim that these design questions would be independent (i.e. 'orthogonal'), supports the use of something like |
obj.expensive_op(inplace=true).ViewOp1()
obj.ViewOp2() is not that bad IMHO.
|
Sure. But I understood you in the way that this kind of usage is what you suggest should be allowed? I'm sorry if I still misunderstand... Or maybe you are actually saying, even so it's powerful and harder to understand, it still should be allowed and possible? |
I think:
|
About GC: |
Another interesting case: obj.clone().MuchMemoryOp(inplace=True).MMO2(inplace=True).MMO3(inplace=True) that looks useful to me if I'd like to keep obj and save memory. |
you could 'abuse' |
jseabold suggested ending your lines with semicolons, I guess they took that from Mathematica. |
regarding GC,
to do the trick, but it doesn't. |
I tried to say that above, but then my memory failed me that it is the |
I think that's better addresed in IPython, though, and I find the input history far more useful |
Alright I made a game-time decision to do this-- I think it will make people more mindful about mutation. |
FYI @njsmith just posted this to numpy ML re: fluent interface. Actively discouraged by Guido. http://mail.python.org/pipermail/python-dev/2003-October/038855.html |
I agree it's poor form. I'm going to add a deprecation warning in 0.10.1 |
We're agreed on back-compat being crucial. IMO adding a warning that will trigger every time, even when the usage if fine is annyoing. You opened an issue a while back concerning a warning given whenever you loaded |
Only trigger at the first usage |
I can do the AST hack to make it less annoying |
@y-p oh ok sure. That's up to y'all. Seems like overkill to me, but whatever. The last time we tried to creatively trigger deprecation warnings in statsmodels it ended up being much more trouble than it was worth. |
it's not terrible either way, but "Warning: that MIGHT have been a bug" seems clumsy... |
I added deprecation warnings throughout. Users will forgive us: better than code breaking. We'll remove it in a few months in 0.11 |
Version 0.10.1 * tag 'v0.10.1': (195 commits) RLS: set released to true RLS: Version 0.10.1 TST: skip problematic xlrd test Merging in MySQL support pandas-dev#2482 Revert "Merging in MySQL support pandas-dev#2482" BUG: don't let np.prod overflow int64 RLS: note changed return type in DatetimeIndex.unique RLS: more what's new for 0.10.1 RLS: some what's new for 0.10.1 API: restore inplace=TRue returns self, add FutureWarnings. re pandas-dev#1893 Merging in MySQL support pandas-dev#2482 BUG: fix python 3 dtype issue DOC: fix what's new 0.10 doc bug re pandas-dev#2651 BUG: fix C parser thread safety. verify gil release close pandas-dev#2608 BUG: usecols bug with implicit first index column. close pandas-dev#2654 BUG: plotting bug when base is nonzero pandas-dev#2571 BUG: period resampling bug when all values fall into a single bin. close pandas-dev#2070 BUG: fix memory error in sortlevel when many multiindex levels. close pandas-dev#2684 STY: CRLF BUG: perf_HEAD reports wrong vbench name when an exception is raised ...
Guess I'm late to this thread, but having the warning appearing ALL THE TIME (even though I already fixed it so it's not relevant) is causing false positive errors with the infrastructure that monitors our jobs and emails us when errors/warnings are thrown: #2841 . After reading this thread, I understand why you did what you did; but can we at least get some kind of global option to disable the warning? |
The python |
Yes, but that turns off ALL futureWarnings, even those from other libraries, which are relevant to us. (Or alternatively, you can wrap it around individual statements, but then you have to repeat that around everywhere you call the .set_index function). We don't want to disable all warnings, only the ones that are not relevant to us. pandas.warnings.disable("1893") #where 1893 is the number of this particular pandas warning |
read the docs. you can filter by module and line number. |
Well, aren't there a bunch of different places where you can use inplace=True? (set_index, drop_duplicates, fillna, etc, etc). Each of those is a different line of code, so wouldn't we have to repeat the filter a dozen times for each of those dozen lines of code, even though they're really all conceptually the same warning? |
No you wouldn't, please see the link. |
I did read the docs. How would you recommend doing the following? What am I missing? warnings.filterwarnings('ignore', module='pandas', lineno=2762) #filter out set_index warning I see no way to filter those errors except one by one. Do you? (Yes, you could filter out all FutureWarnings from all of pandas, but then that hides other unrelated warnings that we do need to fix. We want to filter out only the inplace=True warnings) |
If module-wide supression is not option, I have no better way without touching the code. |
The usual solution is to specify a regex that matches the pandas warning
|
Specifying a regex on the error message is fragile too -- it'll break if the next release changes the wording of the error message. The other python convention of filtering on line number is fragile too, cause the next release will add/remove code and change the line numbers. I'm just saying, why not give pandas a more robust way of disabling warnings, maybe something like the C++ #pragma warning macro ( http://msdn.microsoft.com/en-us/library/2c8f766e%28v=vs.80%29.aspx ) Something like: pandas.warnings.disable("InPlaceReturnsObjectIsBeingDeprecated") Or alternatively, instead of a name, maybe that takes a number specifying the warning. Gives maximum flexibility, robustness, and safety when upgrading to the next version. I've split that off to a differnet issue (#2842) so we can end debate on this thread and resume it there. |
In practice, the chance of the next release changing the wording of the
|
@wesm, now that 0.11 is coming, can b904029 be reverted? it might cut down |
Yeah. There is a big test case where we can change all of the checks to assert that the return value is None and then go through and fix things |
Is there any reason that whenever I'm doing inplace=True I always get self returned? Obviously not a huge deal, but this is kind of wart-y IMO. I wouldn't expect inplace to return anything.
The text was updated successfully, but these errors were encountered: