-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix depwarns of getindex #1585
Fix depwarns of getindex #1585
Conversation
@nalimilan Before I move forward I would like to discuss one design issue. In the transition period I need to write specialized code for |
I'd rather define duplicated methods in deprecated.jl, so that the main code remains clean and they are easy to remove. That should also ensure |
Good point - thanks. The question is if we want to keep it @nalimilan - do you have any opinion on this? |
Tricky question! If we make copies automatically then there will be no (simple) way of concatenating a OTOH I'm not sure it's a very common need, and it makes things a bit more complex (creating Overall I'd lean towards preserving views, but it's a tough call. |
I am slowly moving forward. I make
selection. The tricky part is if we want to have |
I'm not sure it really matters whether we make a copy or not in |
@test all(ismissing, df[2]) | ||
@test all(ismissing, df[3]) | ||
# TODO: enable those tests after getindex deprecation period | ||
# @test typeof(df[:, 1]) == Vector{Union{Int, Missing}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use @test_broken
for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not broken - the test passes, but prints warnings (massively) and I understood that you wanted to avoid printing unnecessary warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's OK if the tests print warnings, as long as they are correct in the long term. Better have good tests that are temporarily verbose than having to enable them manually later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I understand we go the following route:
- it is not a problem if tests print warnings;
- however, we want to avoid printing warnings by library functions in normal usage.
Yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
The amount of code that needs duplicating is terrifying. I wonder whether it wouldn't be simpler even for users to change the behavior without a deprecation. Indeed in most situations the deprecation will force people to write |
I propose that I finish this PR (I will remove WIP then). However, I would retain deprecation - even for a short period to simplify debugging (even something what Julia 0.7 and 1.0 had - actually it worked well from my experience). The big question is if we wait for |
@nalimilan This should be good for a reivew now. |
data = [df[name] for df in dfs] | ||
# TODO: replace with commented out code after getindex deprecation | ||
# data = [df[name] for df in dfs] | ||
data = [df isa DataFrame ? df[name] : df[:, name] for df in dfs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to always use view
here (even if that's temporary)? Same for completecases
and maybe other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do this, but this would be breaking.
That is: currently df[name]
when df
is a SubDataFrame
produces a copy and so I used df[:, name]
to still make a copy temporarily (it will be a view after getindex
deprecation).
Do you think making a view
now would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But here it's an implementation detail which shouldn't have any user-visible consequences, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - right. I mixed it up with hcat!
(this PR is really long). I will push a PR changing copy to view
in places where it is not user visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed a PR implementing what you have requested. Actually it is a good way to test if something will not break in the future. The only limitation we have is with vcat
with which you have started (I have noted it in the code).
OK, thanks, let's do this then, but remove the deprecations more quickly than we usually do. |
@nalimilan This should be cleaned up now for the next round of checking. The only method that is left problematic is |
# the code below assumes that only DataFrame and SubDataFrame | ||
# are subtypes of AbstractDataFrame | ||
# it should be removed ASAP after deprecation | ||
data = [df isa DataFrame ? df[name] : view(parent(df)[name], rows(df)) for df in dfs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't matter much since it's temporary, but better call view
only for SubDataFrame
, so that the fallback works for any type. Then you don't need the scary comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I cannot do this here because SubDataFrame
is not defined at this stage of DataFrames.jl loading. And I would have to move the whole function much later in the loading process to make it work, which I think should not be done since it is only a temporary change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh. Well, let's say we will remove this soon anyway...
Good to merge (I do not want to do it myself, as it is too delicate 😄)? |
So scary... |
I am in the process of cleaning up after
getindex
revision. I will continue as this is not finished (but it givesthe idea of changes).
A thing to discuss from what I have already changed is for
hcat!
- if we want a target functionality to putviews
or copies if RHS ofhcat!
is aSubDataFrame
.