Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite select! to improve performance #1985

Merged
merged 6 commits into from
Oct 15, 2019
Merged

Rewrite select! to improve performance #1985

merged 6 commits into from
Oct 15, 2019

Conversation

Ellipse0934
Copy link
Contributor

As discussed in Issue #1861 . This will significantly improve performance for the select! function(over 1000x for 100,000 columns).
Passes the tests locally and I couldn't find any bugs upon testing.

Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Now when I look at it it can be even made shorter by using copy! and using the pattern discussed by @nalimilan in #1974.
So it would look like:

    copy!(_columns(df), _columns(df)[inds])
    x = index(df)
    copy!(_names(x), _names(df)[inds])
    for (i, n) in enumerate(x.names)
        x.lookup[n] = i
    end

@Ellipse0934 Ellipse0934 changed the title Rewrite select! to improve performance Rewrite select! to improve performance[WIP] Oct 13, 2019
@Ellipse0934
Copy link
Contributor Author

Need to import Future.copy! for backwards compatibility with ver 1.0. Will do it tomorrow.

@bkamins
Copy link
Member

bkamins commented Oct 13, 2019

Yes - exactly like in #1974 (so whichever gets merged first the other will need to rebase later).

@Ellipse0934
Copy link
Contributor Author

Ellipse0934 commented Oct 13, 2019

Can you tell me why something like VERSION < v"1.1" && import Future.copy! is a bad idea ?
Future won't be unnecessarily imported. Behavior should be the same. Let me test it out now.
EDIT: Seems like the tests have passed.

Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK.
@ararslan - just please confirm if this conditional import is OK

@bkamins
Copy link
Member

bkamins commented Oct 14, 2019

Ah - and I think we need to add Future to [deps] in Project.toml like in #1974 (again @ararslan - can you please have look and confirm?).

src/DataFrames.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Show resolved Hide resolved
@Ellipse0934
Copy link
Contributor Author

I think we are ready to merge.
Ping: @bkamins

@Ellipse0934 Ellipse0934 changed the title Rewrite select! to improve performance[WIP] Rewrite select! to improve performance Oct 15, 2019
@bkamins
Copy link
Member

bkamins commented Oct 15, 2019

I am OK to merge this. Let us wait for @nalimilan or @ararslan to confirm so that we have a second pair of eyes :), as this is a fundamental function in the DataFrames.jl ecosystem 😄.

@bkamins bkamins merged commit 5ee7ff7 into JuliaData:master Oct 15, 2019
@bkamins
Copy link
Member

bkamins commented Oct 15, 2019

@Ellipse0934 and @nalimilan - thank you for contributing to this. The change is really of order of magnitude improvement as @Ellipse0934 noted!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants