-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transformation and renaming to select and select! #2080
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this! I've made a few high-level comments.
If you have the time, it would make sense IMO to also implement support for several input columns in this PR. The structure of the code will probably be affected so better get it right from the start.
I can add it (I have tried to design the code to make such additions easy - Let us just agree on one thing. In the case of multiple columns passed do we want to pass
(this should be efficient) or we want to pass
(this avoids recompilation each time we make a selection) The API except for the type is essentially the same with the difference that we pass a different typee and the latter potentially allows the function that takes What do you think? |
Thank you for the comments. For now we have the following key open issues:
I leave (also probably you have an opinion given the design in |
I think we should pass a Regarding the generation of column names, I guess it could make sense to list the names of all input variables up to a threshold (e.g. 3)? One tricky case is when the input columns are selected via I don't have a strong opinion about |
OK - I have made all the changes:
|
@nalimilan - I think I have addressed all the design issues in this PR. The only question is if you are OK with multiple-column automatic naming scheme I have proposed in https://github.com/JuliaData/DataFrames.jl/pull/2080/files#diff-ac2eb247bb3d79f652033279061a1ceaR40. When we confirm we are OK with the design I will add tests of the new functionality (I keep things pending in TODO at the top of the file). Essentially for the two other PRs the things that will be left to do are |
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
Yeah - this PR is not easy (and this is what I have expected). Please let me know when you will be clear with the design decisions I proposed and I will then start writing tests. |
Just to sum up the current pending design decisions:
When we settle this I will update the docs and write tests for the whole PR. |
Change agreed with @piever is pushed now. @nalimilan - so we are left with confirming that you agree to the automatic column naming rules I proposed in this PR (they then should be ported to |
Likewise, I'd throw an error for now. :-) |
OK - so I will restrict the list of what is allowed to be returned to match |
…rix as return values of fun.
I have finished working on the implementation, documentation and tests of the new functionality. So comments would be appreciated. Apart from implementing the core functionality I have caught some small things that needed to be polished in other sections of the source codes (chiefly indexing of |
One last (hopefully) design decision given the consistency with If |
For now I think it's OK to throw an error, the internal design can be changed later as long as it doesn't affect user-visible behavior. |
OK - so in this case this PR is ready for a review 😄. |
TODO: make |
I have changed to "optimistic" mode in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. Looks quite good, just a few more details.
I just checked this out and played around. Doing
throws
Would there be a way to check if a function is a unary operator and, if so, wrap it in parentheses? Or would that require metaprogramming. |
We cannot do anything about it, even |
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
Thank you for the review (this seems to be the longest PR in terms of comments and commits we have had recently). I tried to incorporate everything from the review. In particular:
|
Thank you for working on it. It was a long discussion. Let us hope people will find the new functionality useful! |
This is a first step in implementation of the discussion in #1975.
The functionality is limited, but I first wanted to make sure that on this level we are in agreement. If this is so I will add tests to this PR. After this is merged we can talk about adding more functionalities in separate PRs (even this new functionality is complex enough that writing proper tests for it will be a challenge, so I want to move forward slowly).
Also even with what I propose here a lot can be achieved, actually what is missing is only:
Col
wrapper for whole-column functions (we did not have an agreement about the name)automatic generation of column names if they are missing (here we first have to set a common rules for all functions in DataFrames.jl)(actually I have done it for the case of one column, but just forgotten that it is there 😄, we still have to decide on column name generation for multiple columns)Not
)