Discussion regarding custom row-like outputs in combine and transform #2576
Comments
Okay I've been digging into this more. It's possible to implement the behavior I want exactly with Getting automatic "spreading" to work for this new type in I think that the only type signatures I ever expanded were those that also allowed The advantages are making it easier for user-defined types to work. I guess this proposal doesn't really work though since No clear solutions as of yet. |
I am marking it both "breaking" and "non-breaking" as I am not sure yet if what you propose is breaking or not 😄. What we currently have is that we have a special treatment of with
What would be exactly your proposal? To add "Tables.AbstractRow |
Indeed, I do not have a proposal as of yet. Maybe this is more of a gripe about how it's tougher than I thought to extend the interface and how I think the implementations between Consider a type,
The above works only because If I don't implement
That is, spreading is not automatic and it's not clear how Moving on to
When I define
It works, since there is a
same with
Finally, with a
With the
This is inconsistent with the output from the I guess the actionable item is to make the post-processing of the
That would break the second-to-last example above
|
Thank you for your comments. Here are my answers:
Yes - they are inconsistent unfortunately for legacy reasons.
It cannot - as it is not a table. It is a row of a table. You would have to write e.g.:
Yes, because it makes
Yes
Yes, because this is a different API (unfortunately). But I have just double checked and it is documented like this.
We will not do it because it would be massively breaking. E.g.:
and the legacy behavior (the last one) follows the second one not the first one as you propose. As I have said - this has been in DataFrames.jl for many years and there is a risk that legacy code relies on this behavior and such changes are very tricky to find in large code bases.
Notion of
This could be added - it would be breaking, but only mildly, so probably it is acceptable to treat any
Changing this would be very breaking. |
Thanks for the detailed reply. It is unfortunate that we can't break that inconsistency in The other thing that bugs me is that the Can you confirm this is intended? And how you conceptualized this post-processing promotion when you wrote the code? I read through it but it wasn't obvious. Also note that
fails, neither does |
This is a good point - it is not easy to spread unless you convert to
This is the legacy of the "early days" of DataFrames.jl unfortunately.
could you please explain what you mean here? as I am not fully clear here.
Also here I am not fully clear what you are asking about. In general As I have said I think we can make |
That would be a natural approach, however something can follow the interface of
That's what I mean in this comment. Currently, the fact that So there is a tension between, if a type is not |
Then we would need Would adding such |
DataFrameRow <: Tables.AbstractRow and widening some type signatures combine and transform
It would have to be So maybe there should be a better way to navigate exactly how an object fits in the Tables.jl interface |
So let us keep this issue open for now, and could you open a discussion in Tables.jl to have @quinnj comment on how it would fit there? Thank you! |
@pdeffebach , back in your post here, could you expound on how exactly |
Thanks. Indeed I implemented all the methods. I even re-implemented the ones that are default by |
Ok cool; I don't really understand the nuances of the |
In short:
Note that we do not want all types to be treated this way, as most often when doing transformations the user wants to keep the result in a single column (so e.g. vector of structs is not expanded to multiple columns in transformations although it is Tables.jl table). We want types to explicitly opt-in to be treated as rows of a table and then get special treatment. |
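To illustrate the opt-in point above, here is a minimal sketch. It assumes a toy data frame with one column `a`, and uses NamedTuples as a stand-in for "a vector of structs"; neither comes from the thread. A vector of NamedTuples stays in one column unless the target is explicitly `AsTable`.

```julia
using DataFrames

# Toy data; the column name `a` is an assumption made for illustration.
df = DataFrame(a = [1, 2, 3])

# No AsTable target: the vector of NamedTuples is kept as ONE new column.
single = transform(df, :a => ByRow(x -> (b = x + 1, c = 2x)))

# Explicit AsTable target: the fields are spread into columns `b` and `c`.
spread = transform(df, :a => ByRow(x -> (b = x + 1, c = 2x)) => AsTable)
```

The same function gives two different shapes, and the caller chooses between them with the target rather than the return type deciding implicitly.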
On top of that, there are times when we are given an object and we need to decide if it is a "row" or it is a "table". So it would be nice to have a way of handling those distinctions via |
Ok, that makes more sense. Yeah, there aren't really great ways of defining |
I wrote a package that solves this problem with meta-programming. It's called AddToField.
|
Nice. Why do you want it separate from DataFramesMeta.jl? |
It's a general tool that has no external dependencies apart from MacroTools. It makes it really easy to create NamedTuples in general so making it in DataFramesMeta would be too restrictive. |
Does this discussion have anything to do with why the code written by @bkamins here: https://bkamins.github.io/julialang/2020/12/24/minilanguage.html doesn't work?
Edit: It seems that I needed to update my version of Julia, my apologies! |
|
Sorry to close and re-open. I was going to make a comment but then decided not to. Accidentally closed it after. However making |
The question is how pressing this is in practice. I agree it would improve consistency, but the problem is that |
Not pressing. Unless we see reports of users making hundreds or thousands of new columns from a |
I'm currently working on a small implementation of a Tables.AbstractRow. It's basically a wrapper around an OrderedDict which is a Tables.AbstractRow and defines all the convenience methods of a DataFrameRow, like indexing with Not.
The goal is to make it easier to work with complicated combine calls where you iteratively build up your output. This is actually quite difficult and I think there are a few inconsistencies in the API:
transform works fine with ByRow. ByRow returns a vector of Tables.AbstractRow objects, which is itself a Table.
With combine, using the src => fun => AsTable output, I can collapse to get single rows when I define Tables.columntable for my DataRow type. However this trick doesn't work when using an anonymous function. With the anonymous function, the output really has to be a matrix, named tuple, or data frame. This might be an inconsistency in the API.
Regardless, I will move forward with the development of my package and call NamedTuple at the very end of my combine call to get the desired behavior.