-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GroupKey
should be comparable between DataFrames
#2639
Comments
By the way, it might be worth doing some quick performance tests before seriously considering changing this. While I think the inability to compare |
Performance is not a problem. We can just change current:
to
the only thing is that it will be slow when mixing data frames (but will be as fast as currently for the same data frame). Then we would need to define @nalimilan - do you think we should consider these potential changes as breaking? |
As a general comment - it should be easy to change once we decide what we want. |
Turning the error into something that works is definitely not breaking. Changing |
So the logic is:
|
@nalimilan - we should then additionally decide how == and isequal should work when mixing GroupKey and other types. I think we should not compare as true to vectors or Tuple, but we can consider comparing as isequal (and consequently ==) to NamedTuple. If we decided to go this way then hash of GroupKey must match hash of NamedTuple (which will be a bit expensive, but acceptable). What do you think? |
Then also the question is if we should add |
Is there a way to get Namedtuple type of keys directly without the conversion of |
It is not clear what you mean. What would be the difference between |
Just a shortcut function. |
I think it is not needed often enough, and the alternative is simple enough, so adding it is not needed. @nalimilan - what do you think? |
I agree it doesn't seem worth it. We try to limit the API surface to keeps things manageable for users. |
The
GroupKey
grouped dataframe keys are distinct from aNamedTuple
or other "row-like" objects for the sake of efficiency, but currently they are not comparable across dataframes. This is not consistent with other "row-like" objects, all of which are comparable regardless of their parent.This forbids some reasonable patterns such as
As well as the direct creation of any other type of dictionary using
GroupKey
as a key.Now, conversion of the
GroupKey
is quite simple withNamedTuple
, but the question is whether this conversion should be left up to the user.Note that, if we did change the behavior, some sort of safe but efficient reference check would be need on
getindex(::GroupedDataFrame, ::GroupKey)
to ensure that a compatible key is being used and if not, to perform the proper conversion.The text was updated successfully, but these errors were encountered: