Return type of broadcast on GroupedDataFrame #1680

Open
nalimilan opened this issue Jan 16, 2019 · 4 comments
Labels
grouping, non-breaking (The proposed change is not breaking)

@nalimilan
Member

Currently broadcasting on a GroupedDataFrame returns a Vector. This is inconsistent with map, which returns a GroupedDataFrame. Should we change this? I'd say yes:

  • a Vector doesn't carry any information about the groups, making the result almost useless
  • one can always use a comprehension to get a vector

If we agree to change this, we need to decide what to return exactly:

  • a GroupedDataFrame: consistent with map, which makes sense since broadcast and map return the same kind of objects in general in Base
  • a DataFrame: like combine, which may be more convenient since most operations are not supported on GroupedDataFrame

Maybe the solution is to return a GroupedDataFrame, but make that type behave more like a DataFrame (#1256). One issue is that a GroupedDataFrame doesn't make a lot of sense when each group contains a single row; so it depends on whether the most common use case for broadcast is to apply a function which returns multiple rows (like describe at #1539), or a single row.
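
As a rough illustration of the trade-off above, here is a minimal sketch, assuming a recent DataFrames.jl release (the API at the time of this issue differed in places). The toy table and the names `g`, `x`, `sizes`, and `totals` are made up for the example.

```julia
using DataFrames

# Hypothetical toy data: two groups keyed by column g
df = DataFrame(g = [1, 1, 2], x = [10, 20, 30])
gd = groupby(df, :g)

# A comprehension always yields a plain Vector; the group keys are not attached
sizes = [nrow(sdf) for sdf in gd]

# combine returns a DataFrame that keeps the grouping column next to the result
totals = combine(gd, :x => sum => :x_sum)
```

The vector `sizes` has one entry per group but no record of which group it came from, whereas `totals` carries the `g` column alongside `x_sum`.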

@bkamins
Member

bkamins commented Jan 17, 2019

👍 for making broadcast and map consistent.

As for what to do with GroupedDataFrame, I am OK with whatever you propose that is consistent and efficient, as I guess you understand the whole split-apply-combine infrastructure best (in the worst case the user can use combine after broadcast/map).

@pdeffebach
Contributor

I think a grouped data frame is nice. The big drawback is DataFrame to scalar or vector operations. Thankfully DataFrames can hold whatever kind of stuff they want.

@bkamins
Member

bkamins commented Jan 22, 2019

@pdeffebach

"The big drawback is DataFrame to scalar or vector operations."

Can you please expand on your thought so that I am sure what exactly you mean? Thank you.

@nalimilan nalimilan added this to the 1.0 milestone Jun 9, 2019
@bkamins
Member

bkamins commented Sep 3, 2019

We now disallow broadcasting, and I would keep it that way. We can keep this issue open, but I would remove the 1.0 milestone from it.

@nalimilan nalimilan removed this from the 1.0 milestone Sep 3, 2019
@bkamins bkamins added the non-breaking label (The proposed change is not breaking) Feb 12, 2020
@bkamins bkamins added this to the 2.0 milestone Feb 12, 2020