[DataFrame] Implementing API correct groupby with aggregation methods #1914

Merged
merged 38 commits into from
Apr 22, 2018
Commits
117aa1e
Implementing groupby object
devin-petersohn Apr 7, 2018
3ac3405
Making sum work
devin-petersohn Apr 7, 2018
84de470
Making lazy and adding axis=1 support
devin-petersohn Apr 8, 2018
24ec3c3
Starting on agg
devin-petersohn Apr 9, 2018
23cdb06
Fixing errors to report same as Pandas
devin-petersohn Apr 9, 2018
8f4b585
Minor changes
devin-petersohn Apr 10, 2018
3cffd84
Adding aggregate and apply for single strings
devin-petersohn Apr 10, 2018
00b743c
Checkpointing progress
devin-petersohn Apr 10, 2018
1e5bd79
Start toward implementing callables
devin-petersohn Apr 10, 2018
148ef09
Working on agg
devin-petersohn Apr 12, 2018
1999868
Groupby + agg functional for string or callable
devin-petersohn Apr 12, 2018
b6872cb
Begin implementation of groupby methods
kunalgosar Apr 10, 2018
8fd1f56
Implement remaining groupby methods
kunalgosar Apr 13, 2018
8f88ce0
Moving to being more lazy
devin-petersohn Apr 13, 2018
d975118
updating groupby
devin-petersohn Apr 14, 2018
b90548b
Updating remote
devin-petersohn Apr 14, 2018
078911a
Update groupby
devin-petersohn Apr 14, 2018
3fa9bd3
Fixing some performance issues
devin-petersohn Apr 14, 2018
759eeec
Adding list support for reduction tasks
devin-petersohn Apr 15, 2018
e67230f
Removing print
devin-petersohn Apr 15, 2018
4fd1f14
Working on performance debug
devin-petersohn Apr 16, 2018
2e0b367
Working on tuning
devin-petersohn Apr 16, 2018
8871f37
Making lists of functions work for agg and apply
devin-petersohn Apr 16, 2018
0c0ea4f
Cleaning up
devin-petersohn Apr 16, 2018
295f453
Improving serialization of agg
devin-petersohn Apr 16, 2018
3d163a2
implement transform
kunalgosar Apr 16, 2018
796939a
resolve merge artifacts
kunalgosar Apr 16, 2018
c6ba8c8
groupby transform works now
kunalgosar Apr 16, 2018
e592957
temp implementation of __array__
kunalgosar Apr 16, 2018
cd5a346
some error handling and kwargs cleanup
kunalgosar Apr 16, 2018
e0d0e9f
add a todo
kunalgosar Apr 16, 2018
a2d8b32
Updating groupby for utility.
devin-petersohn Apr 17, 2018
534fb94
Cleanup code
devin-petersohn Apr 17, 2018
0ff8a80
Fix lint
devin-petersohn Apr 17, 2018
b12a339
Fixing tests
devin-petersohn Apr 18, 2018
01b29d0
Fix lint
devin-petersohn Apr 18, 2018
02d7116
Fix Python 2 syntax issue.
robertnishihara Apr 20, 2018
d341cb2
Addressing comments
devin-petersohn Apr 20, 2018
Updating remote
devin-petersohn committed Apr 17, 2018
commit b90548bbc12e6ab1868835902721abb73c0ae902
8 changes: 7 additions & 1 deletion python/ray/dataframe/groupby.py
@@ -21,6 +21,9 @@ def __init__(self, df, by, axis, level, as_index, sort, group_keys,
         self._index = df.index
         self._axis = axis
 
+        self._row_metadata = df._row_metadata
+        self._col_metadata = df._col_metadata
+
         if axis == 0:
             partitions = df._col_partitions
             index_grouped = pd.Series(self._index).groupby(by=by, sort=sort)
@@ -50,7 +53,10 @@ def _iter(self):
             return ((self._keys_and_values[i][0],
                      DataFrame(col_partitions=part,
                                columns=self._columns,
-                               index=self._keys_and_values[i][1].index))
+                               index=self._keys_and_values[i][1].index,
+                               row_metadata=self._row_metadata.loc[
+                                   self._keys_and_values[i][1].index],
+                               col_metadata=self._col_metadata))
                     for i, part in enumerate(self._grouped_partitions))
         else:
             return ((self._keys_and_values[i][0],
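The change above passes each group a slice of the parent's row metadata, selected with `.loc` on that group's index. A minimal sketch of the same idea in plain pandas follows. It does not use the Ray DataFrame internals: the metadata table, its column names, and the helper `split_row_metadata` are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np
import pandas as pd


def split_row_metadata(row_metadata, by, sort=True):
    """Return {group key: slice of row_metadata for that group's rows}.

    Hypothetical helper: row_metadata is any frame indexed by row label,
    and `by` is an array of group keys, one per row.
    """
    labels = row_metadata.index
    # Group a Series keyed by the row labels, so each group's .index holds
    # the labels of the rows that belong to that group.
    index_grouped = pd.Series(labels, index=labels).groupby(by=by, sort=sort)
    keys_and_values = list(index_grouped)
    # Slice the metadata once per group with label-based .loc lookup.
    return {key: row_metadata.loc[values.index]
            for key, values in keys_and_values}


# Assumed layout: one metadata record per row label.
row_metadata = pd.DataFrame(
    {"partition": [0, 0, 1, 1], "index_within_partition": [0, 1, 0, 1]},
    index=["a", "b", "c", "d"],
)
by = np.array(["x", "y", "x", "y"])

group_metadata = split_row_metadata(row_metadata, by)
print(group_metadata["x"])
```

Each group's frame can then be built with only the metadata rows it needs, while the column metadata is shared unchanged across groups when grouping along rows.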