-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataFrame] Implementing API correct groupby with aggregation methods #1914
[DataFrame] Implementing API correct groupby with aggregation methods #1914
Conversation
Test FAILed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Their seem to be some test failures, e.g.,
|
python/ray/dataframe/groupby.py
Outdated
sort, | ||
group_keys, | ||
squeeze, | ||
*part), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a python 2 issue, you may need to do
args=(by,
axis,
level,
as_index,
sort,
group_keys,
squeeze) + part
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can give this a try.
Test FAILed. |
"To contribute to Pandas on Ray, please visit " | ||
"github.com/ray-project/ray.") | ||
elif is_list_like(arg): | ||
from .concat import concat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this import with the other imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cyclical import won't allow this.
python/ray/dataframe/dataframe.py
Outdated
else: | ||
kwargs['temp_index'] = self.index | ||
|
||
def remote_helper(df, arg, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More descriptive names would be helpful here for readability.
python/ray/dataframe/dataframe.py
Outdated
|
||
# This magic unzips the list comprehension returned from remote | ||
is_series, new_parts, index, columns = \ | ||
[list(t) for t in zip(*remote_result)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need each variable in list form? zip
should allow auto-unboxing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need most of them in list form. I'll go ahead and change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do actually need them all in lists because of the ray.get()
# DataFrame, and we have to determine which here. Shouldn't add | ||
# too much to latency in either case because the booleans can | ||
# be returned immediately | ||
is_series = ray.get(is_series) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't getting the booleans require a block on the rest of the parameters being calculated anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The (de)serialization should be faster, so it should be ready sooner (given a large enough Series).
python/ray/dataframe/dataframe.py
Outdated
# return DataFrames | ||
elif any(is_series): | ||
raise ValueError("no results.") | ||
elif axis == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's different between the last two cases in this if statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add better comments.
Test PASSed. |
* master: updates (ray-project#1958) Pin Cython in autoscaler development example. (ray-project#1951) Incorporate C++ Buffer management and Seal global threadpool fix from arrow (ray-project#1950) [XRay] Add consistency check for protocol between node_manager and local_scheduler_client (ray-project#1944) Remove smart_open install. (ray-project#1943) [DataFrame] Fully implement append, concat and join (ray-project#1932) [DataFrame] Fix for __getitem__ string indexing (ray-project#1939) [DataFrame] Implementing write methods (ray-project#1918) [rllib] arr[end] was excluded when end is not None (ray-project#1931) [DataFrame] Implementing API correct groupby with aggregation methods (ray-project#1914) Handle interrupts correctly for ASIO synchronous reads and writes. (ray-project#1929) [DataFrame] Adding read methods and tests (ray-project#1712) Allow task_table_update to fail when tasks are finished. (ray-project#1927) [rllib] Contribute DDPG to RLlib (ray-project#1877) [xray] Workers blocked in a `ray.get` release their resources (ray-project#1920) Raylet task dispatch and throttling worker startup (ray-project#1912) [DataFrame] Eval fix (ray-project#1903)
* 'master' of https://github.com/ray-project/ray: [rllib] Fix broken link in docs (ray-project#1967) [DataFrame] Sample implement (ray-project#1954) [DataFrame] Implement Inter-DataFrame operations (ray-project#1937) remove UniqueIDHasher (ray-project#1957) [rllib] Add DDPG documentation, rename DDPG2 <=> DDPG (ray-project#1946) updates (ray-project#1958) Pin Cython in autoscaler development example. (ray-project#1951) Incorporate C++ Buffer management and Seal global threadpool fix from arrow (ray-project#1950) [XRay] Add consistency check for protocol between node_manager and local_scheduler_client (ray-project#1944) Remove smart_open install. (ray-project#1943) [DataFrame] Fully implement append, concat and join (ray-project#1932) [DataFrame] Fix for __getitem__ string indexing (ray-project#1939) [DataFrame] Implementing write methods (ray-project#1918) [rllib] arr[end] was excluded when end is not None (ray-project#1931) [DataFrame] Implementing API correct groupby with aggregation methods (ray-project#1914)
What do these changes do?
Adds groupby and allows users to interact with the GroupBy object the same way they would in Pandas.
groupby
implementationDataFrameGroupBy
objectagg
/aggregate
/apply
for non-dictionary arguments