-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataFrame] Implement rank #1991
Conversation
Test FAILed. |
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. A few comments around naming, but otherwise looks good to go.
Rebase this on current master before it can be merged.
python/ray/dataframe/dataframe.py
Outdated
na_option=na_option, | ||
ascending=ascending, pct=pct) | ||
|
||
index = self._row_metadata.index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.index
?
python/ray/dataframe/dataframe.py
Outdated
A new DataFrame | ||
""" | ||
|
||
def remote_func(df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/remote_func/rank_helper/g
ray_df.rank() | ||
@pytest.fixture | ||
def test_rank(ray_df, pandas_df): | ||
assert(ray_df_equals_pandas(ray_df, pandas_df)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't call rank
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed... lol
@@ -43,21 +43,21 @@ def logp(self, x): | |||
|
|||
def entropy(self): | |||
a0 = self.inputs - tf.reduce_max(self.inputs, reduction_indices=[1], | |||
keep_dims=True) | |||
keepdims=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert changes to this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should rebase your changes off of master since it's hard to tell which part is for this PR specfically.
python/ray/dataframe/dataframe.py
Outdated
A new DataFrame | ||
""" | ||
|
||
def remote_func(df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should put a more descriptive name since a lot of things are "remote functions"
python/ray/dataframe/dataframe.py
Outdated
na_option=na_option, | ||
ascending=ascending, pct=pct) | ||
|
||
index = self._row_metadata.index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.index
is more stylistically similar to the rest of the code, it should just be an alias for what you already have here
python/ray/dataframe/dataframe.py
Outdated
|
||
index = self._row_metadata.index | ||
|
||
if (axis == 1 or axis == 'columns'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use _get_axis_number
to generalize the axis names. Refer here for details.
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few last comments.
python/ray/dataframe/dataframe.py
Outdated
na_option=na_option, | ||
ascending=ascending, pct=pct) | ||
|
||
index = self.index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: this is not needed.
python/ray/dataframe/dataframe.py
Outdated
|
||
index = self.index | ||
|
||
axis = pd.DataFrame()._get_axis_number(axis) if axis is not None \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis=None
throws errors. Should not handle this case here.
In [3]: df.rank(axis=None)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-7bd95fb0ea7b> in <module>()
----> 1 df.rank(axis=None)
~/dev/anaconda3/envs/ray-dev/lib/python3.5/site-packages/pandas/core/generic.py in rank(self, axis, method, numeric_only, na_option, ascending, pct)
5625 ranks : same type as caller
5626 """
-> 5627 axis = self._get_axis_number(axis)
5628
5629 if self.ndim > 2:
~/dev/anaconda3/envs/ray-dev/lib/python3.5/site-packages/pandas/core/generic.py in _get_axis_number(self, axis)
355 pass
356 raise ValueError('No axis named {0} for object type {1}'
--> 357 .format(axis, type(self)))
358
359 def _get_axis_name(self, axis):
ValueError: No axis named None for object type <class 'pandas.core.frame.DataFrame'>```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ray_df.rank() | ||
@pytest.fixture | ||
def test_rank(ray_df, pandas_df): | ||
assert(ray_df_equals_pandas(ray_df.rank(), pandas_df.rank())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add tests for axis=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Test PASSed. |
This is currently failing tests on python 2 due to pandas-dev/pandas#20962 |
Test PASSed. |
Test PASSed. |
If intending to merge, perhaps warn or error on python2 as behavior may not match expectations. |
@kunalgosar Throwing a warning every time a user calls |
Merged, thanks @11rohans! |
* master: (21 commits) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) add pthread linking (ray-project#1986) [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984) ...
* master: (25 commits) [DataFrame] Add direct pandas imports for MVP (ray-project#1960) Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) ...
* master: [DataFrame] Add direct pandas imports for MVP (ray-project#1960) Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
Implements
DataFrame.rank