Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataFrame] Implement rank #1991

Merged
merged 8 commits into from
May 6, 2018
Merged

Conversation

11rohans
Copy link
Contributor

@11rohans 11rohans commented May 3, 2018

Implements DataFrame.rank

@11rohans 11rohans changed the title Ray rank [DataFrame] Implement rank May 3, 2018
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5157/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5162/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5177/
Test PASSed.

Copy link
Contributor

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. A few comments around naming, but otherwise looks good to go.

Rebase this on current master before it can be merged.

na_option=na_option,
ascending=ascending, pct=pct)

index = self._row_metadata.index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.index?

A new DataFrame
"""

def remote_func(df):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/remote_func/rank_helper/g

ray_df.rank()
@pytest.fixture
def test_rank(ray_df, pandas_df):
assert(ray_df_equals_pandas(ray_df, pandas_df))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't call rank

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed... lol

@@ -43,21 +43,21 @@ def logp(self, x):

def entropy(self):
a0 = self.inputs - tf.reduce_max(self.inputs, reduction_indices=[1],
keep_dims=True)
keepdims=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert changes to this file

Copy link
Contributor

@p-yang p-yang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should rebase your changes off of master since it's hard to tell which part is for this PR specfically.

A new DataFrame
"""

def remote_func(df):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should put a more descriptive name since a lot of things are "remote functions"

na_option=na_option,
ascending=ascending, pct=pct)

index = self._row_metadata.index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.index is more stylistically similar to the rest of the code, it should just be an alias for what you already have here


index = self._row_metadata.index

if (axis == 1 or axis == 'columns'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use _get_axis_number to generalize the axis names. Refer here for details.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5193/
Test FAILed.

Copy link
Contributor

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few last comments.

na_option=na_option,
ascending=ascending, pct=pct)

index = self.index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: this is not needed.


index = self.index

axis = pd.DataFrame()._get_axis_number(axis) if axis is not None \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axis=None throws errors. Should not handle this case here.


In [3]: df.rank(axis=None)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-7bd95fb0ea7b> in <module>()
----> 1 df.rank(axis=None)

~/dev/anaconda3/envs/ray-dev/lib/python3.5/site-packages/pandas/core/generic.py in rank(self, axis, method, numeric_only, na_option, ascending, pct)
   5625         ranks : same type as caller
   5626         """
-> 5627         axis = self._get_axis_number(axis)
   5628 
   5629         if self.ndim > 2:

~/dev/anaconda3/envs/ray-dev/lib/python3.5/site-packages/pandas/core/generic.py in _get_axis_number(self, axis)
    355                 pass
    356         raise ValueError('No axis named {0} for object type {1}'
--> 357                          .format(axis, type(self)))
    358 
    359     def _get_axis_name(self, axis):

ValueError: No axis named None for object type <class 'pandas.core.frame.DataFrame'>```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ray_df.rank()
@pytest.fixture
def test_rank(ray_df, pandas_df):
assert(ray_df_equals_pandas(ray_df.rank(), pandas_df.rank()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add tests for axis=1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5195/
Test PASSed.

Copy link
Contributor

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5225/
Test PASSed.

@kunalgosar
Copy link
Contributor

This is currently failing tests on python 2 due to pandas-dev/pandas#20962

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5234/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5236/
Test PASSed.

@kunalgosar
Copy link
Contributor

If intending to merge, perhaps warn or error on python2 as behavior may not match expectations.

@devin-petersohn
Copy link
Member

@kunalgosar Throwing a warning every time a user calls rank on Python2 will probably cause unnecessary concern. We do what Pandas documentation says it does, and since it is a bug in Pandas, we probably shouldn't worry too much about it. If people have a question they can open an issue and we'll just link to the Pandas issue.

@devin-petersohn devin-petersohn merged commit 9f28529 into ray-project:master May 6, 2018
@devin-petersohn
Copy link
Member

Merged, thanks @11rohans!

alok added a commit to alok/ray that referenced this pull request May 8, 2018
* master: (21 commits)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
  Clean up syntax for supported Python versions. (ray-project#1963)
  [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956)
  [DataFrame] Fix dtypes (ray-project#1930)
  keep_dims -> keepdims (ray-project#1980)
  add pthread linking (ray-project#1986)
  [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984)
  ...
alok added a commit to alok/ray that referenced this pull request May 9, 2018
* master: (25 commits)
  [DataFrame] Add direct pandas imports for MVP (ray-project#1960)
  Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
  Clean up syntax for supported Python versions. (ray-project#1963)
  [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956)
  [DataFrame] Fix dtypes (ray-project#1930)
  keep_dims -> keepdims (ray-project#1980)
  ...
alok added a commit to alok/ray that referenced this pull request May 9, 2018
* master:
  [DataFrame] Add direct pandas imports for MVP (ray-project#1960)
  Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
fishbone added a commit that referenced this pull request Nov 3, 2021
#19996)

This reverts commit f1eedb1.

## Why are these changes needed?
Self node should avoid reading any updates from gcs for node resource change since it'll maintain local view by itself.

## Related issue number
#19438
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants