-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataFrame] Implements mode, to_datetime, and get_dummies #1956
Conversation
Test PASSed. |
Test PASSed. |
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments, overall looks good.
python/ray/dataframe/dataframe.py
Outdated
def mode_helper(df): | ||
if numeric_only: | ||
df = df.select_dtypes(exclude='object') | ||
mode_df = df.mode(axis=axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add numeric_only
to remove the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved.
python/ray/dataframe/datetime.py
Outdated
infer_datetime_format=infer_datetime_format, | ||
origin=origin) | ||
if errors == 'raise': | ||
pandas.to_datetime(pandas.DataFrame(columns=arg.columns)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the error checking, do you need all of the other parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved.
@@ -0,0 +1,106 @@ | |||
import ray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved.
Test PASSed. |
python/ray/dataframe/dataframe.py
Outdated
df = df.reindex(pd.RangeIndex(max_len), axis=axis) | ||
return df | ||
|
||
if axis == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably reuse the code across both if cases, since they're largely the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved.
python/ray/dataframe/dataframe.py
Outdated
mode_df = df.mode(axis=axis) | ||
return mode_df, mode_df.shape[axis] | ||
|
||
def reindex_helper(df, *lengths): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this as it's too similar to utils._reindex_helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved.
Thanks @Veryku @devin-petersohn for reviewing! I addressed the comments above. |
Test PASSed. |
python/ray/dataframe/reshape.py
Outdated
DataFrame or one-hot encoded data. | ||
""" | ||
if not isinstance(data, DataFrame): | ||
pandas.get_dummies(data, prefix=prefix, prefix_sep=prefix_sep, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a return statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh good catch, thanks!
python/ray/dataframe/reshape.py
Outdated
with_dummies = data.drop(columns_to_encode, axis=1)._col_partitions | ||
dropped_columns = data.columns.drop(columns_to_encode) | ||
|
||
def remote_func(df, to_drop, prefix, prefix_sep): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is pretty undescriptive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Jenkins, retest this please. |
Test FAILed. |
Jenkins, retest this please. |
Test PASSed. |
Merged, thanks @kunalgosar |
* magic-methods: fmt Fix IndentationError Write magic methods for SampleBatch/PartialRollout Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) add pthread linking (ray-project#1986) [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984) [DataFrame] Fix blocking issue on _IndexMetadata passing (ray-project#1965)
* master: (21 commits) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) add pthread linking (ray-project#1986) [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984) ...
* master: (25 commits) [DataFrame] Add direct pandas imports for MVP (ray-project#1960) Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) ...
Import
unique
andvalue_counts
directly from pandas.This PR implements
mode
,to_datetime
, andget_dummies
and includes a fix for__getitem__
.