-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataFrame] Implement where #1989
Conversation
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A few comments.
python/ray/dataframe/utils.py
Outdated
@@ -108,10 +108,11 @@ def to_pandas(df): | |||
A new pandas DataFrame. | |||
""" | |||
if df._row_partitions is not None: | |||
pd_df = pd.concat(ray.get(df._row_partitions)) | |||
print("Yes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
python/ray/dataframe/dataframe.py
Outdated
other_zipped = (v for k, v in self._copartition(other, | ||
self.index)) | ||
|
||
new_partitions = [where_helper.remote(k, v, next(other_zipped), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can k, v be converted to lists and passed in by reference? Ray will automatically deserialize then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not without merging them together, then also passing the length of the left. Performance-wise it's not much different.
python/ray/dataframe/dataframe.py
Outdated
# from blocks and the axes are set according to the blocks. We have | ||
# already correctly copartitioned everything, so there's no | ||
# correctness problems with doing this. | ||
left.reset_index(inplace=True, drop=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since everything is concatenated into row partitions, can you only reset the column index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to reset the index here because that's what other
is relying on.
Test FAILed. |
Test PASSed. |
python/ray/dataframe/dataframe.py
Outdated
args = (False, axis, level, errors, try_cast, raise_on_error) | ||
|
||
@ray.remote | ||
def where_helper(left, cond, other, left_columns, cond_columns, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dangerous defining a remote function inside of a method call, because this will define a new remote function every time the method is called. Currently this is a bit heavyweight. We probably want to move this outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, thanks. I had this in during development and forgot to move it. Sorry about that!
Test PASSed. |
* master: [DataFrame] Implement where (ray-project#1989) [DataFrame] Add direct pandas imports for MVP (ray-project#1960) Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007)
* master: [xray] Fix UniqueID hashing for object and task IDs. (ray-project#2017) [DataFrame] Fixing bugs in groupby (ray-project#2031) [DataFrame] Fixes dropna subset bug (ray-project#2018) [DataFrame] Implement where (ray-project#1989)
Implement
DataFrame.where
.Still needs:
where
whenSeries
objects are passed in asother
.