-
Notifications
You must be signed in to change notification settings - Fork 6.5k
[DataFrame] Allows DataFrame constructor to take in another DataFrame #2072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, just a minor comment!
python/ray/dataframe/dataframe.py
Outdated
@@ -78,6 +78,10 @@ def __init__(self, data=None, index=None, columns=None, dtype=None, | |||
col_metadata (_IndexMetadata): | |||
Metadata for the new dataframe's columns | |||
""" | |||
if isinstance(data, DataFrame): | |||
self._data = data._data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the collision with pandas.DataFrame._data
, I would prefer this name to be changed. Our _data
would not have the same behavior as pandas
and if there are tools/code that rely on accessing _data
in a DataFrame
, we should not return this object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved, thanks!
python/ray/dataframe/dataframe.py
Outdated
|
||
def _set_data(self, data): | ||
self._block_partitions = data['blocks'] | ||
self._col_metadata = data['col_metadata'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the dataframe-passing semantic supposed to imply a copy? If so the metadata objects need to be copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pandas, this is not handled through a copy. They retain identical references to columns and index when passing a DataFrame
back through the constructor. So, I think that this should be fine.
Test PASSed. |
Merged, thanks @kunalgosar! |
* master: (22 commits) [xray] Fix bug in updating actor execution dependencies (ray-project#2064) [DataFrame] Refactor __delitem__ (ray-project#2080) [xray] Better error messaging when pulling from self. (ray-project#2068) Use source code in hash where possible (fix ray-project#2089) (ray-project#2090) Functions for flushing done tasks and evicted objects. (ray-project#2033) Fix compilation error for RAY_USE_NEW_GCS with latest clang. (ray-project#2086) [xray] Corrects Error Handling During Push and Pull. (ray-project#2059) [xray] Sophisticated task dependency management (ray-project#2035) Support calling positional arguments by keyword (fix ray-project#998) (ray-project#2081) [DataFrame] Improve performance of iteration methods (ray-project#2026) [DataFrame] Implement to_csv (ray-project#2014) [xray] Lineage cache only requests notifications about remote parent tasks (ray-project#2066) [rllib] Add magic methods for rollouts (ray-project#2024) [DataFrame] Allows DataFrame constructor to take in another DataFrame (ray-project#2072) Pin Pandas version for Travis to 0.22 (ray-project#2075) Fix python linting (ray-project#2076) [xray] Fix GCS table prefixes (ray-project#2065) Some tests for _submit API. (ray-project#2062) [rllib] Queue lib for python 2.7 (ray-project#2057) [autoscaler] Remove faulty assert that breaks during downscaling, pull configs from env (ray-project#2006) ...
Previously, DataFrame
init
failed when anotherray.dataframe.DataFrame
is passed in. This resolves that.