Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataFrame] Implementation for head, idxmax, idxmin, pop, tail, and Ray Index #1520

Merged
merged 5 commits into from
Feb 7, 2018

Conversation

devin-petersohn
Copy link
Member

@devin-petersohn devin-petersohn commented Feb 7, 2018

Continuing to add functionality.

Ray Index will be needed to implement methods such as __getitem__() and will be extremely useful going forward. We cannot rely on the Pandas Index globally because we have a different structure overall. We can, however use the Pandas Index locally to access the item we need.

Index structure: key -> (partition, key). The second key is the Pandas Index key.

Edit: Additional work is needed to integrate the new Index structure, but we at least have the skeleton in this PR.

Copy link
Collaborator

@robertnishihara robertnishihara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. Looks good to me.


def hist(self, data, column=None, by=None, grid=True, xlabelsize=None,
xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False,
sharey=False, figsize=None, layout=None, bins=10, **kwds):
raise NotImplementedError("Not Yet implemented.")

def idxmax(self, axis=0, skipna=True):
raise NotImplementedError("Not Yet implemented.")
"""Get the index of the first occurrence of the maximum value of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to keep the first line of the docstrings in a single line (e.g., replae maximum -> max).

Same comment for the next function.


Returns:
A Series with the index for each maximum value for the axis
specified.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but can we indent "specified" by four spaces? Same comment for a couple other functions in this PR.

i = 0
while n > 0 and i < len(self._df):
if (n - sizes[i]) < 0:
new_dfs.append(_deploy_func.remote(lambda df: df.head(n),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we may want to consider defining the remote functions up front because deploying them every time could be slow (or maybe just deploying them once and then caching the result), but this is an optimization for later.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3626/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3628/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3635/
Test PASSed.

@robertnishihara robertnishihara merged commit fa37564 into ray-project:master Feb 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants