[DataFrame] Implementing API correct groupby with aggregation methods #1914

Merged: 38 commits, Apr 22, 2018
Commits
117aa1e
Implementing groupby object
devin-petersohn Apr 7, 2018
3ac3405
Making sum work
devin-petersohn Apr 7, 2018
84de470
Making lazy and adding axis=1 support
devin-petersohn Apr 8, 2018
24ec3c3
Starting on agg
devin-petersohn Apr 9, 2018
23cdb06
Fixing errors to report same as Pandas
devin-petersohn Apr 9, 2018
8f4b585
Minor changes
devin-petersohn Apr 10, 2018
3cffd84
Adding aggregate and apply for single strings
devin-petersohn Apr 10, 2018
00b743c
Checkpointing progress
devin-petersohn Apr 10, 2018
1e5bd79
Start toward implementing callables
devin-petersohn Apr 10, 2018
148ef09
Working on agg
devin-petersohn Apr 12, 2018
1999868
Groupby + agg functional for string or callable
devin-petersohn Apr 12, 2018
b6872cb
Begin implementation of groupby methods
kunalgosar Apr 10, 2018
8fd1f56
Implement remaining groupby methods
kunalgosar Apr 13, 2018
8f88ce0
Moving to being more lazy
devin-petersohn Apr 13, 2018
d975118
updating groupby
devin-petersohn Apr 14, 2018
b90548b
Updating remote
devin-petersohn Apr 14, 2018
078911a
Update groupby
devin-petersohn Apr 14, 2018
3fa9bd3
Fixing some performance issues
devin-petersohn Apr 14, 2018
759eeec
Adding list support for reduction tasks
devin-petersohn Apr 15, 2018
e67230f
Removing print
devin-petersohn Apr 15, 2018
4fd1f14
Working on performance debug
devin-petersohn Apr 16, 2018
2e0b367
Working on tuning
devin-petersohn Apr 16, 2018
8871f37
Making lists of functions work for agg and apply
devin-petersohn Apr 16, 2018
0c0ea4f
Cleaning up
devin-petersohn Apr 16, 2018
295f453
Improving serialization of agg
devin-petersohn Apr 16, 2018
3d163a2
implement transform
kunalgosar Apr 16, 2018
796939a
resolve merge artifacts
kunalgosar Apr 16, 2018
c6ba8c8
groupby transform works now
kunalgosar Apr 16, 2018
e592957
temp implementation of __array__
kunalgosar Apr 16, 2018
cd5a346
some error handling and kwargs cleanup
kunalgosar Apr 16, 2018
e0d0e9f
add a todo
kunalgosar Apr 16, 2018
a2d8b32
Updating groupby for utility.
devin-petersohn Apr 17, 2018
534fb94
Cleanup code
devin-petersohn Apr 17, 2018
0ff8a80
Fix lint
devin-petersohn Apr 17, 2018
b12a339
Fixing tests
devin-petersohn Apr 18, 2018
01b29d0
Fix lint
devin-petersohn Apr 18, 2018
02d7116
Fix Python 2 syntax issue.
robertnishihara Apr 20, 2018
d341cb2
Addressing comments
devin-petersohn Apr 20, 2018
Implement remaining groupby methods
kunalgosar authored and devin-petersohn committed Apr 17, 2018
commit 8fd1f5632ba22b0af8dde481e40cf7cea4168655
24 changes: 9 additions & 15 deletions python/ray/dataframe/groupby.py
@@ -189,11 +189,9 @@ def aggregate(self, arg, *args, **kwargs):
     def last(self, **kwargs):
         raise NotImplementedError("Not Yet implemented.")

-    @property
     def mad(self):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.mad())

-    @property
     def rank(self):
         raise NotImplementedError("Not Yet implemented.")

@@ -205,20 +203,19 @@ def pad(self, limit=None):
         raise NotImplementedError("Not Yet implemented.")

     def max(self, **kwargs):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.max())

     def var(self, ddof=1, *args, **kwargs):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.var())

     def get_group(self, name, obj=None):
         raise NotImplementedError("Not Yet implemented.")

     def __len__(self):
         raise NotImplementedError("Not Yet implemented.")

-    @property
     def all(self):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.all())

     def size(self):
         return self._apply_function(lambda df: df.size)
@@ -241,13 +238,13 @@ def ngroup(self, ascending=True):
         raise NotImplementedError("Not Yet implemented.")

     def nunique(self, dropna=True):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.nunique())

     def resample(self, rule, *args, **kwargs):
         raise NotImplementedError("Not Yet implemented.")

     def median(self, **kwargs):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.median().astype(int))

     def head(self, n=5):
         raise NotImplementedError("Not Yet implemented.")
@@ -278,23 +275,20 @@ def agg_help(df):
             columns=[k for k, v in self._iter],
             index=self._index)

-    @property
     def cov(self):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.cov())

     def transform(self, func, *args, **kwargs):
         raise NotImplementedError("Not Yet implemented.")

-    @property
     def corr(self):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.corr())

-    @property
     def fillna(self):
         raise NotImplementedError("Not Yet implemented.")

     def count(self):
-        raise NotImplementedError("Not Yet implemented.")
+        return self._apply_function(lambda df: df.count())

     def pipe(self, func, *args, **kwargs):
         raise NotImplementedError("Not Yet implemented.")
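
The pattern this commit repeats (each groupby method hands a one-line lambda to `self._apply_function`) can be illustrated with a small pandas-only sketch. The body of `_apply_function` is not part of this diff, so the version below is an assumption: it runs the function on each group's sub-frame and stacks the per-group results. The class name `GroupBySketch` is hypothetical and is not the Ray implementation, which works over distributed partitions.

```python
import pandas as pd


class GroupBySketch:
    """Hypothetical stand-in for the groupby object touched by this commit."""

    def __init__(self, df, by):
        # Materialize (group key, sub-frame) pairs using plain pandas.
        self._groups = list(df.groupby(by))

    def _apply_function(self, f):
        # Assumed behavior: call f on each group's frame, then stack the
        # resulting Series into one frame with one row per group key.
        results = {key: f(group) for key, group in self._groups}
        return pd.DataFrame(results).T

    # Plain methods (no @property), so callers write g.max(), g.all(), ...
    def max(self, **kwargs):
        return self._apply_function(lambda df: df.max())

    def var(self, ddof=1, *args, **kwargs):
        # As in the commit, ddof is accepted but not forwarded.
        return self._apply_function(lambda df: df.var())

    def all(self):
        return self._apply_function(lambda df: df.all())

    def count(self):
        return self._apply_function(lambda df: df.count())


frame = pd.DataFrame({"key": [1, 1, 2], "x": [1, 5, 3], "y": [2, 4, 6]})
grouped = GroupBySketch(frame, "key")
print(grouped.max())
print(grouped.count())
```

Two things in the diff are mirrored here on purpose. The removed `@property` decorators mean `mad`, `all`, `cov`, and `corr` are called as methods, which matches how pandas exposes them on its own groupby objects. The wrappers also ignore their arguments (for example `ddof` in `var` and `dropna` in `nunique`), so they always use the pandas defaults.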