-
Notifications
You must be signed in to change notification settings - Fork 666
FEAT-#1200: pivot_table implementation #1669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1669 +/- ##
==========================================
+ Coverage 82.66% 86.09% +3.42%
==========================================
Files 117 117
Lines 13044 13092 +48
==========================================
+ Hits 10783 11271 +488
+ Misses 2261 1821 -440
Continue to review full report at Codecov.
|
@devin-petersohn I've faced with some problems trying to group by on columns, that have if __name__ == "__main__":
import modin.pandas as pd
import pandas
import numpy as np
pd.DEFAULT_NPARTITIONS = 4
data1 = {
"NaN_col1": [None] + np.arange(255).tolist(),
"NaN_col2": np.arange(128).tolist() + [None] + np.arange(127).tolist(),
"Data_Col": np.arange(256),
}
df = pd.DataFrame(data1)
mask = [df["NaN_col1"]._to_pandas(), df["NaN_col2"]._to_pandas()]
qc = df._query_compiler.groupby_agg(
by=mask,
axis=0,
agg_func=lambda df: df.agg(np.mean),
groupby_args={},
agg_args={},
) # Output:
pandas_res = df._to_pandas().groupby(by=mask).agg(np.mean) #
#
print("md.index == pd.index", all(qc.index == pandas_res.index)) # md.index == pd.index True
#
print("md_len:", len(qc.index.levels[0])) # md_len: 254
print("pd_len:", len(pandas_res.index.levels[0])) # pd_len: 255
#
print("127 in md.levels", 127 in qc.index.levels[0]) # 127 in md.levels False
print("127 in pd.levels", 127 in pandas_res.index.levels[0]) # 127 in pd.levels True Indices itself seems to be equals, but the levels does not. There is nothing special about
It looks like For now, I've let |
47e15c2
to
ac5105c
Compare
The issue described in that comment was fixed by creating the correct index manually in case of Note that this implementation based on using Also, that branch requires fixes from #1727, so that branch also contains from #1727 to make CI green, after #1727 will be merged that branch should be rebased. Inserting multiple columns (margins in the pivoted frame) were implemented with For now @devin-petersohn I need your review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's revisit this once stack
and unstack
are finished.
604cbc3
to
ab899a3
Compare
d390177
to
6330984
Compare
If we want to have a parallel implementation of |
One more question. @dchigarev , what problems are occurred if we do row partitions instead of column partitions? |
df.groupby(by=[index]+[columns]).aggregate(aggfunc).unstack(level=[columns]) which requires us to have the whole column at the same place to aggregate it. We also can't apply an aggregation function to every row partition and then for their results, because we can obtain a function, that is not associative (for example |
Yes, I agree with this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @YarShev that we can merge this after the comments, and raise a new issue for next release. We should probably warn the user about the order of the columns mismatching and say that it will be fixed in a future release.
1f8ee52
to
14627f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with the changes as-is. @YarShev feel free to merge if you have no other comments.
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
… axis Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
@dchigarev , please, answer rest comments. After that we will merge the PR if there are no objections. |
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
14627f7
to
90ae94d
Compare
@dchigarev , thanks, LGTM! Please, create an issue to fix the problem related to mismatching of rows and columns in the next release. |
What do these changes do?
flake8 modin
black --check modin
git commit -s