REFACTOR-#2011: move default_to_pandas in groupby to backend #2041
Signed-off-by: ienkovich <ilya.enkovich@intel.com>
Codecov Report
@@ Coverage Diff @@
## master #2041 +/- ##
==========================================
+ Coverage 81.79% 82.21% +0.42%
==========================================
Files 79 79
Lines 9403 9581 +178
==========================================
+ Hits 7691 7877 +186
+ Misses 1712 1704 -8
assert isinstance(
    by, type(query_compiler)
), "Can only use groupby reduce with another Query Compiler"
if not isinstance(by, type(query_compiler)):
Which cases does this work for now?
It works for all cases where DataFrame.groupby could not transform `by` into a QueryCompiler. The case we are interested in is an NYC Taxi query like the following:
df = df.groupby(["b", df["c"].dt.year]).size()
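For readers unfamiliar with this query shape, here is a minimal sketch in plain pandas of a `by` list that mixes a column label with a Series derived from another column. The column names `b` and `c` come from the query above; the data values are made up for illustration, and the sketch says nothing about how any backend executes it.

```python
import pandas as pd

# Toy data: names "b" and "c" follow the query above, values are invented.
df = pd.DataFrame(
    {
        "b": [1, 1, 2, 2],
        "c": pd.to_datetime(
            ["2019-01-05", "2019-06-01", "2019-03-10", "2020-02-20"]
        ),
    }
)

# `by` mixes a column label ("b") with a Series computed from column "c".
result = df.groupby(["b", df["c"].dt.year]).size()
print(result)
```

Each group is keyed by the pair (value of `b`, year of `c`), so the result has a two-level index.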
I see, thanks!
YarShev left a comment
@ienkovich, LGTM!
…kend (modin-project#2041) Signed-off-by: ienkovich <ilya.enkovich@intel.com>
What do these changes do?
In the OmniSci backend we can handle some cases of `groupby` where `by` holds both column names and series. We need this feature for the NYC Taxi benchmark. To utilize backend support, we need to move the `default_to_pandas` call from the API level to the backend.

- `flake8 modin`
- `black --check modin`
- `git commit -s`
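The idea of moving the fallback decision from the API level to the backend can be sketched as follows. This is only an illustration of the dispatch pattern, not Modin's actual code: every class and function name here (`FakeSeries`, `BaseBackend`, `MixedKeyBackend`, `api_groupby_size`) is hypothetical, and the `default_to_pandas` method is a stand-in that merely reports that a fallback happened.

```python
class FakeSeries:
    """Hypothetical stand-in for a series object used as a groupby key."""

    def __init__(self, name):
        self.name = name


class BaseBackend:
    """Hypothetical backend that cannot handle groupby natively."""

    def groupby_size(self, by):
        # The backend itself decides to fall back.
        return self.default_to_pandas("groupby_size")

    def default_to_pandas(self, op):
        # Stand-in for a real fallback; it only reports what happened.
        return f"fallback:{op}"


class MixedKeyBackend(BaseBackend):
    """Hypothetical backend that handles some mixed `by` lists natively."""

    def groupby_size(self, by):
        # Assumption for illustration: column names and series are supported.
        if all(isinstance(key, (str, FakeSeries)) for key in by):
            return "native:groupby_size"
        return super().groupby_size(by)


def api_groupby_size(backend, by):
    # The API layer forwards `by` unchanged and no longer chooses the fallback.
    return backend.groupby_size(by)


mixed_by = ["b", FakeSeries("c")]
print(api_groupby_size(MixedKeyBackend(), mixed_by))      # native:groupby_size
print(api_groupby_size(BaseBackend(), mixed_by))          # fallback:groupby_size
print(api_groupby_size(MixedKeyBackend(), ["b", len]))    # fallback:groupby_size
```

With this arrangement a backend that supports a given `by` can act on it directly, while unsupported inputs still reach the fallback path.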