FIX-#4154: add value_counts method for SeriesGroupBy and DataFrameGroupBy #5453

Merged: 1 commit merged on Dec 16, 2022


@anmyachev (Collaborator) commented Dec 16, 2022

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes `flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py`
  • passes `black --check modin/ asv_bench/benchmarks scripts/doc_checker.py`
  • signed commit with `git commit -s`
  • Resolves BUG: SeriesGroupBy has no attribute value_counts #4154
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
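Issue #4154 reported that `SeriesGroupBy` had no `value_counts` attribute. The methods this PR adds follow the pandas API, so a minimal usage sketch can be written against plain pandas (1.4 or later, where `DataFrameGroupBy.value_counts` first appeared). Swapping the import for `modin.pandas` is the assumed way to exercise the Modin version; the PR description does not say which options Modin executes natively and which default to pandas.

```python
import pandas as pd  # assumption: with Modin installed, `import modin.pandas as pd` is the drop-in swap

df = pd.DataFrame({"key": ["x", "x", "x", "y"], "val": [1, 1, 2, 2]})

# SeriesGroupBy.value_counts: frequency of each `val` within each `key` group.
series_counts = df.groupby("key")["val"].value_counts()

# DataFrameGroupBy.value_counts: frequency of each unique combination of the
# non-grouping columns within each group (here only `val`).
frame_counts = df.groupby("key").value_counts()

# Both results are indexed by a (key, val) MultiIndex.
print(series_counts.to_dict())
print(frame_counts.to_dict())
```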

… DataFrameGroupBy

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@anmyachev marked this pull request as ready for review December 16, 2022 15:28
@anmyachev requested a review from a team as a code owner December 16, 2022 15:28
@anmyachev changed the title FIX-#4154: add value_counts method for SeriesGroupBy and DataFrameGroupBy FIX-#4154: add value_counts method for SeriesGroupBy and DataFrameGroupBy Dec 16, 2022
@dchigarev (Collaborator) left a comment

could you please create an issue to implement this method accordingly?

@anmyachev (Collaborator, Author) replied:

> could you please create an issue to implement this method accordingly?

Done #5460

@dchigarev merged commit 252665e into modin-project:master Dec 16, 2022
jkew pushed a commit to jkew/modin that referenced this pull request Feb 28, 2023
… DataFrameGroupBy (modin-project#5453)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
jkew added a commit to jkew/modin that referenced this pull request Feb 28, 2023
jkew pushed a commit to ponder-org/modin-public that referenced this pull request Mar 1, 2023
… DataFrameGroupBy (modin-project#5453)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
jkew added a commit to ponder-org/modin-public that referenced this pull request Mar 3, 2023
* FIX-modin-project#4154: add value_counts method for SeriesGroupBy and DataFrameGroupBy (modin-project#5453)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* WIP GroupBy

* Seemingly working w/ Ponder and Modin

* Maintain proper value ordering for single group-by operations

* Linter updates from black on the changed files

* When `normalize` is used, or the groupby was performed with `as_index=False`, default to pandas.
With Ponder, this currently results in a `NotImplementedError`.

* Remove `_to_pandas()` by implementing `sort_index` on Series on the service side

* Pushdown and compatibility status per call:

| Object | Call | Pushdown? | Compatibility |
| --- | --- | --- | --- |
| DataFrameGroupBy | `value_counts()` | Full | PASS |
| DataFrameGroupBy | `value_counts(ascending=True)` | Full | PASS |
| DataFrameGroupBy | `value_counts(ascending=False)` | Full | PASS |
| DataFrameGroupBy | `value_counts(sort=False)` | Full | PASS |
| DataFrameGroupBy | `value_counts(sort=True)` | Full | PASS |
| DataFrameGroupBy | `value_counts(normalize=False)` | Full | PASS |
| DataFrameGroupBy | `value_counts(normalize=True)` | | FAIL |
| DataFrame | `groupby(as_index=False)` | | FAIL |
| DataFrameGroupBy | `value_counts(dropna=False)` | Full | PASS |
| DataFrameGroupBy | `value_counts(dropna=True)` | | FAIL |

$\color{red}{\text{NOTE: For MultiIndex GroupBys, the (n+1)-th index level is ignored when sorting.}}$
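To make the options in the table concrete, here is a minimal pure-Python model of what pandas documents them to mean for a grouped `value_counts`: counts per (group, value) pair, `normalize` turning counts into within-group proportions, `sort`/`ascending` ordering by frequency inside each group, and `dropna` discarding missing values. `grouped_value_counts` is a hypothetical helper, not Modin's or Ponder's implementation; missing values are modelled as `None`, and `sort=False` keeps first-appearance order, which may not match pandas exactly. The sort key uses only the count, never the value level, so equal counts keep their incoming order; this is one plausible reading of the NOTE about the (n+1)-th level being ignored in sorting.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple

# (group key, counted value, count or proportion)
Row = Tuple[Hashable, Optional[Hashable], float]


def grouped_value_counts(
    keys: Iterable[Hashable],
    values: Iterable[Optional[Hashable]],
    normalize: bool = False,
    sort: bool = True,
    ascending: bool = False,
    dropna: bool = True,
) -> List[Row]:
    """Hypothetical model of groupby(...).value_counts(...) semantics."""
    # dropna=True discards missing values (modelled as None) before counting.
    pairs = [(k, v) for k, v in zip(keys, values) if not (dropna and v is None)]
    counts = Counter(pairs)  # keeps first-appearance order of (group, value)
    group_sizes = Counter(k for k, _ in pairs)

    rows: List[Row] = []
    for group in sorted(group_sizes):  # groups in sorted key order
        group_rows = [(g, v, c) for (g, v), c in counts.items() if g == group]
        if sort:
            # Stable sort by count only: ties keep first-appearance order
            # instead of being ordered by the value level.
            group_rows.sort(key=lambda row: row[2], reverse=not ascending)
        if normalize:
            # Proportions within each group sum to 1.
            size = group_sizes[group]
            group_rows = [(g, v, c / size) for g, v, c in group_rows]
        rows.extend(group_rows)
    return rows


keys = ["a", "a", "a", "a", "b", "b"]
values = [1, 2, 2, None, 3, 3]
print(grouped_value_counts(keys, values))  # [('a', 2, 2), ('a', 1, 1), ('b', 3, 2)]
```

With `dropna=False`, the `None` entry in group `a` is counted and, having the same count as value `1`, stays after it because the sort ignores the value itself.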

---------

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Co-authored-by: Anatoly Myachev <anatoly.myachev@intel.com>
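The routing rule in the commit message above (default to pandas when `normalize` is used or the groupby was performed with `as_index=False`, while Ponder currently raises a not-implemented error) can be sketched as a small decision function. `choose_value_counts_path` and its `can_default_to_pandas` flag are hypothetical names used only for illustration; this is not Modin's or Ponder's actual dispatch code.

```python
def choose_value_counts_path(
    normalize: bool = False,
    as_index: bool = True,
    can_default_to_pandas: bool = True,
) -> str:
    """Hypothetical: decide how a groupby value_counts call would be executed."""
    if not normalize and as_index:
        # Plain case: the backend executes the call itself.
        return "pushdown"
    if can_default_to_pandas:
        # Unsupported option: fall back to a pandas implementation.
        return "pandas"
    # A backend that cannot fall back surfaces the gap as an error instead.
    raise NotImplementedError(
        "value_counts(normalize=True) or groupby(as_index=False) is not supported"
    )


print(choose_value_counts_path(normalize=True))  # pandas
```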
vnlitvinov pushed a commit to vnlitvinov/modin that referenced this pull request Mar 16, 2023
@anmyachev deleted the issue4154 branch March 24, 2023 12:11
Successfully merging this pull request may close these issues.

BUG: SeriesGroupBy has no attribute value_counts
2 participants