Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] New Feature StratifiedKFold #3109

Merged
merged 24 commits into from
Sep 22, 2022

Conversation

daxiongshu
Copy link
Contributor

Add equivalent of sklearn's StratifiedKFold to cuml.

@daxiongshu daxiongshu requested a review from a team as a code owner November 1, 2020 14:58
@codecov-io
Copy link

Codecov Report

Merging #3109 into branch-0.17 will decrease coverage by 0.20%.
The diff coverage is 11.11%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.17    #3109      +/-   ##
===============================================
- Coverage        59.20%   58.99%   -0.21%     
===============================================
  Files              142      142              
  Lines             8966     9002      +36     
===============================================
+ Hits              5308     5311       +3     
- Misses            3658     3691      +33     
Impacted Files Coverage Δ
python/cuml/preprocessing/model_selection.py 78.20% <11.11%> (-12.20%) ⬇️
...l/_thirdparty/sklearn/preprocessing/_imputation.py 62.09% <0.00%> (-0.41%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e6d413...7162d2a. Read the comment docs.

@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@dantegd dantegd added 2 - In Progress Currenty a work in progress Cython / Python Cython or Python issue labels Nov 2, 2020
@daxiongshu daxiongshu requested review from a team as code owners December 28, 2020 02:31
@ajschmidt8
Copy link
Member

@daxiongshu, is there any reason this PR is targeting branch-0.17? If not, can you update it to target the latest branch, branch-0.18? Thanks!

@github-actions
Copy link

This PR has been marked stale due to no recent activity in the past 30d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be marked rotten if there is no activity in the next 60d.

@github-actions
Copy link

This PR has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@daxiongshu daxiongshu changed the base branch from branch-0.17 to branch-22.02 January 25, 2022 02:56
@daxiongshu daxiongshu added non-breaking Non-breaking change feature request New feature or request and removed inactive-30d inactive-90d labels Jan 27, 2022
@ajschmidt8 ajschmidt8 removed the request for review from a team February 1, 2022 14:12
@ajschmidt8
Copy link
Member

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

@daxiongshu daxiongshu changed the title [WIP] Fea StratifiedKFold [REVIEW] New Feature StratifiedKFold Feb 16, 2022
@daxiongshu daxiongshu added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 2 - In Progress Currenty a work in progress labels Feb 16, 2022
self.n_splits = n_splits
self.shuffle = shuffle
self.seed = random_state
self.tpb = 64 # threads per bloc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this configurable or why do we save it as an attribute of the class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I was hesitant to make it configurable or not. I think setting it to 64 would be just fine. I'll make the change.

"""
self._check_array_shape(y)
if isinstance(y, cudf.DataFrame) or isinstance(y, cudf.Series):
data = y.values.ravel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason we can't use existing output functionality the need for ravel?



@pytest.mark.parametrize("shuffle", [True, False])
@pytest.mark.parametrize("n_splits", [3, 5, 10])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this values? Seems like 3, 5, and 10 are testing essentially the same thing.

Would be good to have tests of invalid input, say 0 splits for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because these values are the most common choices in kaggle competitions. Yes, I agree they mean the same thing. I'll add a separate test for the invalid number of folds.

@dantegd dantegd added 4 - Waiting on Author Waiting for author to respond to review and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels Mar 24, 2022
@daxiongshu
Copy link
Contributor Author

rerun test

@daxiongshu
Copy link
Contributor Author

rerun tests

@daxiongshu daxiongshu changed the base branch from branch-22.04 to branch-22.06 April 1, 2022 03:31
@dantegd dantegd changed the base branch from branch-22.06 to branch-22.10 August 31, 2022 17:32
@dantegd
Copy link
Member

dantegd commented Sep 21, 2022

rerun tests

@dantegd
Copy link
Member

dantegd commented Sep 22, 2022

rerun tests

@dantegd
Copy link
Member

dantegd commented Sep 22, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 56bb5a2 into rapidsai:branch-22.10 Sep 22, 2022
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on Author Waiting for author to respond to review Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants