[REVIEW] New Feature StratifiedKFold #3109

daxiongshu · 2020-11-01T14:58:39Z

Add equivalent of sklearn's StratifiedKFold to cuml.
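For readers unfamiliar with the splitter this PR ports, the following is a minimal pure-Python sketch of what stratified k-fold splitting does: each class's samples are dealt across the folds so that every test fold keeps roughly the overall class ratio. The function name `stratified_kfold_indices` and its round-robin strategy are illustrative assumptions. This is not the GPU implementation added in this PR, and it does not reproduce sklearn's exact fold assignment.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=5, shuffle=False, seed=None):
    """Yield (train_idx, test_idx) pairs that preserve class proportions.

    Illustrative sketch only; names and strategy are assumptions.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        members = by_class[label]
        if shuffle:
            rng.shuffle(members)
        # Deal this class's samples round-robin so each fold gets its share.
        for position, idx in enumerate(members):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


# Six samples of class 0 and three of class 1, split into three folds.
labels = [0] * 6 + [1] * 3
splits = list(stratified_kfold_indices(labels, n_splits=3))
```

With a 2:1 class ratio and three folds, each test fold in this sketch holds two samples of class 0 and one of class 1, which is the property a plain k-fold split does not guarantee.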


codecov-io · 2020-11-01T16:49:02Z

Codecov Report

Merging #3109 into branch-0.17 will decrease coverage by 0.20%.
The diff coverage is 11.11%.

@@               Coverage Diff               @@
##           branch-0.17    #3109      +/-   ##
===============================================
- Coverage        59.20%   58.99%   -0.21%     
===============================================
  Files              142      142              
  Lines             8966     9002      +36     
===============================================
+ Hits              5308     5311       +3     
- Misses            3658     3691      +33

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cuml/preprocessing/model_selection.py | `78.20% <11.11%> (-12.20%)` | ⬇️ |
| ...l/_thirdparty/sklearn/preprocessing/_imputation.py | `62.09% <0.00%> (-0.41%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 7e6d413...7162d2a.

GPUtester · 2020-11-01T16:49:33Z

Please update the changelog in order to start CI tests.

View the gpuCI docs here.


ajschmidt8 · 2020-12-30T17:31:08Z

@daxiongshu, is there any reason this PR is targeting branch-0.17? If not, can you update it to target the latest branch, branch-0.18? Thanks!

github-actions · 2021-02-16T20:18:02Z

This PR has been marked stale due to no recent activity in the past 30d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be marked rotten if there is no activity in the next 60d.

github-actions · 2021-05-17T22:05:10Z

This PR has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.


review-notebook-app · 2022-01-25T02:51:53Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ajschmidt8 · 2022-02-01T14:12:12Z

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

dantegd · 2022-03-24T01:07:23Z

python/cuml/model_selection/_split.py

+        self.n_splits = n_splits
+        self.shuffle = shuffle
+        self.seed = random_state
+        self.tpb = 64  # threads per block


Is this configurable? If not, why do we save it as an attribute of the class?

daxiongshu · 2022-03-24

Good catch. I was undecided about whether to make it configurable. I think fixing it at 64 is fine. I'll make the change.
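One way to act on this exchange is to keep the thread count as a module-level constant rather than per-instance state. The sketch below is an assumption about how such a constant could be used to size a kernel launch. The names `THREADS_PER_BLOCK` and `launch_config` are hypothetical, and the PR's actual code after the "remove self.tpb" commit is not shown in this conversation.

```python
# Hypothetical module-level constant replacing the `self.tpb` attribute.
THREADS_PER_BLOCK = 64


def launch_config(n_items, threads_per_block=THREADS_PER_BLOCK):
    """Return (blocks, threads) so that blocks * threads >= n_items.

    Uses one thread per item and rounds the block count up with
    integer ceiling division.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    blocks = (n_items + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


# 1000 items with 64 threads per block need 16 blocks (16 * 64 = 1024).
blocks, threads = launch_config(1000)
```

Keeping the value out of `__init__` means the splitter exposes only the parameters a user is expected to tune, while the constant stays easy to change in one place.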

dantegd · 2022-03-24T01:08:43Z

python/cuml/model_selection/_split.py

+        """
+        self._check_array_shape(y)
+        if isinstance(y, cudf.DataFrame) or isinstance(y, cudf.Series):
+            data = y.values.ravel()


Is the need for `ravel` the reason we can't use the existing output functionality?

dantegd · 2022-03-24T01:10:04Z

python/cuml/test/test_stratified_kfold.py

+
+
+@pytest.mark.parametrize("shuffle", [True, False])
+@pytest.mark.parametrize("n_splits", [3, 5, 10])


Why these values? It seems like 3, 5, and 10 are testing essentially the same thing.

It would be good to have tests for invalid input, for example 0 splits.

daxiongshu · 2022-03-24

Because these values are the most common choices in Kaggle competitions. Yes, I agree they test essentially the same thing. I'll add a separate test for an invalid number of folds.
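The kind of invalid-input test requested here could look roughly like the sketch below. It is written against a hypothetical `validate_n_splits` helper so that it runs without a GPU. The exception types and the minimum of two folds are assumptions borrowed from sklearn's behaviour, not confirmed details of the cuML implementation.

```python
def validate_n_splits(n_splits):
    """Hypothetical validator: accept only integers >= 2."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n_splits, bool) or not isinstance(n_splits, int):
        raise TypeError(f"n_splits must be an integer, got {n_splits!r}")
    if n_splits < 2:
        raise ValueError(f"n_splits must be at least 2, got {n_splits}")
    return n_splits


def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type, else False."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


def test_valid_n_splits():
    # The values already covered by the parametrized test.
    for n in (3, 5, 10):
        assert validate_n_splits(n) == n


def test_invalid_n_splits():
    # Zero, one, and negative fold counts cannot form a train/test split.
    for n in (0, 1, -3):
        assert raises(ValueError, validate_n_splits, n)
    # Non-integer fold counts are rejected as well.
    assert raises(TypeError, validate_n_splits, 2.5)
```

In a pytest suite the `raises` helper would normally be replaced by `pytest.raises`, with the invalid values supplied through `@pytest.mark.parametrize` as in the existing test.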

daxiongshu · 2022-03-30T02:06:23Z

rerun test

daxiongshu · 2022-03-30T02:09:55Z

rerun tests

dantegd · 2022-09-21T22:54:39Z

rerun tests

dantegd · 2022-09-22T15:26:27Z

rerun tests

dantegd · 2022-09-22T15:26:38Z

@gpucibot merge

Add equivalent of [sklearn's StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) to `cuml`.

Authors:
- Jiwei Liu (https://github.com/daxiongshu)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3109

daxiongshu added 3 commits July 26, 2020 12:49

Merge pull request #15 from rapidsai/branch-0.15

cf87af4

sync with upstream

Merge pull request #18 from rapidsai/branch-0.17

e3b7848

sync with upstream

copy basic codes

7162d2a

daxiongshu requested a review from a team as a code owner November 1, 2020 14:58

dantegd added the 2 - In Progress and Cython / Python labels Nov 2, 2020

daxiongshu added 3 commits November 17, 2020 16:59

Merge pull request #19 from rapidsai/branch-0.17

e6d8ec3

sync with upstream

Merge pull request #20 from rapidsai/branch-0.18

8b1b7c3

Sync with upstream

Merge pull request #21 from daxiongshu/branch-0.18

39a38b3

Sync with Branch 0.18

daxiongshu requested review from a team as code owners December 28, 2020 02:31

github-actions bot added the inactive-30d label Feb 16, 2021

github-actions bot added the inactive-90d label May 17, 2021

daxiongshu added 2 commits January 24, 2022 18:50

Merge branch 'branch-22.02' of https://github.com/rapidsai/cuml into rapidsai-branch-22.02

fdbbe0d

Merge branch 'rapidsai-branch-22.02' into fea_stratified_kfold

dcbbf9d

daxiongshu changed the base branch from branch-0.17 to branch-22.02 January 25, 2022 02:56

daxiongshu added 2 commits January 24, 2022 20:12

add docs

80da002

first test passed

f149b73

daxiongshu added the non-breaking and feature request labels and removed the inactive-30d and inactive-90d labels Jan 27, 2022

copy right year

8907c84

ajschmidt8 removed the request for review from a team February 1, 2022 14:12

Merge branch 'rapidsai:branch-22.04' into fea_stratified_kfold

20b8e49

daxiongshu changed the title ~~[WIP] Fea StratifiedKFold~~ [REVIEW] New Feature StratifiedKFold Feb 16, 2022

daxiongshu added the 4 - Waiting on Reviewer label and removed the 2 - In Progress label Feb 16, 2022

dantegd requested changes Mar 24, 2022


dantegd added the 4 - Waiting on Author label and removed the 4 - Waiting on Reviewer label Mar 24, 2022

daxiongshu added 6 commits March 28, 2022 12:06

Merge branch 'rapidsai:branch-22.04' into fea_stratified_kfold

46443f9

remove self.tpb

358a6c3

use input_to_cuml_array

bb6cdc7

more parameters

154a0fe

test_num_classes_check

86e612a

fix style

2f73b94

daxiongshu added 2 commits March 29, 2022 22:34

Merge branch 'rapidsai:branch-22.04' into fea_stratified_kfold

666d58b

remove unused func

b94ca51

daxiongshu changed the base branch from branch-22.04 to branch-22.06 April 1, 2022 03:31

Merge branch 'rapidsai:branch-22.06' into fea_stratified_kfold

f6ffad9

dantegd changed the base branch from branch-22.06 to branch-22.10 August 31, 2022 17:32

daxiongshu added 2 commits September 12, 2022 12:50

Merge branch 'rapidsai:branch-22.10' into fea_stratified_kfold

c902d1f

Merge branch 'rapidsai:branch-22.10' into fea_stratified_kfold

101a008

dantegd approved these changes Sep 21, 2022


rapids-bot bot merged commit 56bb5a2 into rapidsai:branch-22.10 Sep 22, 2022
