
DOC udpate the datasets used in CondensedNearestNeighbour docstring #1041

Merged · 5 commits · Sep 7, 2023

Conversation

eroell (Contributor) commented Sep 5, 2023

Reference Issue

Fixes #1040.

What does this implement/fix? Explain your changes.

This suggests using the same dataset generator, sklearn.datasets.make_classification, that is already used in the docstrings of e.g. ClusterCentroids, instead of the dead sklearn.datasets.fetch_mldata.

Any other comments?

This is a suggestion, if you are interested in a low-cost fix. There may be more interesting datasets which could be used, or ones you find more illustrative.

By the way, pycodestyle gave me a few E501 warnings about lines longer than 79 characters, but your checks appear to use a maximum line length of 88, which seems fine.

eroell (Contributor, Author) commented Sep 6, 2023

Some tests have failed, but they appear to be unrelated to the suggested changes.

glemaitre (Member)

Yes, the test is failing due to a numerical issue: 315 / 404 is 0.779 while we expect > 0.78.
Not a big deal for this PR.

@@ -110,17 +110,19 @@ class CondensedNearestNeighbour(BaseCleaningSampler):
Examples
--------
>>> from collections import Counter # doctest: +SKIP
>>> from sklearn.datasets import fetch_mldata # doctest: +SKIP
>>> from sklearn.datasets import make_classification # doctest: +SKIP
glemaitre (Member) commented on this diff:

Instead I would prefer that we use fetch_openml and still fetch the pima dataset.
It should look like:

from sklearn.datasets import fetch_openml
from sklearn.preprocessing import scale

X, y = fetch_openml("diabetes", version=1, return_X_y=True)
X = scale(X)

In this manner, we should still deal with the same dataset.

eroell (Contributor, Author) commented Sep 7, 2023

I use this dataset now. Whereas the original example reported
Resampled dataset shape Counter({-1: 268, 1: 227}),
the counts have now changed to
Resampled dataset shape Counter({'tested_positive': 268, 'tested_negative': 181}).

Omitting the scaling also does not reproduce the counts of the original example. This might be due to differences from the original dataset, which is no longer retrievable; note that the labels have changed as well.

glemaitre (Member) replied:
Not a huge deal (we might not be applying exactly the same scaling), but this is closer to the original example. Updating the labels and counts would be enough here.

@glemaitre glemaitre changed the title [MRG] suggested new dataset in doc of CondensedNearestNeighbour DOC udpate the datasets used in CondensedNearestNeighbour docstring Sep 7, 2023
@eroell eroell marked this pull request as draft September 7, 2023 09:53
@eroell eroell marked this pull request as ready for review September 7, 2023 11:07
@glemaitre glemaitre merged commit dcb476d into scikit-learn-contrib:master Sep 7, 2023
glemaitre (Member)

Thanks @eroell LGTM.

Development

Successfully merging this pull request may close these issues.

Dead dataset used in documentation
2 participants