FIX multiply by random number < 0.5 for BorderlineSMOTE-2 #1027

glemaitre · 2023-07-11T09:23:13Z

Address issue raised in https://github.com/scikit-learn-contrib/imbalanced-learn/pull/1023/files#r1259422379

We additionally detect whether the pair of samples used to generate a new sample comes from different classes. In this case, we multiply the difference by a random number between 0 and 0.5.
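To make the description concrete, here is a minimal sketch of the idea, not the code from this PR. The helper `interpolate` and its parameters (`base`, `neighbors`, `steps`, `different_class`) are hypothetical names chosen for illustration. It assumes `steps` has shape `(n_samples, 1)` and that `different_class` is a boolean mask marking the pairs whose neighbour belongs to another class.

```python
import numpy as np


def interpolate(base, neighbors, steps, different_class=None, rng=None):
    """Hypothetical helper: interpolate between base samples and neighbours.

    When `different_class` is given, the difference vector of the marked
    pairs is shrunk by a random factor drawn uniformly from [0, 0.5).
    """
    rng = np.random.default_rng(rng)
    diffs = (neighbors - base).astype(float)
    if different_class is not None:
        n_shrunk = int(different_class.sum())
        # One random factor per marked pair, broadcast over the features.
        diffs[different_class] *= rng.uniform(0.0, 0.5, size=(n_shrunk, 1))
    return base + steps * diffs


base = np.zeros((4, 2))
neighbors = np.full((4, 2), 2.0)
steps = np.ones((4, 1))  # step of 1 so that only the shrinking is visible
mask = np.array([False, True, False, True])
X_new = interpolate(base, neighbors, steps, different_class=mask, rng=0)
```

With a step of 1, the unmasked rows land exactly on their neighbour, while the masked rows stay in the first half of the segment.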

glemaitre · 2023-07-11T09:29:01Z

@solegalli Does the fix look okay to you?

I still find the paper ambiguous. Indeed, you could draw a separate random number for each feature of the selected X. However, in this case, you don't generate a sample on the segment defined by the two samples but in the "rectangle" (or hyperrectangle) that they span. So it comes back to the same ambiguity regarding SMOTE generation: on the segment or in the hyperrectangle.

Currently, we generate samples on the segments (the SMOTE paper is rather puzzling about this).
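As an illustration of the two readings, here is a small sketch. The function names `sample_on_segment` and `sample_in_hyperrectangle` are hypothetical and only serve the example. The first draws one scalar step shared by all features, the second draws one step per feature.

```python
import numpy as np


def sample_on_segment(x, neighbor, rng):
    """One scalar step: the new point lies on the segment [x, neighbor)."""
    step = rng.uniform(0.0, 1.0)
    return x + step * (neighbor - x)


def sample_in_hyperrectangle(x, neighbor, rng):
    """One step per feature: the new point lies in the box spanned by the pair."""
    steps = rng.uniform(0.0, 1.0, size=x.shape)
    return x + steps * (neighbor - x)


rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])
neighbor = np.array([2.0, 1.0])

on_segment = sample_on_segment(x, neighbor, rng)
in_box = sample_in_hyperrectangle(x, neighbor, rng)

# A point on the segment is collinear with the pair: the 2D cross product
# of (point - x) and (neighbor - x) is zero.
d = neighbor - x
v = on_segment - x
cross = d[0] * v[1] - d[1] * v[0]
```

Both points stay within the bounding box of the pair, but only the first one is guaranteed to be collinear with it.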

Any thoughts?

imblearn/over_sampling/_smote/base.py

Co-authored-by: Soledad Galli <solegalli@protonmail.com>

solegalli · 2023-07-11T10:00:25Z

imblearn/over_sampling/_smote/base.py


        Returns
        -------
        X_new : {ndarray, sparse matrix} of shape (n_samples, n_features)
            Synthetically generated samples.
        """
        diffs = nn_data[nn_num[rows, cols]] - X[rows]
+        if y is not None:  # only entering for BorderlineSMOTE-2


LGTM, clever implementation.

On second thought, would it not be enough to just halve the diffs, given that we multiply them by steps in 186/188?

The paper states to use a random number. If we take half, we always use 0.5.
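For reference, the two options can be compared numerically. This is a sketch under an assumption that is not shown in this thread: that `steps` is drawn uniformly from [0, 1). The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumption: the interpolation step is uniform on [0, 1).
steps = rng.uniform(0.0, 1.0, size=n)

# Option 1: always halve the difference, then apply the step.
halved = 0.5 * steps

# Option 2: shrink the difference by a random factor in [0, 0.5),
# then apply the step.
random_shrink = rng.uniform(0.0, 0.5, size=n) * steps

print("halved:        max=%.3f mean=%.3f" % (halved.max(), halved.mean()))
print("random shrink: max=%.3f mean=%.3f" % (random_shrink.max(), random_shrink.mean()))
```

Under that assumption, both options keep the overall factor below 0.5. The fixed factor gives an overall factor that is uniform on [0, 0.5), whereas the random factor gives a product of two uniform draws, which concentrates the new samples closer to the base sample.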

solegalli · 2023-07-11T10:01:46Z

Hey @glemaitre, I agree the paper is vague for borderline 2. The current code reflects what I also understand from the article. Thank you!

FIX multiply by random number < 0.5 for BorderlineSMOTE-2

7c8a1af

solegalli reviewed Jul 11, 2023

imblearn/over_sampling/_smote/base.py

Update imblearn/over_sampling/_smote/base.py

fe217e2

Co-authored-by: Soledad Galli <solegalli@protonmail.com>

solegalli reviewed Jul 11, 2023

imblearn/over_sampling/_smote/base.py

Update imblearn/over_sampling/_smote/base.py

9b9c6da

Co-authored-by: Soledad Galli <solegalli@protonmail.com>

solegalli reviewed Jul 11, 2023


glemaitre merged commit ec27259 into scikit-learn-contrib:master Jul 11, 2023
