FIX multiply by random number < 0.5 for BorderlineSMOTE-2 #1027

Merged
merged 3 commits into from
Jul 11, 2023


glemaitre
Member

Address issue raised in https://github.com/scikit-learn-contrib/imbalanced-learn/pull/1023/files#r1259422379

We additionally detect whether the two samples used to generate a new sample belong to different classes. In that case, we multiply the difference by a random number between 0 and 0.5.
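
A minimal sketch of the idea, written in plain Python rather than the library's vectorized code. The helper name `interpolate` and its arguments are hypothetical, and the sketch reflects only one reading of the change: when the two samples come from different classes, the step is drawn from the range 0 to 0.5 instead of 0 to 1.

```python
import random


def interpolate(seed, neighbor, seed_label, neighbor_label, rng):
    """Hypothetical helper: one synthetic sample between `seed` and `neighbor`."""
    # Same class: the step may fall anywhere on the segment.
    # Different classes: restrict the step to the half closest to the seed.
    upper = 1.0 if neighbor_label == seed_label else 0.5
    step = rng.uniform(0.0, upper)
    return [s + step * (n - s) for s, n in zip(seed, neighbor)]


rng = random.Random(0)
seed, neighbor = [0.0, 0.0], [2.0, 4.0]
same_class = interpolate(seed, neighbor, 1, 1, rng)
different_class = interpolate(seed, neighbor, 1, 0, rng)
```

With these inputs, a sample generated from a different-class pair never goes past the midpoint `[1.0, 2.0]` of the segment.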

@glemaitre
Member Author

@solegalli Does the fix look okay to you?

I still find the paper ambiguous. You could draw a separate random number for each feature of the selected X. In that case, however, the new sample is no longer generated on the segment defined by the two samples but inside the "rectangle" (or hyperrectangle) they span. So we come back to the same ambiguity as in SMOTE generation: on the segment or in the hyperrectangle?

Currently, we generate samples on the segments (the SMOTE paper is rather puzzling about this).
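
To make the two readings concrete, here is a small sketch in plain Python. The function names are hypothetical and this is not the library's implementation: `on_segment` shares one random step across all features, while `in_hyperrectangle` draws one step per feature.

```python
import random


def on_segment(a, b, rng):
    """One shared step for all features: the result lies on the segment a-b."""
    step = rng.uniform(0.0, 1.0)
    return [x + step * (y - x) for x, y in zip(a, b)]


def in_hyperrectangle(a, b, rng):
    """One step per feature: the result lies in the box spanned by a and b."""
    return [x + rng.uniform(0.0, 1.0) * (y - x) for x, y in zip(a, b)]


def is_collinear(a, b, p, tol=1e-12):
    """2D check that p lies on the line through a and b (cross product ~ 0)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) <= tol


rng = random.Random(42)
a, b = [0.0, 0.0], [1.0, 1.0]
segment_points = [on_segment(a, b, rng) for _ in range(100)]
box_points = [in_hyperrectangle(a, b, rng) for _ in range(100)]
```

Every point in `segment_points` is collinear with `a` and `b`, whereas the points in `box_points` stay inside the unit square but generally leave the diagonal.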

Any thoughts?

Co-authored-by: Soledad Galli <solegalli@protonmail.com>

        Returns
        -------
        X_new : {ndarray, sparse matrix} of shape (n_samples, n_features)
            Synthetically generated samples.
        """
        diffs = nn_data[nn_num[rows, cols]] - X[rows]
        if y is not None:  # only entering for BorderlineSMOTE-2
Contributor

LGTM, clever implementation.

Contributor

On second thought, would it not be enough to just halve the diffs, given that we multiply them by steps in 186/188?

Member Author

The paper states to use a random number. If we take half, we always use 0.5.
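
The two options discussed in this thread can be sketched side by side. This is an illustration only: it assumes `steps` are drawn uniformly between 0 and 1, and `scaled_gap` is a hypothetical helper, not a function of the library.

```python
import random


def scaled_gap(step, factor):
    """Effective interpolation gap once the diff is scaled by `factor`
    and then multiplied by `step`."""
    return step * factor


rng = random.Random(0)
steps = [rng.uniform(0.0, 1.0) for _ in range(1000)]

# Option A (the suggestion): always halve the diffs, so the factor is fixed.
gaps_fixed = [scaled_gap(s, 0.5) for s in steps]

# Option B (the reply): draw a fresh random factor between 0 and 0.5 per pair.
gaps_random = [scaled_gap(s, rng.uniform(0.0, 0.5)) for s in steps]
```

Under this assumption both options keep the gap between 0 and 0.5; they differ in whether the factor applied to the diffs is constant or random.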

@solegalli
Contributor

Hey @glemaitre, I agree the paper is vague for borderline-2. The current code reflects what I also understand from the article. Thank you!

@glemaitre glemaitre merged commit ec27259 into scikit-learn-contrib:master Jul 11, 2023