FIX multiply by random number < 0.5 for BorderlineSMOTE-2 #1027
@solegalli Does the fix look okay to you? I still find the paper ambiguous. Indeed, you could apply a random number to each feature of the … Currently, we generate samples on the line segment joining each sample to its neighbor (the SMOTE paper is rather puzzling about this). Any thoughts?
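The two readings of the paper can be contrasted with a small sketch. It assumes plain NumPy arrays and a `numpy.random.Generator`; the helper names `interpolate_on_segment` and `interpolate_per_feature` are hypothetical and are not imbalanced-learn API. One shared random gap keeps the new point on the segment, while one gap per feature places it anywhere in the axis-aligned box spanned by the two points.

```python
import numpy as np


def interpolate_on_segment(x, neighbor, rng):
    """Hypothetical helper: one gap in [0, 1) shared by all features."""
    gap = rng.uniform(0.0, 1.0)  # single scalar, so the point stays on the segment
    return x + gap * (neighbor - x)


def interpolate_per_feature(x, neighbor, rng):
    """Hypothetical helper: an independent gap in [0, 1) for each feature."""
    gaps = rng.uniform(0.0, 1.0, size=x.shape)  # one gap per feature
    return x + gaps * (neighbor - x)


rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])
neighbor = np.array([1.0, 2.0])

on_segment = interpolate_on_segment(x, neighbor, rng)
per_feature = interpolate_per_feature(x, neighbor, rng)

# The segment point is collinear with x and neighbor (its coordinates keep
# the 1:2 ratio of the difference vector); the per-feature point only stays
# inside the box [0, 1) x [0, 2).
print(on_segment, per_feature)
```

With a single gap, every synthetic sample lies on the line through the two originals; the per-feature variant spreads samples over a hyperrectangle instead.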
Co-authored-by: Soledad Galli <solegalli@protonmail.com>
```python
        Returns
        -------
        X_new : {ndarray, sparse matrix} of shape (n_samples, n_features)
            Synthetically generated samples.
        """
        diffs = nn_data[nn_num[rows, cols]] - X[rows]
        if y is not None:  # only entering for BorderlineSMOTE-2
```
LGTM, clever implementation.
On second thought, wouldn't it be enough to just halve the diffs, given that we multiply them by `steps` on lines 186/188?
The paper says to use a random number. If we just halve the diffs, the factor is always 0.5.
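The effective multiplier applied to a cross-class difference can be compared numerically for both options. This is a minimal sketch, assuming (from the comment about lines 186/188) that each new sample's diff is later multiplied by a per-sample step drawn from U(0, 1); the variable names are illustrative and this is not the code in the PR.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Assumption: a per-sample step drawn from U(0, 1) later multiplies the diffs.
steps = rng.uniform(0.0, 1.0, size=n)

# Option A (suggested in review): halve the diffs, so the multiplier is 0.5 * step.
halved = 0.5 * steps

# Option B (extra random factor): diffs *= u with u ~ U(0, 0.5),
# so the multiplier is u * step.
u = rng.uniform(0.0, 0.5, size=n)
extra_random = u * steps

# Both multipliers stay in [0, 0.5), but 0.5 * U(0, 1) is uniform on [0, 0.5)
# with mean 0.25, whereas the product of two independent uniforms is skewed
# towards 0 with mean 0.25 * 0.5 = 0.125.
print(halved.max() < 0.5, extra_random.max() < 0.5)
print(halved.mean(), extra_random.mean())
```

Under that assumption, both options keep the synthetic point within the half of the segment closest to the seed; they differ only in how the position is distributed along it.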
Hey @glemaitre, I agree the paper is vague about Borderline-SMOTE2. The current code also reflects my understanding of the article. Thank you!
Addresses the issue raised in https://github.com/scikit-learn-contrib/imbalanced-learn/pull/1023/files#r1259422379
We additionally detect whether the pair of samples used to generate a new sample comes from different classes. In that case, we multiply their difference by a random number between 0 and 0.5.
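The detection described above can be sketched with a boolean mask over the chosen neighbors. This is a simplified illustration, not the imbalanced-learn implementation: `scaled_diffs` and its parameters are hypothetical, the seeds are assumed to come from a single class, and one U(0, 0.5) factor is drawn per cross-class pair and broadcast over all features.

```python
import numpy as np


def scaled_diffs(seeds, pool, pool_labels, neighbor_idx, seed_label, rng):
    """Hypothetical helper: shrink diffs whose neighbor is from another class.

    seeds: (n, n_features) samples being oversampled (all of class seed_label).
    pool, pool_labels: candidate neighbors and their labels.
    neighbor_idx: (n,) index into pool of the neighbor chosen for each seed.
    """
    diffs = pool[neighbor_idx] - seeds  # seed -> neighbor difference vectors
    cross_class = pool_labels[neighbor_idx] != seed_label  # pairs from different classes
    # One factor in [0, 0.5) per cross-class pair, broadcast over the features.
    diffs[cross_class] *= rng.uniform(0.0, 0.5, size=(cross_class.sum(), 1))
    return diffs, cross_class


rng = np.random.default_rng(0)
seeds = np.array([[0.0, 0.0], [1.0, 1.0]])
pool = np.array([[2.0, 0.0], [1.0, 3.0], [5.0, 5.0]])
pool_labels = np.array([1, 0, 1])  # 1 = seed class, 0 = other class
neighbor_idx = np.array([0, 1])  # seed 0 -> same class, seed 1 -> other class

diffs, cross = scaled_diffs(seeds, pool, pool_labels, neighbor_idx, 1, rng)
steps = rng.uniform(0.0, 1.0, size=(len(seeds), 1))  # per-sample step in [0, 1)
X_new = seeds + steps * diffs
print(cross, X_new)
```

Here the second seed's neighbor belongs to the other class, so its synthetic sample stays strictly inside the half of the segment nearest the seed, while the same-class pair can land anywhere along its segment.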