cross_val_predict returns an array with the probabilities of each class. The array has as many columns as there are classes in the target.
Then the code takes the first column of probabilities, that is, the probabilities of the first majority class, and selects the samples to retain based on that column.
This is OK in binary classification, but in multiclass, the samples should be filtered out or retained based on their own class probability.
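The expected behavior is that, for each class being undersampled, the selection uses the column target_class of the array, where target_class is the column corresponding to the probability of the class being undersampled.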
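To make the distinction concrete, here is a minimal sketch that compares the first probability column with each sample's own-class probability. The dataset, the LogisticRegression estimator, and the variable names are assumptions chosen for illustration; this is not the library's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy imbalanced 3-class problem (illustrative data, not from the issue).
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    weights=[0.6, 0.3, 0.1],
    random_state=0,
)

# One row per sample, one column per class (columns follow the sorted labels).
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
)

# Probability of class 0 for every sample, regardless of its true label.
first_column = proba[:, 0]

# Probability that each sample belongs to its own true class.
# Indexing with y works here only because the labels are 0, 1, 2.
own_class = proba[np.arange(len(y)), y]

print(proba.shape)
print(np.allclose(first_column[y == 0], own_class[y == 0]))
```

The two vectors agree only for samples whose true label is the first class, which is why a selection based on the first column alone would be safe in the binary case but not in the multiclass one.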
In addition, the documentation suggests that IHT supports or implements 1 vs Rest for multiclass targets, but the code in its current form does not use 1 vs Rest. So it is up to the user to be aware of this.
I suggest we either make it clear in the documentation, or implement 1 vs Rest to wrap the algorithm entered by the user.
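As a rough illustration of the second option, the sketch below wraps a user-supplied classifier in scikit-learn's OneVsRestClassifier before computing cross-validated probabilities. The base estimator and the toy data are assumptions, and how such a wrapper would be wired into IHT itself is left open here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.multiclass import OneVsRestClassifier

# Toy imbalanced 3-class problem (illustrative data, not from the issue).
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    weights=[0.6, 0.3, 0.1],
    random_state=0,
)

# Hypothetical user-supplied estimator, wrapped so that one binary
# classifier is fitted per class (1 vs Rest).
user_estimator = LogisticRegression(max_iter=1000)
ovr = OneVsRestClassifier(user_estimator)

# Still one column per class; in the multiclass case OneVsRestClassifier
# normalizes the per-class scores so that each row sums to 1.
ovr_proba = cross_val_predict(ovr, X, y, cv=5, method="predict_proba")

print(ovr_proba.shape)
```

With this wrapping the user no longer has to know whether their estimator handles multiclass targets natively, at the cost of fitting one model per class in every fold.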
The code takes the probabilities associated with the true label. The idea is to keep the samples with high probabilities, because a high probability means the sample is easy to classify correctly. We then loop over the classes and select samples for each class of interest.
The documentation regarding multiclass support is indeed wrong.
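The selection described in this comment can be sketched with plain NumPy as follows. The probability values, the labels, and the n_keep mapping are made up for illustration, and keeping the top-k samples per class is a simplification that may differ from how the library picks its threshold.

```python
import numpy as np

# Hand-made cross-validated probabilities: one row per sample,
# one column per class (illustrative values only).
proba = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.50, 0.10],
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.30, 0.60, 0.10],
    [0.10, 0.10, 0.80],
])
y = np.array([0, 0, 0, 0, 1, 1, 2])

# Probability associated with the true label of each sample.
true_label_proba = proba[np.arange(len(y)), y]

# Hypothetical target counts for the classes to undersample;
# class 2 is the minority class and is kept in full.
n_keep = {0: 2, 1: 1}

kept = []
for label in np.unique(y):
    idx = np.flatnonzero(y == label)
    if int(label) in n_keep:
        # Keep the samples of this class with the highest own-class
        # probability, i.e. the ones that are easiest to classify.
        order = np.argsort(true_label_proba[idx])[::-1]
        idx = idx[order[: n_keep[int(label)]]]
    kept.extend(idx.tolist())

kept = sorted(kept)
print(kept)  # [0, 2, 4, 6]
```

Each class is ranked by its own column, so a sample of class 1 is never judged by its probability of belonging to class 0.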