[BUG] IHT always checks the probability of the first class to make the selection #848

#### Describe the bug

[cross_val_predict](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/f177b05/imblearn/under_sampling/_prototype_selection/_instance_hardness_threshold.py#L148) returns an array with the probabilities of each class. The array has as many columns as there are classes in the target.

Then the code takes the [first vector of probabilities](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/f177b05/imblearn/under_sampling/_prototype_selection/_instance_hardness_threshold.py#L156), that is, the probabilities of the first majority class, and selects the samples to retain based on that vector.

This is fine for binary classification, but in the multiclass case each sample should be filtered out or retained based on the probability of its own class.
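
To illustrate the difference, here is a minimal sketch with made-up numbers (the `proba`, `classes` and `y` arrays are hypothetical, not taken from imbalanced-learn). It compares the first probability column with the probability each sample gets for its own class, assuming the columns follow the sorted class labels as in scikit-learn's `classes_`:

```python
import numpy as np

# Hypothetical out-of-fold probabilities for 5 samples and 3 classes;
# column j holds P(label == classes[j]).
classes = np.array([0, 1, 2])
proba = np.array(
    [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.2, 0.3, 0.5],
        [0.6, 0.3, 0.1],
        [0.1, 0.1, 0.8],
    ]
)
y = np.array([0, 1, 2, 1, 2])

# Probability of the first class for every sample, whatever its label is.
first_class_proba = proba[:, 0]

# Probability that each sample receives for its own class.
# searchsorted maps each label to its column (classes must be sorted).
col_idx = np.searchsorted(classes, y)
own_class_proba = proba[np.arange(len(y)), col_idx]

print(first_class_proba)
print(own_class_proba)
```

Only the samples of class 0 get the same value in both vectors. For the other classes the first column says nothing about how well a sample fits its own class.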


#### Expected Results
```
probabilities = cross_val_predict(
    self.estimator_,
    X,
    y,
    cv=skf,
    n_jobs=self.n_jobs,
    method="predict_proba",
)

idx_under = np.empty((0,), dtype=int)

for target_class in np.unique(y):
    if target_class in self.sampling_strategy_.keys():
        # Proposed change: use the probability column of the class being
        # undersampled. This assumes the labels are encoded as
        # 0..n_classes-1 so that they match the column indices.
        probs = probabilities[range(len(y)), target_class]  # <==

        n_samples = self.sampling_strategy_[target_class]

        threshold = np.percentile(
            probs[y == target_class],
            (1.0 - (n_samples / target_stats[target_class])) * 100.0,
        )
        index_target_class = np.flatnonzero(
            probs[y == target_class] >= threshold
        )
    else:
        # ... (excerpt ends here)
```

where **target_class** is the column of the array that holds the probabilities for the class being undersampled.
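
As a standalone illustration of the per-class selection, here is a rough sketch outside of the library. The function `select_indices` and its `n_keep` argument are hypothetical names (a stand-in for `sampling_strategy_`), and the toy data is made up. It assumes `classes` is given in the same order as the probability columns:

```python
import numpy as np


def select_indices(proba, y, classes, n_keep):
    """Return the sorted indices of the samples to keep.

    n_keep maps a class label to the number of samples to retain;
    classes missing from n_keep are kept in full.
    """
    y = np.asarray(y)
    kept = []
    for col, label in enumerate(classes):
        idx = np.flatnonzero(y == label)
        if label in n_keep:
            # Probability of each sample of this class for its own class.
            probs = proba[idx, col]
            threshold = np.percentile(
                probs, (1.0 - n_keep[label] / idx.size) * 100.0
            )
            # Keep the samples the estimator is most confident about.
            idx = idx[probs >= threshold]
        kept.append(idx)
    return np.sort(np.concatenate(kept))


# Toy data: class 0 is the majority class and is reduced to 2 samples.
proba = np.array(
    [
        [0.9, 0.05, 0.05],
        [0.4, 0.3, 0.3],
        [0.7, 0.2, 0.1],
        [0.2, 0.5, 0.3],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7],
    ]
)
y = np.array([0, 0, 0, 0, 1, 2])
kept = select_indices(proba, y, classes=np.array([0, 1, 2]), n_keep={0: 2})
print(kept)
```

Because the cut is a percentile threshold, ties in the probabilities can leave slightly more samples than requested.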


In addition, the documentation suggests that IHT supports or implements one-vs-rest for multiclass targets. But the [code in its current form](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/f177b05/imblearn/under_sampling/_prototype_selection/_instance_hardness_threshold.py#L140) does not use one-vs-rest, so it is up to the user to be aware of this.

I suggest we either make this clear in the documentation, or implement one-vs-rest by wrapping the estimator passed by the user.
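
For reference, a minimal sketch of what the wrapping could look like on the user side today, using scikit-learn's `OneVsRestClassifier`. The dataset, the `LogisticRegression` estimator and the CV settings are arbitrary choices for illustration, not what IHT does internally:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.multiclass import OneVsRestClassifier

# Synthetic imbalanced 3-class problem (illustrative only).
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=5,
    n_classes=3,
    weights=[0.6, 0.3, 0.1],
    random_state=0,
)

# Wrap the base estimator so that one binary model is fitted per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(ovr, X, y, cv=skf, method="predict_proba")

# One column per class, in the order of the sorted class labels.
print(proba.shape)
```

The wrapped estimator could then be passed as `estimator` to `InstanceHardnessThreshold`, though the column selection discussed above would still apply to the resulting probabilities.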

