Ensemble of samplers
====================

.. currentmodule:: imblearn.ensemble

Classifier including inner balancing samplers
---------------------------------------------

Bagging classifier
~~~~~~~~~~~~~~~~~~

In ensemble classifiers, bagging methods build several estimators on different randomly selected subsets of data. In scikit-learn, this classifier is named :class:`~sklearn.ensemble.BaggingClassifier`. However, this classifier does not allow balancing each subset of data. Therefore, when trained on an imbalanced data set, it will favor the majority classes:

>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=10000, n_features=2, n_informative=2,
...                            n_redundant=0, n_repeated=0, n_classes=3,
...                            n_clusters_per_class=1,
...                            weights=[0.01, 0.05, 0.94], class_sep=0.8,
...                            random_state=0)
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.tree import DecisionTreeClassifier
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> bc = BaggingClassifier(base_estimator=DecisionTreeClassifier(),
...                        random_state=0)
>>> bc.fit(X_train, y_train)  # doctest: +ELLIPSIS
BaggingClassifier(...)
>>> y_pred = bc.predict(X_test)
>>> balanced_accuracy_score(y_test, y_pred)  # doctest: +ELLIPSIS
0.77...
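
The imbalance follows directly from the ``weights`` parameter above: roughly 1%, 5%, and 94% of the samples belong to classes 0, 1, and 2, respectively. As a quick sanity check (a sketch; the exact counts depend on how :func:`~sklearn.datasets.make_classification` draws the samples), the class distribution can be inspected with :class:`collections.Counter`:

>>> from collections import Counter
>>> sorted(Counter(y).items())  # doctest: +ELLIPSIS
[(0, ...), (1, ...), (2, ...)]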

In :class:`BalancedBaggingClassifier`, each bootstrap sample is further resampled to achieve the desired ``sampling_strategy``. Therefore, :class:`BalancedBaggingClassifier` takes the same parameters as the scikit-learn :class:`~sklearn.ensemble.BaggingClassifier`. In addition, the sampling is controlled by the parameter ``sampler``, or by the two parameters ``sampling_strategy`` and ``replacement`` if one wants to use the :class:`~imblearn.under_sampling.RandomUnderSampler`:

>>> from imblearn.ensemble import BalancedBaggingClassifier
>>> bbc = BalancedBaggingClassifier(base_estimator=DecisionTreeClassifier(),
...                                 sampling_strategy='auto',
...                                 replacement=False,
...                                 random_state=0)
>>> bbc.fit(X_train, y_train) # doctest: +ELLIPSIS
BalancedBaggingClassifier(...)
>>> y_pred = bbc.predict(X_test)
>>> balanced_accuracy_score(y_test, y_pred)  # doctest: +ELLIPSIS
0.8...
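
Alternatively, a sampler instance can be passed explicitly through the ``sampler`` parameter mentioned above. The following sketch uses :class:`~imblearn.under_sampling.RandomUnderSampler`, which mirrors the default behavior; any imbalanced-learn sampler could be substituted, and the resulting score (elided here) will vary accordingly:

>>> from imblearn.under_sampling import RandomUnderSampler
>>> bbc = BalancedBaggingClassifier(base_estimator=DecisionTreeClassifier(),
...                                 sampler=RandomUnderSampler(),
...                                 random_state=0)
>>> bbc.fit(X_train, y_train)  # doctest: +ELLIPSIS
BalancedBaggingClassifier(...)
>>> balanced_accuracy_score(y_test, bbc.predict(X_test))  # doctest: +ELLIPSIS
0...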

Changing the sampler gives rise to different known implementations :cite:`maclin1997empirical`, :cite:`hido2009roughly`, :cite:`wang2009diversity`. The following example shows these different methods in practice: :ref:`sphx_glr_auto_examples_ensemble_plot_bagging_classifier.py`.

Forest of randomized trees
~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`BalancedRandomForestClassifier` is another ensemble method in which each tree of the forest is provided a balanced bootstrap sample :cite:`chen2004using`. This class provides all the functionality of :class:`~sklearn.ensemble.RandomForestClassifier`:

>>> from imblearn.ensemble import BalancedRandomForestClassifier
>>> brf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
>>> brf.fit(X_train, y_train) # doctest: +ELLIPSIS
BalancedRandomForestClassifier(...)
>>> y_pred = brf.predict(X_test)
>>> balanced_accuracy_score(y_test, y_pred)  # doctest: +ELLIPSIS
0.8...
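
Since the class exposes the usual random-forest API, attributes such as ``feature_importances_`` are available on the fitted estimator, as for any scikit-learn forest (a sketch; the actual values depend on the data):

>>> brf.feature_importances_  # doctest: +ELLIPSIS
array([...])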

Boosting
~~~~~~~~

Several methods taking advantage of boosting have been designed.

:class:`RUSBoostClassifier` randomly under-samples the dataset before performing a boosting iteration :cite:`seiffert2009rusboost`:

>>> from imblearn.ensemble import RUSBoostClassifier
>>> rusboost = RUSBoostClassifier(n_estimators=200, algorithm='SAMME.R',
...                               random_state=0)
>>> rusboost.fit(X_train, y_train)  # doctest: +ELLIPSIS
RUSBoostClassifier(...)
>>> y_pred = rusboost.predict(X_test)
>>> balanced_accuracy_score(y_test, y_pred)  # doctest: +ELLIPSIS
0...
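
Because :class:`RUSBoostClassifier` follows the AdaBoost API, the evolution of the ensemble can be monitored stage by stage. The sketch below scores the first five boosting stages, assuming ``staged_predict`` behaves as in :class:`~sklearn.ensemble.AdaBoostClassifier`:

>>> from itertools import islice
>>> scores = [balanced_accuracy_score(y_test, y_stage)
...           for y_stage in islice(rusboost.staged_predict(X_test), 5)]
>>> len(scores)
5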

A specific method which uses :class:`~sklearn.ensemble.AdaBoostClassifier` as learners in the bagging classifier is called "EasyEnsemble". The :class:`EasyEnsembleClassifier` allows bagging AdaBoost learners which are trained on balanced bootstrap samples :cite:`liu2008exploratory`. Similarly to the :class:`BalancedBaggingClassifier` API, the ensemble can be constructed as follows:

>>> from imblearn.ensemble import EasyEnsembleClassifier
>>> eec = EasyEnsembleClassifier(random_state=0)
>>> eec.fit(X_train, y_train) # doctest: +ELLIPSIS
EasyEnsembleClassifier(...)
>>> y_pred = eec.predict(X_test)
>>> balanced_accuracy_score(y_test, y_pred)  # doctest: +ELLIPSIS
0.6...
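
Both the number of balanced subsets and the underlying AdaBoost learner can be tuned. A sketch, assuming ``n_estimators`` and ``base_estimator`` behave as for the other ensembles in this section:

>>> from sklearn.ensemble import AdaBoostClassifier
>>> eec = EasyEnsembleClassifier(n_estimators=10,
...                              base_estimator=AdaBoostClassifier(n_estimators=10),
...                              random_state=0)
>>> eec.fit(X_train, y_train)  # doctest: +ELLIPSIS
EasyEnsembleClassifier(...)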