@@ -6,15 +6,45 @@ Permutation feature importance
.. currentmodule:: sklearn.inspection

- Permutation feature importance is a model inspection technique that can be used
- for any :term:`fitted` :term:`estimator` when the data is tabular. This is
- especially useful for non-linear or opaque :term:`estimators`. The permutation
- feature importance is defined to be the decrease in a model score when a single
- feature value is randomly shuffled [1]_. This procedure breaks the relationship
- between the feature and the target, thus the drop in the model score is
- indicative of how much the model depends on the feature. This technique
- benefits from being model agnostic and can be calculated many times with
- different permutations of the feature.
+ Permutation feature importance is a model inspection technique that measures the
+ contribution of each feature to a :term:`fitted` model's statistical performance
+ on a given tabular dataset. This technique is particularly useful for non-linear
+ or opaque :term:`estimators`, and involves randomly shuffling the values of a
+ single feature and observing the resulting degradation of the model's score
+ [1]_. By breaking the relationship between the feature and the target, we
+ determine how much the model relies on that particular feature.
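+
+ As a minimal hand-rolled sketch of this idea (not the library implementation;
+ the dataset, model, and shuffled feature index are illustrative choices)::
+
+     import numpy as np
+     from sklearn.datasets import load_diabetes
+     from sklearn.linear_model import Ridge
+     from sklearn.model_selection import train_test_split
+
+     X, y = load_diabetes(return_X_y=True)
+     X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
+     model = Ridge().fit(X_train, y_train)
+
+     rng = np.random.default_rng(0)
+     baseline = model.score(X_val, y_val)          # R^2 on the held-out set
+     X_perm = X_val.copy()
+     j = 2                                         # feature column to shuffle
+     X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature/target link
+     importance_j = baseline - model.score(X_perm, y_val)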
+
+ In the following figures, we observe the effect of permuting features on the
+ correlation between the feature and the target and consequently on the model's
+ statistical performance.
+
+ .. image:: ../images/permuted_predictive_feature.png
+    :align: center
+
+ .. image:: ../images/permuted_non_predictive_feature.png
+    :align: center
+
+ In the top figure, we observe that permuting a predictive feature breaks the
+ correlation between the feature and the target, and consequently the model's
+ statistical performance decreases. In the bottom figure, we observe that
+ permuting a non-predictive feature does not significantly degrade the model's
+ statistical performance.
+
+ One key advantage of permutation feature importance is that it is
+ model-agnostic, i.e. it can be applied to any fitted estimator. Moreover, it can
+ be calculated multiple times with different permutations of the feature, further
+ providing a measure of the variance in the estimated feature importances for the
+ specific trained model.
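+
+ A minimal usage sketch of :func:`permutation_importance` illustrating this (the
+ dataset and model are arbitrary choices for the example)::
+
+     from sklearn.datasets import load_diabetes
+     from sklearn.inspection import permutation_importance
+     from sklearn.linear_model import Ridge
+     from sklearn.model_selection import train_test_split
+
+     X, y = load_diabetes(return_X_y=True)
+     X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
+     model = Ridge().fit(X_train, y_train)
+
+     # n_repeats draws several independent permutations of each feature
+     result = permutation_importance(model, X_val, y_val, n_repeats=10,
+                                     random_state=0)
+     mean_drop = result.importances_mean  # average score drop per feature
+     spread = result.importances_std      # variability across the 10 shuffles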
+
+ The figure below shows the permutation feature importance of a
+ :class:`~sklearn.ensemble.RandomForestClassifier` trained on an augmented
+ version of the titanic dataset that contains the `random_cat` and `random_num`
+ features, i.e. a categorical and a numerical feature that are not correlated in
+ any way with the target variable:
+
+ .. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_002.png
+    :target: ../auto_examples/inspection/plot_permutation_importance.html
+    :align: center
+    :scale: 70

.. warning::
@@ -74,15 +104,18 @@ highlight which features contribute the most to the generalization power of the
inspected model. Features that are important on the training set but not on the
held-out set might cause the model to overfit.

- The permutation feature importance is the decrease in a model score when a single
- feature value is randomly shuffled. The score function to be used for the
- computation of importances can be specified with the `scoring` argument,
- which also accepts multiple scorers. Using multiple scorers is more computationally
- efficient than sequentially calling :func:`permutation_importance` several times
- with a different scorer, as it reuses model predictions.
+ The permutation feature importance depends on the score function that is
+ specified with the `scoring` argument. This argument accepts multiple scorers,
+ which is more computationally efficient than sequentially calling
+ :func:`permutation_importance` several times with a different scorer, as it
+ reuses model predictions.

- An example of using multiple scorers is shown below, employing a list of metrics,
- but more input formats are possible, as documented in :ref:`multimetric_scoring`.
+ |details-start|
+ **Example of permutation feature importance using multiple scorers**
+ |details-split|
+
+ In the example below we use a list of metrics, but more input formats are
+ possible, as documented in :ref:`multimetric_scoring`.

>>> scoring = ['r2', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error']
>>> r_multi = permutation_importance(
@@ -116,7 +149,9 @@ The ranking of the features is approximately the same for different metrics even
if the scales of the importance values are very different. However, this is not
guaranteed and different metrics might lead to significantly different feature
importances, in particular for models trained for imbalanced classification problems,
- for which the choice of the classification metric can be critical.
+ for which **the choice of the classification metric can be critical**.
+
+ |details-end|

Outline of the permutation importance algorithm
-----------------------------------------------
@@ -156,9 +191,9 @@ over low cardinality features such as binary features or categorical variables
with a small number of possible categories.

Permutation-based feature importances do not exhibit such a bias. Additionally,
- the permutation feature importance may be computed performance metric on the
- model predictions and can be used to analyze any model class (not
- just tree-based models).
+ the permutation feature importance may be computed with any performance metric
+ on the model predictions and can be used to analyze any model class (not just
+ tree-based models).

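+ For instance, a brief sketch with a non-tree model and an explicitly chosen
+ metric (the dataset, model, and scorer here are illustrative choices)::
+
+     from sklearn.datasets import load_breast_cancer
+     from sklearn.inspection import permutation_importance
+     from sklearn.model_selection import train_test_split
+     from sklearn.neighbors import KNeighborsClassifier
+
+     X, y = load_breast_cancer(return_X_y=True)
+     X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
+     model = KNeighborsClassifier().fit(X_train, y_train)  # not tree-based
+
+     # any scorer accepted by `scoring` works, e.g. balanced accuracy
+     result = permutation_importance(model, X_val, y_val,
+                                     scoring="balanced_accuracy",
+                                     n_repeats=5, random_state=0)
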
The following example highlights the limitations of impurity-based feature
importance in contrast to permutation-based feature importance:
@@ -168,13 +203,29 @@ Misleading values on strongly correlated features
-------------------------------------------------

When two features are correlated and one of the features is permuted, the model
- will still have access to the feature through its correlated feature. This will
- result in a lower importance value for both features, where they might
- *actually* be important.
+ still has access to the latter through its correlated feature. This results in a
+ lower reported importance value for both features, though they might *actually*
+ be important.
+
+ The figure below shows the permutation feature importance of a
+ :class:`~sklearn.ensemble.RandomForestClassifier` trained using the
+ :ref:`breast_cancer_dataset`, which contains strongly correlated features. A
+ naive interpretation would suggest that all features are unimportant:
+
+ .. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_multicollinear_002.png
+    :target: ../auto_examples/inspection/plot_permutation_importance_multicollinear.html
+    :align: center
+    :scale: 70
+
+ One way to handle the issue is to cluster features that are correlated and only
+ keep one feature from each cluster, as sketched after the figure below.
+
+ .. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_multicollinear_004.png
+    :target: ../auto_examples/inspection/plot_permutation_importance_multicollinear.html
+    :align: center
+    :scale: 70
+
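+ A hedged sketch of this strategy (the Spearman correlation, Ward linkage, and
+ distance threshold are illustrative choices, not the only valid ones)::
+
+     import numpy as np
+     from scipy.cluster import hierarchy
+     from scipy.spatial.distance import squareform
+     from scipy.stats import spearmanr
+     from sklearn.datasets import load_breast_cancer
+
+     X, y = load_breast_cancer(return_X_y=True)
+
+     # rank correlations between features, symmetrized for numerical safety
+     corr = spearmanr(X).correlation
+     corr = (corr + corr.T) / 2
+     np.fill_diagonal(corr, 1)
+
+     # convert correlations to distances and cluster hierarchically
+     dist_linkage = hierarchy.ward(squareform(1 - np.abs(corr)))
+     cluster_ids = hierarchy.fcluster(dist_linkage, 1.0, criterion="distance")
+
+     # keep the first feature of each cluster as its representative
+     selected = [np.flatnonzero(cluster_ids == c)[0] for c in np.unique(cluster_ids)]
+     X_reduced = X[:, selected]
+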
- One way to handle this is to cluster features that are correlated and only
- keep one feature from each cluster. This strategy is explored in the following
- example:
+ For more details on this strategy, see the example
:ref:`sphx_glr_auto_examples_inspection_plot_permutation_importance_multicollinear.py`.

.. topic:: Examples: