
Commit 262f6be

[DOCS] Add total feature importance to regression example (#1379)
1 parent cb72c42 commit 262f6be

File tree: 2 files changed (+71, −14 lines)

docs/en/stack/ml/df-analytics/flightdata-regression.asciidoc

Lines changed: 71 additions & 14 deletions
@@ -123,10 +123,10 @@ exclude fields that either contain erroneous data or describe the
 `dependent_variable`.
 .. Choose a training percent of `90` which means it randomly selects 90% of the
 source data for training.
-.. If you want to experiment with <<ml-feature-importance,feature importance>>,
-specify a value in the advanced configuration options. In this example, we
-choose to return a maximum of 5 feature importance values per document. This
-option affects the speed of the analysis, so by default it is disabled.
+.. If you want to experiment with <<ml-feature-importance,{feat-imp}>>, specify
+a value in the advanced configuration options. In this example, we choose to
+return a maximum of 5 {feat-imp} values per document. This option affects the
+speed of the analysis, so by default it is disabled.
 .. Use the default memory limit for the job. If the job requires more than this
 amount of memory, it fails to start. If the available memory on the node is
 limited, this setting makes it possible to prevent job execution.
@@ -329,16 +329,24 @@ table to show only testing or training data and you can select which fields are
 shown in the table. You can also enable histogram charts to get a better
 understanding of the distribution of values in your data.
 
-If you chose to calculate feature importance, the destination index also
-contains `ml.feature_importance` objects. Every field that is included in the
-{reganalysis} (known as a _feature_ of the data point) is assigned a feature
-importance value. However, only the most significant values (in this case, the
-top 5) are stored in the index. These values indicate which features had the
-biggest (positive or negative) impact on each prediction. In {kib}, you can see
-this information displayed in the form of a decision plot:
+If you chose to calculate {feat-imp}, the destination index also contains
+`ml.feature_importance` objects. Every field that is included in the
+{reganalysis} (known as a _feature_ of the data point) is assigned a {feat-imp}
+value. This value has both a magnitude and a direction (positive or negative),
+which indicates how each field affects a particular prediction. Only the most
+significant values (in this case, the top 5) are stored in the index. However,
+the trained model metadata also contains the average magnitude of the {feat-imp}
+values for each field across all the training data. You can view this
+summarized information in {kib}:
 
 [role="screenshot"]
-image::images/flights-regression-importance.png["A decision plot for feature importance values in {kib}"]
+image::images/flights-regression-total-importance.png["Total {feat-imp} values in {kib}"]
+
+You can also see the {feat-imp} values for each individual prediction in the
+form of a decision plot:
+
+[role="screenshot"]
+image::images/flights-regression-importance.png["A decision plot for {feat-imp} values in {kib}"]
 
 The decision path starts at a baseline, which is the average of the predictions
 for all the data points in the training data set. From there, the feature
@@ -350,12 +358,60 @@ delay. This type of information can help you to understand how models arrive at
 their predictions. It can also indicate which aspects of your data set are most
 influential or least useful when you are training and tuning your model.
 
-If you do not use {kib}, you can see the same information by using the standard
-{es} search command to view the results in the destination index.
+If you do not use {kib}, you can see summarized {feat-imp} values by using the
+{ref}/get-inference.html[get trained model API] and the individual values by
+searching the destination index.
 
 .API example
 [%collapsible]
 ====
+[source,console]
+--------------------------------------------------
+GET _ml/inference/model-flight-delays*?include=total_feature_importance
+--------------------------------------------------
+// TEST[skip:TBD]
+
+The snippet below shows an example of the total feature importance details in
+the trained model metadata:
+
+[source,console-result]
+----
+{
+  "count" : 1,
+  "trained_model_configs" : [
+    {
+      "model_id" : "model-flight-delays-1601312043770",
+      ...
+      "metadata" : {
+        ...
+        "total_feature_importance" : [
+          {
+            "feature_name" : "dayOfWeek",
+            "importance" : {
+              "mean_magnitude" : 0.38674590521018903, <1>
+              "min" : -9.42823116446923, <2>
+              "max" : 8.707461689065173 <3>
+            }
+          },
+          {
+            "feature_name" : "OriginWeather",
+            "importance" : {
+              "mean_magnitude" : 0.18548393012368913,
+              "min" : -9.079576266629092,
+              "max" : 5.142479101907649
+            }
+          ...
+----
+<1> This value is the average of the absolute {feat-imp} values for the
+`dayOfWeek` field across all the training data.
+<2> This value is the minimum {feat-imp} value across all the training data for
+this field.
+<3> This value is the maximum {feat-imp} value across all the training data for
+this field.
+
+To see the top {feat-imp} values for each prediction, search the destination
+index. For example:
+
 [source,console]
 --------------------------------------------------
 GET model-flight-delays/_search
@@ -399,6 +455,7 @@ The snippet below shows a part of a document with the annotated results:
 }
 ...
 ----
+
 ====
 
 [[flightdata-regression-evaluate]]