Commit 03e3992

sdconrox and jackx111 authored
22931: Feature Influence Rename (#99)
Refactored all mentions of renamed functions from [FeatureInfluenceRenames.xls](https://diveplane-my.sharepoint.com/:x:/g/personal/cmack_howso_com/EV9BqmEDGolAhoqPkKLIld0BMIyVFwH8EDo2pt8-78XqHw?e=rroB5A)

Co-authored-by: jack-xia-dp <144161208+jackx111@users.noreply.github.com>
1 parent 3a7b2a8 commit 03e3992

8 files changed, with 68 additions and 93 deletions.
source/getting_started/concepts.rst

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -24,7 +24,7 @@ advances enable all of Engine's insight and analysis capabilities, including sta
2424

2525
- Outperform other commonly used feature importance metrics, including SHAP.
2626

27-
Howso quantifies individual feature contributions to a prediction, i.e., how much an individual feature impacts a prediction. The concept of feature contribution is similar to the data science concept of "feature importance". However,
27+
Howso quantifies individual prediction contributions to a prediction, i.e., how much an individual feature impacts a prediction. The concept of prediction contributions is similar to the data science concept of "feature importance". However,
2828
Howso is robust against several common challenges (correlated features, redundant features, difference in scale between features, and multiple distinguishing features) faced by other feature importance tools,
2929
including the SHAP metric, which often lead to misleading results.
3030

source/getting_started/intro.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -69,7 +69,7 @@ Howso values gracious intellectual honesty. In that spirit, we're telling you up
6969
- Very large datasets
7070

7171
Handling very large datasets with subtle signals (e.g., datasets requiring tens of millions of records and/or thousands of features to capture the complex relationships within the data)
72-
currently requires manual work from engineering, data science, and subject matter expert teams. However, currently available Howso tools, including ablation and non-robust feature contribution calculations,
72+
currently requires manual work from engineering, data science, and subject matter expert teams. However, currently available Howso tools, including ablation and non-robust prediction contribution calculations,
7373
can be used to help identify subsamples of large datasets that
7474
contain enough signal to be used for data science analysis.
7575

source/getting_started/terminology.rst

Lines changed: 15 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -98,40 +98,39 @@ The mean absolute error between a predicted value and actual value for a predict
9898
uncertainty. Residuals may be for a given prediction, and expected Residuals may be for a given feature, either
9999
globally across the entire model or for a particular prediction.
100100

101-
.. _contribution:
101+
.. _pc:
102102

103-
Contribution
104-
------------
103+
Prediction Contributions (PC)
104+
-----------------------------
105105

106-
Feature contribution is the difference between a prediction in an action feature when each feature or case is
107-
considered versus not considered. Case contribution is the same but for a case rather than a feature. When applied in
106+
Prediction contributions is the measured difference between a prediction in an action feature when each feature (Feature Prediction Contributions)
107+
or case (Case Prediction Contributions) is considered versus not considered. When Feature Prediction Contributions is applied in
108108
a robust fashion, this is an approximation of the commonly used SHAP feature importance measure. The difference being
109109
that SHAP is an exact value of a model (which itself is just an approximation of the data) whereas robust contribution is an
110110
approximation of the feature importance of the relationships expressed in the data.
111111

112-
.. _mda:
112+
.. _ac:
113113

114-
MDA
115-
---
116-
117-
The *Mean Decrease in Accuracy* (MDA) of an Action Feature is mean decrease in accuracy of removing a feature. MDA units are on the same scale as the Action feature(s), and will be probabilities for categorical features.
114+
Accuracy Contributions (AC)
115+
---------------------------
116+
Accuracy contributions is the accuracy difference in an action feature when each feature (Feature Accuracy Contributions)
117+
or case (Case Accuracy Contributions) is considered versus not considered.
118118

119119
.. _robust:
120120

121121
Robust
122122
------
123123

124-
A feature or case contribution or MDA that is robust means that it is computed over the power set of possible
125-
combinations of features or cases, as approximated by a uniform distribution. For feature contributions, robust means
124+
A feature or case contribution that is robust means that it is computed over the power set of possible
125+
combinations of features or cases, as approximated by a uniform distribution. For prediction contributions, robust means
126126
it is an approximation to the well-known SHAP values.
127127

128128
.. _relavant_features:
129129

130130
Relevant Features
131131
-----------------
132132

133-
Features whose values were important in determining prediction value(s). Generally, this refers to feature MDA or
134-
contribution, which yield similar but complementary insights.
133+
Features whose values were important in determining prediction value(s). Generally, this refers to prediction or accuracy contributions, which yield similar but complementary insights.
135134

136135
.. _contexts:
137136

@@ -313,8 +312,8 @@ Influential Cases
313312

314313
The cases which were identified as most influential during a prediction, along with their weights when predicting the
315314
expected value or drawing a value from the distribution of expected values for generative outputs. The influential
316-
cases are a subset of the :ref:`most_similar_cases`, returning only those cases whose cumulative influence weights added in
317-
descending order is below the influential weight threshold.
315+
cases are a subset of the :ref:`most_similar_cases`, returning only those cases whose cumulative influence weights added in
316+
descending order is below the influential weight threshold.
318317

319318
.. _boundary_cases:
320319
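
The terminology hunk above defines prediction contributions as the difference in a prediction when a feature is considered versus not considered, accuracy contributions as the corresponding accuracy difference, and robust as computing over the power set of feature combinations. A minimal sketch of those three definitions follows. It is not Howso's implementation: the 1-nearest-neighbor `predict` function, the `TRAINED` rows, and every function name are hypothetical, and the power set is enumerated exactly rather than sampled only because the toy has two features.

```python
from itertools import combinations

# Hypothetical toy data: the target ``y`` depends on ``x1`` only.
TRAINED = [
    {"x1": 0.0, "noise": 5.0, "y": 0.0},
    {"x1": 1.0, "noise": 0.0, "y": 10.0},
    {"x1": 2.0, "noise": 4.0, "y": 20.0},
    {"x1": 3.0, "noise": 1.0, "y": 30.0},
]


def predict(case, features):
    """Toy 1-nearest-neighbor prediction of ``y`` using only ``features``."""
    if not features:
        # With no context features, fall back to the mean of the target.
        return sum(row["y"] for row in TRAINED) / len(TRAINED)
    nearest = min(
        TRAINED,
        key=lambda row: sum((row[f] - case[f]) ** 2 for f in features),
    )
    return nearest["y"]


def full_prediction_contribution(case, feature, features):
    """Prediction with all features versus with ``feature`` left out."""
    without = [f for f in features if f != feature]
    return abs(predict(case, features) - predict(case, without))


def robust_prediction_contribution(case, feature, features):
    """Average the with/without difference over every subset of the other features."""
    others = [f for f in features if f != feature]
    diffs = []
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            with_f = predict(case, list(subset) + [feature])
            without_f = predict(case, list(subset))
            diffs.append(abs(with_f - without_f))
    return sum(diffs) / len(diffs)


def full_accuracy_contribution(case, actual, feature, features):
    """Absolute error without ``feature`` minus absolute error with it."""
    without = [f for f in features if f != feature]
    return abs(predict(case, without) - actual) - abs(predict(case, features) - actual)


query = {"x1": 2.9, "noise": 0.1}
features = ["x1", "noise"]
for name in features:
    print(
        name,
        full_prediction_contribution(query, name, features),
        robust_prediction_contribution(query, name, features),
        full_accuracy_contribution(query, 29.0, name, features),
    )
```

In this toy, `x1` gets a full prediction contribution of 20.0 and an accuracy contribution of 18.0, while `noise` gets 0.0 for both. The robust values (17.5 and 2.5) differ from the full ones because they also count the subsets in which the other feature is absent.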

source/user_guide/advanced_capabilities/case_importance.rst

Lines changed: 12 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -29,9 +29,8 @@ Concepts & Terminology
2929

3030
How-To Guide
3131
------------
32-
Case importance is similar to feature importance in that it comprises of two metrics, case mean decrease in accuracy (MDA) and case contribution.
33-
As opposed to influential and similar cases which examines the influence of cases on a single case or prediction, case importance examines how important a case is in regards to the overall predictions on a group of cases. Case importance share the same underlying methodology with :doc:`Feature Importance <feature_importance>`.
34-
Unlike feature contributions, case contributions are calculated just locally. Conceptually, local metrics use either a specific subset of the cases that are trained into the Trainee or a set of new cases.
32+
Case importance is similar to feature importance in that it comprises of two metrics, Accuracy Contributions for Case and Prediction Contributions for Case.
33+
Unlike global feature importance metrics, case contributions are calculated just locally. Conceptually, local metrics use either a specific subset of the cases that are trained into the Trainee or a set of new cases.
3534

3635
Setup
3736
^^^^^
@@ -41,19 +40,19 @@ The :class:`~Trainee` will be referenced as ``trainee`` in the sections below.
4140
Case Contributions
4241
^^^^^^^^^^^^^^^^^^
4342

44-
Case contributions can be retrieved by setting ``case_contributions_robust`` or ``case_contributions_full`` to ``True``.
43+
Case contributions can be retrieved by setting ``case_robust_prediction_contributions`` or ``case_full_prediction_contributions`` to ``True``.
4544

4645
.. code-block:: python
4746
48-
details = {'case_contributions_robust': True}
47+
details = {'case_robust_prediction_contributions': True}
4948
50-
Case MDA
51-
^^^^^^^^
52-
Case MDA can be retrieved by setting ``case_mda_robust`` or ``case_mda_full`` to ``True``.
49+
Case Accuracy Contributions
50+
^^^^^^^^^^^^^^^^^^^^^^^^^^^
51+
Case Accuracy Contributions can be retrieved by setting ``case_robust_accuracy_contributions`` or ``case_full_accuracy_contributions`` to ``True``.
5352

5453
.. code-block:: python
5554
56-
details = {'case_mda_robust': True}
55+
details = {'case_robust_accuracy_contributions': True}
5756
5857
5958
React
@@ -75,8 +74,8 @@ The results can be retrieved in the ``details`` section of the results.
7574

7675
.. code-block:: python
7776
78-
case_contributions = pd.DataFrame(results['details']['case_contributions'][0])
79-
case_mda = pd.DataFrame(results['details']['case_mda'][0])
77+
case_prediction_contributions = pd.DataFrame(results['details']['prediction_contributions'][0])
78+
case_accuracy_contributions = pd.DataFrame(results['details']['accuracy_contributions'][0])
8079
8180
8281
Complete Code
@@ -112,7 +111,7 @@ The code from all of the steps in this guide is combined below:
112111
113112
trainee.analyze(context_features=context_features, action_features=action_features)
114113
115-
details = {'case_contributions_robust': True}
114+
details = {'case_robust_prediction_contributions': True}
116115
117116
results = trainee.react(
118117
test_case[context_features],
@@ -121,7 +120,7 @@ The code from all of the steps in this guide is combined below:
121120
details=details
122121
)
123122
124-
case_contributions = pd.DataFrame(results['details']['case_contributions_robust'][0])
123+
case_contributions = pd.DataFrame(results['details']['case_robust_prediction_contributions'][0])
125124
126125
API References
127126
--------------
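
The hunks above read the case-level results under two different keys: `prediction_contributions` in the short snippet and `case_robust_prediction_contributions` in the complete code. This diff does not establish which key a given Engine version returns, so a caller may want to look the value up defensively. A minimal sketch, assuming only that `results['details']` behaves like a plain dict; the `first_present` helper and the placeholder values are hypothetical and not part of the Howso API.

```python
def first_present(details, candidate_keys):
    """Return (key, value) for the first candidate key found in ``details``."""
    for key in candidate_keys:
        if key in details:
            return key, details[key]
    raise KeyError(f"none of {list(candidate_keys)} found in details")


# Hypothetical stand-in for the ``results`` returned by ``trainee.react``;
# the inner values are placeholders, not real Engine output.
results = {
    "details": {
        "case_robust_prediction_contributions": [["placeholder"]],
    }
}

key, per_case = first_present(
    results["details"],
    ["case_robust_prediction_contributions", "prediction_contributions"],
)
print(key, per_case[0])
```

Listing the more specific key first means the helper prefers it whenever both are present.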

source/user_guide/advanced_capabilities/feature_importance.rst

Lines changed: 24 additions & 47 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,13 @@ Feature Importance
55
==================
66
.. topic:: What is covered in this user guide
77

8-
In this guide, you will learn how to compute the feature importance metrics, :ref:`Feature Contributions <contribution>` and :ref:`Feature Mean Decrease in Accuracy (MDA) <mda>` from a Trainee. Feature importance metrics
8+
In this guide, you will learn how to compute the feature importance metrics, :ref:`Prediction Contributions (PC) <pc>` and :ref:`Accuracy Contributions (AC) <ac>` from a Trainee. Feature importance metrics
99
provides information about which features are useful for predicting a target or :ref:`action <action_features>` feature. In addition to learning informative metrics about the data and the model, these insights can be used as guidance for further action such as feature selection or feature engineering.
1010

1111

1212
Objectives: what you will take away
1313
-----------------------------------
14-
- **How-To** Retrieve the different types of feature importance metrics across several different categories: :doc:`global vs local <../concepts/global_vs_local>`, and :ref:`robust` vs non-robust (full) :ref:`Feature Contributions <contribution>` and :ref:`Feature MDA <mda>`.
14+
- **How-To** Retrieve the different types of feature importance metrics across several different categories: :doc:`global vs local <../concepts/global_vs_local>`, and :ref:`robust` vs non-robust (full) :ref:`Prediction Contributions <pc>` and :ref:`Accuracy Contributions <ac>`.
1515

1616

1717
Prerequisites: before you begin
@@ -33,9 +33,9 @@ recommend being familiar with the following concepts:
3333
- :ref:`residual`
3434
- :ref:`robust`
3535
- :ref:`contribution`
36-
- :ref:`mda`
36+
- :ref:`ac`
3737

38-
The two metrics available for feature importance is feature :ref:`contribution` and feature :ref:`mda`.
38+
The two metrics available for feature importance is feature :ref:`contribution` and feature :ref:`ac`.
3939

4040
Robust vs Non-Robust (Full)
4141
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -52,34 +52,34 @@ The created :class:`~Trainee` will be referenced as ``trainee`` in the sections
5252
Global Feature Importance
5353
^^^^^^^^^^^^^^^^^^^^^^^^^
5454
To get global feature importance metrics, :py:meth:`Trainee.react_aggregate`, is called on a trained and analyzed Trainee. :py:meth:`Trainee.react_aggregate` calls react internally on the cases already trained into the Trainee and calculates the metrics. In this method, the desired metrics can be selected as parameters. These parameters are named individually
55-
in the ``details`` parameter and setting them to ``True`` will calculate and return the desired metrics. For example, ``feature_mda_robust`` and ``feature_contributions_robust`` will calculate the robust versions of MDA and Feature Contributions, while ``feature_mda_full`` and ``feature_contributions_full`` will calculate the non-robust (full) versions.
56-
An action feature must be specified. ``feature_influences_action_feature`` is recommended for feature influence metrics such as feature contributions and mda, especially when used in conjunction with retrieving prediction stats, however, ``action_feature`` can be also used as well. ``action_feature`` sets the action feature for both influence metrics and prediction stats. Since often
55+
in the ``details`` parameter and setting them to ``True`` will calculate and return the desired metrics. For example, ``feature_robust_accuracy_contributions`` and ``feature_robust_prediction_contributions`` will calculate the robust versions of Accuracy Contributions and Prediction Contributions, while ``feature_full_accuracy_contributions`` and ``feature_full_prediction_contributions`` will calculate the non-robust (full) versions.
56+
An action feature must be specified. ``feature_influences_action_feature`` is recommended for feature influence metrics such as prediction contributions and accuracy contributions, especially when used in conjunction with retrieving prediction stats, however, ``action_feature`` can be also used as well. ``action_feature`` sets the action feature for both influence metrics and prediction stats. Since often
5757
only the influence metrics's action feature is intended to be set, ``feature_influences_action_feature`` provides a more precise parameter.
5858

5959
.. code-block:: python
6060
61-
feature_contributions_robust = trainee.react_aggregate(
61+
feature_robust_prediction_contributions = trainee.react_aggregate(
6262
context_features=context_features,
6363
feature_influences_action_feature=action_features[0],
64-
details={'feature_contributions_robust' : True}
64+
details={'feature_robust_prediction_contributions' : True}
6565
)
6666
67-
feature_mda_robust = trainee.react_aggregate(
67+
feature_robust_accuracy_contributions = trainee.react_aggregate(
6868
context_features=context_features,
6969
feature_influences_action_feature=action_features[0],
70-
details={'feature_mda_robust': True}
70+
details={'feature_robust_accuracy_contributions': True}
7171
)
7272
7373
Local Feature Importance
7474
^^^^^^^^^^^^^^^^^^^^^^^^
75-
To get local feature importance metrics, :py:meth:`Trainee.react`, is first called on a trained and analyzed Trainee. In this method, the desired metrics, ``feature_contributions_robust`` and ``feature_mda_robust``, can be selected as inputs to the ``details`` parameters as key value pairs from a dictionary. These parameters are named individually
75+
To get local feature importance metrics, :py:meth:`Trainee.react`, is first called on a trained and analyzed Trainee. In this method, the desired metrics, ``feature_robust_prediction_contributions`` and ``feature_robust_accuracy_contributions``, can be selected as inputs to the ``details`` parameters as key value pairs from a dictionary. These parameters are named individually
7676
and setting them to ``True`` will calculate the desired metrics. Robust calculations are performed by default.
7777

7878
.. code-block:: python
7979
8080
details = {
81-
'feature_contributions_robust':True,
82-
'feature_mda_robust':True,
81+
'feature_robust_prediction_contributions':True,
82+
'feature_robust_accuracy_contributions':True,
8383
}
8484
8585
results = trainee.react(
@@ -94,31 +94,14 @@ are calculated in :py:meth:`Trainee.react` from the previous step.
9494

9595
.. code-block:: python
9696
97-
feature_contributions_robust = results['explanation']['feature_contributions_robust']
98-
feature_mda_robust = results['explanation']['feature_mda_robust']
97+
feature_robust_prediction_contributions = results['details']['feature_robust_prediction_contributions']
98+
feature_robust_accuracy_contributions = results['details']['feature_robust_accuracy_contributions']
9999
100100
101101
.. warning::
102102

103-
Contributions and MDA are also metrics for cases and not just features, so please be aware when reading other guides that may use those terms.
103+
Accuracy and Prediction Contributions are also metrics for cases and not just features, so please be aware when reading other guides that may use those terms.
104104

105-
Contribution and MDA matrices
106-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
107-
108-
Howso also provides the two metrics in a matrix view, where for each row which represent the action feature, you can identify the contributions of all
109-
the other context features to that prediction. Since these matrices may not be symmetrical, examining the differences between the upper and lower triangular matrices
110-
may reveal additional insights. Please see the linked recipe for more information.
111-
112-
:meth:`Trainee.get_contribution_matrix` and :meth:`Trainee.get_mda_matrix` gets these matrices respectively.
113-
114-
.. warning::
115-
116-
Matrices may be computationally expensive.
117-
118-
.. code-block:: python
119-
120-
contrib_matrix = trainee.get_contribution_matrix()
121-
mda_matrix = trainee.get_mda_matrix()
122105

123106
Combined Code
124107
^^^^^^^^^^^^^
@@ -154,21 +137,21 @@ Combined Code
154137
trainee.train(df)
155138
trainee.analyze()
156139
157-
feature_contributions_robust = trainee.react_aggregate(
140+
feature_robust_prediction_contributions = trainee.react_aggregate(
158141
context_features=context_features,
159142
feature_influences_action_feature=action_features[0],
160-
details={"feature_contributions_robust" : True}
143+
details={"feature_robust_prediction_contributions" : True}
161144
)
162145
163-
feature_mda_robust = trainee.react_aggregate(
146+
feature_robust_accuracy_contributions = trainee.react_aggregate(
164147
context_features=context_features,
165148
feature_influences_action_feature=action_features[0],
166-
details={"feature_mda_robust" : True}
149+
details={"feature_robust_accuracy_contributions" : True}
167150
)
168151
169152
details = {
170-
'feature_contributions_robust':True,
171-
'feature_mda_robust':True,
153+
'feature_robust_prediction_contributions':True,
154+
'feature_robust_accuracy_contributions':True,
172155
}
173156
174157
results = trainee.react(
@@ -178,12 +161,8 @@ Combined Code
178161
details=details
179162
)
180163
181-
feature_contributions_robust = results['explanation']['feature_contributions_robust']
182-
feature_mda_robust = results['explanation']['feature_mda_robust']
183-
184-
contrib_matrix = trainee.get_contribution_matrix()
185-
mda_matrix = trainee.get_mda_matrix()
186-
164+
feature_robust_prediction_contributions = results['explanation']['feature_robust_prediction_contributions']
165+
feature_robust_accuracy_contributions = results['explanation']['feature_robust_accuracy_contributions']
187166
188167
API References
189168
--------------
@@ -192,6 +171,4 @@ API References
192171
- :py:meth:`Trainee.analyze`
193172
- :py:meth:`Trainee.react`
194173
- :py:meth:`Trainee.react_aggregate`
195-
- :py:meth:`Trainee.get_contribution_matrix`
196-
- :py:meth:`Trainee.get_mda_matrix`
197174
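
The renamed `details` keys in this file and in `case_importance.rst` follow one pattern: scope (`feature` or `case`), then `robust` or `full`, then the metric (`prediction` or `accuracy`), then `contributions`. A minimal sketch that builds the keys from those parts; the helper names `influence_detail_key` and `build_details` are hypothetical and not part of the Howso API, and only the resulting key strings come from the diff.

```python
def influence_detail_key(scope, metric, robust=True):
    """Build a details key such as ``feature_robust_prediction_contributions``."""
    if scope not in ("feature", "case"):
        raise ValueError(f"unknown scope: {scope!r}")
    if metric not in ("prediction", "accuracy"):
        raise ValueError(f"unknown metric: {metric!r}")
    mode = "robust" if robust else "full"
    return f"{scope}_{mode}_{metric}_contributions"


def build_details(scope, metrics=("prediction", "accuracy"), robust=True):
    """Return a details dict requesting the given contribution metrics."""
    return {influence_detail_key(scope, metric, robust): True for metric in metrics}


details = build_details("feature")
print(details)
```

The resulting dict has the same shape as the `details` argument passed to `trainee.react` in the hunks above.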

source/user_guide/basic_capabilities/conviction.rst

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -86,7 +86,7 @@ specific cases in :py:meth:`Trainee.react`
8686
.. code-block:: python
8787
8888
details = {
89-
'feature_residuals_robust': True
89+
'feature_robust_residuals': True
9090
}
9191
9292
results = trainee.react(
@@ -144,7 +144,7 @@ The code from all of the steps in this guide is combined below:
144144
print(familiarity_conviction_addition)
145145
146146
details = {
147-
'feature_residuals_robust': True,
147+
'feature_robust_residuals': True,
148148
'similarity_conviction': True
149149
}
150150
@@ -178,7 +178,7 @@ Below is an example of expected output from this sample code:
178178
target
179179
0 1
180180
{'action_features': ['target'],
181-
'feature_residuals_robust': [{'age': 8.888516681825308,
181+
'feature_robust_residuals': [{'age': 8.888516681825308,
182182
'capital-gain': 416.7392605164004,
183183
'capital-loss': 59.906358535804515,
184184
'education': 0.4523004291045252,
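
Taken together, the hunks in this commit rename nine `details` keys. A minimal sketch that collects the old-to-new pairs visible in the diff and applies them to an existing `details` dict; `RENAMED_DETAIL_KEYS` and `migrate_details` are hypothetical names for illustration, and the mapping covers only the keys shown here, not necessarily everything in the referenced spreadsheet.

```python
# Old key -> new key, as they appear in the removed and added lines of this diff.
RENAMED_DETAIL_KEYS = {
    "feature_contributions_robust": "feature_robust_prediction_contributions",
    "feature_contributions_full": "feature_full_prediction_contributions",
    "feature_mda_robust": "feature_robust_accuracy_contributions",
    "feature_mda_full": "feature_full_accuracy_contributions",
    "case_contributions_robust": "case_robust_prediction_contributions",
    "case_contributions_full": "case_full_prediction_contributions",
    "case_mda_robust": "case_robust_accuracy_contributions",
    "case_mda_full": "case_full_accuracy_contributions",
    "feature_residuals_robust": "feature_robust_residuals",
}


def migrate_details(details):
    """Return a copy of ``details`` with pre-rename keys replaced by the new names."""
    return {RENAMED_DETAIL_KEYS.get(key, key): value for key, value in details.items()}


old_details = {"feature_residuals_robust": True, "similarity_conviction": True}
print(migrate_details(old_details))
```

Keys that were not renamed, such as `similarity_conviction`, pass through unchanged.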
