Commit

chore: remove data stability and improve feature stability (#204)
* chore: remove data stability and improve feature stability

* chore: format and remove unused argument in compute shap values

* chore: remove comments
crismunoz authored Aug 30, 2024
1 parent 1295849 commit babfb8a
Showing 26 changed files with 993 additions and 738 deletions.
3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -80,8 +80,9 @@
     '.md': 'markdown',
 }
 
+
 nbsphinx_allow_errors = True  # Allow errors in the notebooks
-nbsphinx_execute = 'never'  # Can be 'auto', 'always', or 'never'
+nbsphinx_execute = 'auto'  # Can be 'auto', 'always', or 'never'
 
 html_show_sourcelink = False
 # autodoc options
440 changes: 269 additions & 171 deletions docs/source/gallery/tutorials/explainability/demos/local_shap.ipynb

Large diffs are not rendered by default.

@@ -924,28 +924,6 @@
     "partial_dependencies = compute_partial_dependence(train['X'], features=ranked_importances.feature_names, proxy=proxy)"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 23,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "3.0341481286164855"
-      ]
-     },
-     "execution_count": 23,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from holisticai.explainability.metrics import data_stability\n",
-    "\n",
-    "data_stability(local_importances)"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": 24,
@@ -1296,28 +1274,6 @@
    "classification_explainability_metrics(importances, partial_dependencies, conditional_importances, local_importances=local_importances)"
   ]
  },
-  {
-   "cell_type": "code",
-   "execution_count": 29,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "7.725577718075645"
-      ]
-     },
-     "execution_count": 29,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from holisticai.explainability.metrics import data_stability\n",
-    "\n",
-    "data_stability(local_importances)"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
73 changes: 29 additions & 44 deletions docs/source/getting_started/explainability/metrics/stability.rst
@@ -3,80 +3,65 @@
 Stability Metrics
 =================
 
-The Data Stability and Feature Stability metrics are designed to evaluate the consistency of feature importance across different instances and features in a dataset. These metrics help quantify the robustness and reliability of feature importance, facilitating better model explainability and transparency.
+Feature Stability metrics are designed to evaluate the consistency of feature importance across different instances and features in a dataset. These metrics help quantify the robustness and reliability of feature importance, facilitating better model explainability and transparency.
 
 .. contents:: Table of Contents
    :local:
    :depth: 1
 
-Data Stability
-----------------------
-
-Methodology
-~~~~~~~~~~~
-The **Data Stability** metric evaluates the consistency of local feature importances across different instances. It measures how much the importances of features vary for different instances in a dataset.
-
-Mathematical Representation
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Let :math:`\mathbf{I} = \{I_1, I_2, \ldots, I_n\}` be the set of local feature importances for \( n \) instances, where each \( I_i \) is a vector of feature importances.
-
-1. **Calculation of Spread Divergence:**
-
-   .. math::
-
-      S_i = \text{spread_divergence}(I_i)
-
-2. **Calculation of Data Stability:**
-
-   .. math::
-
-      \text{Data_Stability} = \text{spread_divergence}(\{S_i \mid i = 1, \ldots, n\})
-
-Interpretation
-~~~~~~~~~~~~~~~
-- **High value:** Indicates that the feature importances are consistent across instances. This suggests that the model has a uniform understanding of the data, facilitating interpretation and increasing confidence in the model's explanations.
-- **Low value:** Indicates that the feature importances vary significantly between instances. This can make the model harder to interpret and reduce confidence in its predictions.
-
-The **Data Stability** metric uses spread divergence to evaluate the stability of feature importances. This divergence measures the dispersion of importances across different instances, providing a quantitative measure of consistency.
-
 Feature Stability
 -----------------
 
 Methodology
 ~~~~~~~~~~~
 The **Feature Stability** metric measures the stability of individual feature importances across different instances. It focuses on the consistency of the importance of a specific feature throughout the dataset.
 
 Mathematical Representation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Let :math:`\mathbf{I} = \{I_1, I_2, \ldots, I_n\}` be the set of local feature importances for \( n \) instances, where each \( I_i \) is a vector of feature importances.
+Let :math:`\mathbf{I} = \{I_1, I_2, \ldots, I_n\}` be the set of local feature importances for :math:`n` instances, where each :math:`I_i` is a vector of feature importances.
 
+1. **Normalization of Data:**
+   Each vector :math:`I_i` is normalized so that the sum of its elements equals 1:
+
+   .. math::
+
+      I_{i,j} \leftarrow \frac{I_{i,j}}{\sum_{k=1}^{m} I_{i,k}} \quad \text{for } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, m
+
+   where :math:`m` is the number of features.
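For illustration, the normalization in this step can be sketched with NumPy (the values are hypothetical and this is not the holisticai implementation):

```python
import numpy as np

# Local importances: n instances x m features (hypothetical values)
I = np.array([
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.4],
    [0.3, 0.3, 0.2],
])

# Normalize each row so that its elements sum to 1
I_norm = I / I.sum(axis=1, keepdims=True)

print(I_norm.sum(axis=1))  # each row now sums to 1.0
```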

+2. **Computation of Importance Distributions:**
+   The importance distribution :math:`\mathbf{D}` of features is computed by finding the density distribution of feature importance vectors. This is done by evaluating the proximity of these vectors to a set of synthetic samples generated from a Dirichlet distribution:
+
+   .. math::
+
+      \mathbf{D} = \left( d_1, d_2, \ldots, d_{m} \right)
+
+   where :math:`d_j` represents the density estimate for feature :math:`j`.
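The synthetic reference samples mentioned in this step can be drawn with NumPy's Dirichlet sampler. The proximity/density estimator itself is not shown in this diff, so only the sampling is sketched here (sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3             # number of features (hypothetical)
n_samples = 1000  # number of synthetic reference vectors

# Synthetic importance vectors on the probability simplex (each row
# sums to 1), drawn from a flat Dirichlet distribution
synthetic = rng.dirichlet(np.ones(m), size=n_samples)

print(synthetic.shape)  # (1000, 3)
```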

-1. **Normalization and Transposition of Data:**
-
-   .. math::
-
-      \mathbf{I}^T = \begin{pmatrix}
-      I_{1,1} & I_{1,2} & \cdots & I_{1,n} \\
-      I_{2,1} & I_{2,2} & \cdots & I_{2,n} \\
-      \vdots & \vdots & \ddots & \vdots \\
-      I_{m,1} & I_{m,2} & \cdots & I_{m,n}
-      \end{pmatrix}
-
-2. **Calculation of Spread Divergence for Each Feature:**
-
-   .. math::
-
-      S_j = \text{spread_divergence}(I_j^T)
-
-3. **Calculation of Feature Stability:**
-
-   .. math::
-
-      \text{Feature_Stability} = \text{spread_divergence}(\{S_j \mid j = 1, \ldots, m\})
+3. **Calculation of Feature Stability:**
+   Feature Stability is computed using one of the following strategies:
+
+   - **Variance Strategy:**
+     The stability is determined by the ratio of the standard deviation to the maximum density:
+
+     .. math::
+
+        \textrm{FS} = 1 - \frac{\sigma_D}{\max(D)}
+
+     where :math:`\sigma_D` represents the standard deviation of the density distribution :math:`\mathbf{D}`.
+
+   - **Entropy Strategy:**
+     Alternatively, the stability can be computed based on the Jensen-Shannon divergence between the distribution :math:`\mathbf{D}` and a uniform distribution:
+
+     .. math::
+
+        \textrm{FS} = 1 - \text{JSD}\left(\mathbf{D} \| \mathbf{U} \right)
+
+     where :math:`\mathbf{U}` is the uniform distribution, and :math:`\text{JSD}` denotes the Jensen-Shannon divergence.
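Both strategies reduce to a couple of lines. A minimal sketch with hypothetical density values `D`, using SciPy's `jensenshannon` (which returns the square root of the divergence, so it is squared to recover the divergence itself):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical per-feature density estimates D
D = np.array([0.30, 0.25, 0.28, 0.17])

# Variance strategy: FS = 1 - std(D) / max(D)
fs_variance = 1.0 - D.std() / D.max()

# Entropy strategy: FS = 1 - JSD(D || U), with U the uniform distribution.
# jensenshannon returns a distance (the square root of the divergence),
# so square it to obtain the divergence.
U = np.full_like(D, 1.0 / len(D))
fs_entropy = 1.0 - jensenshannon(D / D.sum(), U) ** 2

print(fs_variance, fs_entropy)
```

Values of `D` close to uniform drive both scores toward 1, matching the "high value means consistent importances" interpretation below.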

 Interpretation
-~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~
 - **High value:** Indicates that the importance of a specific feature is consistent across instances. This suggests that the feature is robust and its relationship with the model's target is reliable.
 - **Low value:** Indicates that the importance of a feature varies significantly between instances. This may suggest that the feature is less reliable and its relationship with the model's target may be weak.
 
-The **Feature Stability** metric uses spread divergence to evaluate the stability of individual feature importances. This divergence measures the dispersion of the importances of each feature across different instances, providing a quantitative measure of their consistency.
+The **Feature Stability** metric provides a quantitative measure of the consistency of feature importances across different instances by evaluating the dispersion of these importances using either variance-based or entropy-based methods.
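Putting the steps together, a toy end-to-end sketch. NOTE: the Dirichlet-density step is replaced here by a simple per-feature mean — an assumption made only to keep the example self-contained; it is not the library's method:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def feature_stability_sketch(importances: np.ndarray, strategy: str = "variance") -> float:
    """Toy version of the pipeline: normalize rows, reduce to a
    per-feature distribution D, then score the dispersion of D.

    The (unspecified) Dirichlet density estimate is approximated by
    the mean normalized importance per feature, purely for illustration.
    """
    I_norm = importances / importances.sum(axis=1, keepdims=True)
    D = I_norm.mean(axis=0)  # simplistic stand-in for the density estimates
    if strategy == "variance":
        return 1.0 - D.std() / D.max()
    # entropy strategy: 1 - JSD(D || U), squaring the distance from scipy
    U = np.full_like(D, 1.0 / D.size)
    return 1.0 - jensenshannon(D, U) ** 2

rng = np.random.default_rng(42)
local_importances = rng.random((50, 4)) + 0.1  # hypothetical local importances
print(feature_stability_sketch(local_importances))
print(feature_stability_sketch(local_importances, strategy="entropy"))
```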
1 change: 0 additions & 1 deletion docs/source/reference/explainability/metrics.rst
@@ -26,5 +26,4 @@
    :template: function.rst
    :toctree: .generated/
 
-   data_stability
    feature_stability
0 comments on commit babfb8a