# Metric Definitions

## Descriptive statistics

### Missing values
The missing values metric reports the number of missing values per feature. Missing values include `NaN` in numeric arrays, `NaN` or `None` in object arrays, and `NaT` in datetime-like arrays.

### Non-missing values
The non-missing values metric reports the number of non-missing values per feature. Non-missing values are all values other than `NaN` for numeric arrays, `NaN` or `None` for object arrays, and `NaT` for datetime-like arrays.
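
As an illustrative sketch, both counts can be obtained with pandas, which treats `NaN`, `None` and `NaT` uniformly as missing; the `df` table here is made up for the example:

```python
import numpy as np
import pandas as pd

# Made-up feature table with a numeric, an object and a datetime column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0],
    "city": ["Athens", None, "Paris"],
    "signup": pd.to_datetime(["2021-01-01", None, "2021-03-05"]),
})

print(df.isna().sum())   # missing values per feature (NaN, None, NaT)
print(df.notna().sum())  # non-missing values per feature
```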

### Mean or Average value
Returns the average value per feature, excluding `NaN` and `null` values.

### Minimum value
Returns the minimum value per feature.

### Maximum value
Returns the maximum value per feature.

### Summary
Returns a statistical summary of the values per feature. `NaN` and `null` values are excluded during the calculations.
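
A minimal pandas sketch of these four metrics, reusing the made-up `df` from the previous example; pandas excludes `NaN` values by default, and reading the summary metric as `describe()` output is an assumption:

```python
print(df["age"].mean())      # average of the non-missing values
print(df["age"].min())       # minimum value
print(df["age"].max())       # maximum value
print(df["age"].describe())  # count, mean, std, min, quartiles, max
```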

### Standard Deviation
Returns the sample standard deviation per feature, normalized by N-1, excluding `NaN` and `null` values during calculations. Formula:

$$
σ = \sqrt{\sum_i (x_i-μ)^2 \over N-1}
$$

### Variance
Returns the unbiased variance per feature, normalized by N-1, excluding `NaN` and `null` values during calculations. Formula:

$$
σ^2 = {\sum_i (x_i-μ)^2 \over N-1}
$$
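
Both formulas use the N-1 denominator, which corresponds to `ddof=1` in NumPy (and the pandas default); a quick sketch:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

std = np.std(x, ddof=1)  # sample standard deviation, N-1 in the denominator
var = np.var(x, ddof=1)  # unbiased variance, N-1 in the denominator
assert np.isclose(std ** 2, var)
print(std, var)
```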

## Evaluation metrics

### Confusion Matrix
Returns the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). In the case of `multi-class classification`, it returns these counts per class.

A typical example for `binary classification` can be seen below, in which:

- 20 observations were correctly classified as positive.
- 10 observations were incorrectly classified as negative while they were actually positive.
- 5 observations were incorrectly classified as positive while they were actually negative.
- 75 observations were correctly classified as negative.

| | Predicted Positive | Predicted Negative |
|----------------|--------------------|--------------------|
| Actual Positive | 20 *(TP)* | 10 *(FN)* |
| Actual Negative | 5 *(FP)* | 75 *(TN)* |

A typical example for `multi-class classification` can be seen below, in which:

- 15 observations were correctly classified as Class A.
- 5 observations were incorrectly classified as Class B while they were actually Class A.
- 2 observations were incorrectly classified as Class C while they were actually Class A.
- 4 observations were incorrectly classified as Class A while they were actually Class B.
- 20 observations were correctly classified as Class B.
- 3 observations were incorrectly classified as Class C while they were actually Class B.
- 2 observations were incorrectly classified as Class A while they were actually Class C.
- 8 observations were incorrectly classified as Class B while they were actually Class C.
- 25 observations were correctly classified as Class C.

| | Predicted Class A | Predicted Class B | Predicted Class C |
|----------------|--------------------|--------------------|--------------------|
| Actual Class A | 15 *(TP_A)* | 5 | 2 |
| Actual Class B | 4 | 20 *(TP_B)* | 3 |
| Actual Class C | 2 | 8 | 25 *(TP_C)* |
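
Both tables can be produced with scikit-learn's `confusion_matrix`, where rows are actual classes and columns are predicted classes; the short label vectors below are made up for illustration:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0]

# labels=[1, 0] orders the output as [[TP, FN], [FP, TN]].
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[2 1]
#  [1 3]]

# Multi-class works the same way, with one row and column per class:
# confusion_matrix(y_true, y_pred, labels=["A", "B", "C"])
```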

### Accuracy
Returns the accuracy classification score. In `multi-class classification`, this metric computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in `y_true`. Formula:

$$
accuracy = {(TP + TN) \over (TP + TN + FP + FN)}
$$
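
As a worked example with the counts from the binary table above, accuracy = (20 + 75) / (20 + 10 + 5 + 75) ≈ 0.864. With scikit-learn this is `accuracy_score`, shown here on the made-up label vectors from the confusion-matrix sketch:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0]

print(accuracy_score(y_true, y_pred))  # (TP + TN) / total = 5/7 ≈ 0.714
```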

### Precision
Returns the precision classification score. In `multi-class classification`, returns the following three scores:

- `micro`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- `macro`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- `weighted`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters `macro` to account for label imbalance; it can result in an F-score that is not between precision and recall.

Formula:

$$
precision = {TP \over (TP + FP)}
$$
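
A sketch of the three averaging modes with scikit-learn's `precision_score`; the multi-class label vectors are made up:

```python
from sklearn.metrics import precision_score

y_true = ["A", "A", "A", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "A", "C"]

for avg in ("micro", "macro", "weighted"):
    print(avg, precision_score(y_true, y_pred, average=avg))
```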

### Recall
Returns the recall classification score. In `multi-class classification`, returns the following three scores:

- `micro`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- `macro`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- `weighted`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters `macro` to account for label imbalance; it can result in an F-score that is not between precision and recall.

Formula:

$$
recall = {TP \over (TP + FN)}
$$
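
As a worked example with the counts from the binary confusion matrix above:

$$
recall = {20 \over (20 + 10)} ≈ 0.67
$$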

### F1 score
Returns the F1 classification score. In `multi-class classification`, returns the following three scores:

- `micro`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- `macro`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- `weighted`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters `macro` to account for label imbalance; it can result in an F-score that is not between precision and recall.

Formula:

$$
F1 = 2 * {(precision * recall) \over (precision + recall)}
$$
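
A sketch checking the formula against scikit-learn's `f1_score` in the binary case, reusing the made-up label vectors from the earlier examples:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0]

p = precision_score(y_true, y_pred)  # TP / (TP + FP)
r = recall_score(y_true, y_pred)     # TP / (TP + FN)
print(2 * p * r / (p + r))           # F1 from the formula
print(f1_score(y_true, y_pred))      # same value from the library
```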

## Statistical tests and techniques

### Kolmogorov-Smirnov Two Sample test
When there are two datasets, the K-S two-sample test can be used to test the agreement between their distributions. The null hypothesis states that there is no difference between the two distributions. Formula:

$$
D = \max_X |F_a(X)-F_b(X)|
$$

where:

- $a$ = observations from the first dataset.
- $b$ = observations from the second dataset.
- $F_n(X)$ = observed cumulative frequency distribution of a random sample of n observations.
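
A sketch with SciPy's `ks_2samp`; the two samples are synthetic, with the second one shifted so the null hypothesis should be rejected:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=500)  # first dataset
b = rng.normal(loc=0.5, scale=1.0, size=500)  # second dataset, shifted

# D is the maximum gap between the two empirical CDFs.
result = stats.ks_2samp(a, b)
print(result.statistic, result.pvalue)  # a small p-value rejects H0
```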

### Chi-squared test
A chi-squared test is a statistical test used to compare two datasets. Its purpose is to determine whether a difference between the two datasets is due to chance, or due to a relationship between the variables being studied. Formula:

$$
χ^2 = \sum_i {(O_i - E_i)^2 \over E_i}
$$

where:

- $χ^2$ = the chi-squared statistic
- $O_i$ = the observed values (first dataset)
- $E_i$ = the expected values (second dataset)
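
A sketch with SciPy's `chisquare`; the category counts are made up, and the two sets must have equal totals for the test to be valid:

```python
from scipy import stats

observed = [30, 50, 20]  # category counts from the first dataset
expected = [25, 55, 20]  # counts from the second (reference) dataset

# chi2 = sum((O_i - E_i)^2 / E_i); H0: both follow the same distribution.
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)
```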

### Z-score for independent proportions
The purpose of the z-test for independent proportions is to compare the proportions of two independent datasets. Formula:

$$
Z = {p_1 - p_2 \over \sqrt{p' q' ({1\over n_1} + {1\over n_2})}}
$$

where:

- $Z$ = the Z-statistic, which is compared against the standard normal distribution
- $p_1 , p_2$ = the proportions of the two datasets
- $p'$ = the estimated true proportion under the null hypothesis
- $q'$ = $(1-p')$
- $n_1 , n_2$ = the number of observations in the two datasets
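
A direct transcription of the formula into NumPy, with a two-sided p-value from the standard normal distribution; the success counts and sample sizes are made up:

```python
import numpy as np
from scipy import stats

x1, n1 = 45, 100  # successes and sample size, first dataset
x2, n2 = 30, 100  # successes and sample size, second dataset

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # p', estimated under the null hypothesis
q_pool = 1 - p_pool             # q' = 1 - p'

z = (p1 - p2) / np.sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value
print(z, p_value)
```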

### Wasserstein distance
The Wasserstein distance measures the distance between the distributions of two datasets. Formula:

$$
W = \left( \int_0^1 \left| {F_A}^{-1}(u) - {F_B}^{-1}(u) \right|^2 du \right)^{1/2}
$$

where:

- $W$ = the Wasserstein distance
- $F_A , F_B$ = the corresponding cumulative distribution functions of the two datasets
- ${F_A}^{-1} , {F_B}^{-1}$ = the respective quantile functions
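
For two samples of equal size, the empirical quantile functions align, so the integral above reduces to the mean squared gap between the sorted values; a minimal sketch (note that `scipy.stats.wasserstein_distance` computes the first-order variant, with an absolute rather than squared difference):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])  # first dataset
b = np.array([1.5, 2.5, 3.5, 4.5])  # second dataset

# Sorted values are the empirical quantiles at u = (i + 0.5) / n.
w = np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2))
print(w)  # 0.5
```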

### Jensen–Shannon divergence
The Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions. Formula:

$$
JS = {1 \over 2} KL(P \parallel M) + {1 \over 2} KL(Q \parallel M)
$$

where:

- $JS$ = the Jensen–Shannon divergence
- $KL$ = the Kullback-Leibler divergence: $KL(P \parallel Q) = \sum_{x \in X} P(x) \log{P(x) \over Q(x)}$
- $P , Q$ = the distributions of the two datasets
- $M$ = ${1 \over 2} (P+Q)$
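
A sketch computing the divergence from its definition; `scipy.stats.entropy(p, m)` gives the Kullback-Leibler divergence when passed two distributions, and the example distributions are made up (note that `scipy.spatial.distance.jensenshannon` returns the square root of this quantity, i.e. the JS distance):

```python
import numpy as np
from scipy.stats import entropy

P = np.array([0.1, 0.4, 0.5])  # distribution of the first dataset
Q = np.array([0.3, 0.3, 0.4])  # distribution of the second dataset
M = 0.5 * (P + Q)

# entropy(P, M) computes KL(P || M), so this is the JS definition above.
js = 0.5 * entropy(P, M) + 0.5 * entropy(Q, M)
print(js)
```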