@@ -85,14 +85,14 @@ <h1>Evidential Deep Learning to Quantify Classification Uncertainty</h1>
The paper uses the term evidence as a measure of the amount of support
collected from data in favor of a sample being classified into a certain class.</p>
<p>This corresponds to a <a href="https://en.wikipedia.org/wiki/Dirichlet_distribution">Dirichlet distribution</a>
-with parameters $\color{cyan}{\alpha_k} = e_k + 1$, and
-$\color{cyan}{\alpha_0} = S = \sum_{k=1}^K \color{cyan}{\alpha_k}$ is known as the Dirichlet strength.
-The Dirichlet distribution $D(\mathbf{p} \vert \color{cyan}{\mathbf{\alpha}})$
+with parameters $\color{orange}{\alpha_k} = e_k + 1$, and
+$\color{orange}{\alpha_0} = S = \sum_{k=1}^K \color{orange}{\alpha_k}$ is known as the Dirichlet strength.
+The Dirichlet distribution $D(\mathbf{p} \vert \color{orange}{\mathbf{\alpha}})$
is a distribution over categorical distributions; i.e., you can sample class probabilities
from a Dirichlet distribution.
-The expected probability for class $k$ is $\hat{p}_k = \frac{\color{cyan}{\alpha_k}}{S}$.</p>
+The expected probability for class $k$ is $\hat{p}_k = \frac{\color{orange}{\alpha_k}}{S}$.</p>
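<p>As a quick illustration (a minimal sketch, not part of the annotated source), sampling
class probabilities from this Dirichlet in PyTorch:</p>
<pre>
import torch

evidence = torch.tensor([2., 5., 0.])  # assumed per-class evidence for K = 3 classes
alpha = evidence + 1.                  # Dirichlet parameters
# each sample is itself a categorical distribution over the K classes
p = torch.distributions.Dirichlet(alpha).sample()
expected_p = alpha / alpha.sum()       # expected probability \hat{p}_k = \alpha_k / S
</pre>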
<p>We get the model to output evidence
-<script type="math/tex; mode=display">\mathbf{e} = \color{cyan}{\mathbf{\alpha}} - 1 = f(\mathbf{x} | \Theta)</script>
+<script type="math/tex; mode=display">\mathbf{e} = \color{orange}{\mathbf{\alpha}} - 1 = f(\mathbf{x} | \Theta)</script>
for a given input $\mathbf{x}$.
We use a function such as
<a href="https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html">ReLU</a> or a
@@ -116,7 +116,7 @@ <h1>Evidential Deep Learning to Quantify Classification Uncertainty</h1>
</div>
<p><a id="MaximumLikelihoodLoss"></a></p>
<h2>Type II Maximum Likelihood Loss</h2>
-<p>The distribution D(\mathbf{p} \vert \color{cyan}{\mathbf{\alpha}}) is a prior on the likelihood
+<p>The distribution $D(\mathbf{p} \vert \color{orange}{\mathbf{\alpha}})$ is a prior on the likelihood
$Multi(\mathbf{y} \vert p)$,
and the negative log marginal likelihood is calculated by integrating over class probabilities
$\mathbf{p}$.</p>
@@ -127,11 +127,11 @@ <h2>Type II Maximum Likelihood Loss</h2>
&= -\log \Bigg(
\int
\prod_{k=1}^K p_k^{y_k}
-\frac{1}{B(\color{cyan}{\mathbf{\alpha}})}
-\prod_{k=1}^K p_k^{\color{cyan}{\alpha_k} - 1}
+\frac{1}{B(\color{orange}{\mathbf{\alpha}})}
+\prod_{k=1}^K p_k^{\color{orange}{\alpha_k} - 1}
d\mathbf{p}
\Bigg) \\
-&= \sum_{k=1}^K y_k \bigg( \log S - \log \color{cyan}{\alpha_k} \bigg)
+&= \sum_{k=1}^K y_k \bigg( \log S - \log \color{orange}{\alpha_k} \bigg)
\end{align}</script>
</p>
</div>
@@ -158,7 +158,7 @@ <h2>Type II Maximum Likelihood Loss</h2>
<div class='section-link'>
<a href='#section-3'>#</a>
</div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
</div>
<div class='code'>
<div class="highlight"><pre>alpha = evidence + 1.</pre></div>
@@ -169,7 +169,7 @@ <h2>Type II Maximum Likelihood Loss</h2>
<div class='section-link'>
<a href='#section-4'>#</a>
</div>
-<p>$S = \sum_{k=1}^K \color{cyan}{\alpha_k}$</p>
+<p>$S = \sum_{k=1}^K \color{orange}{\alpha_k}$</p>
</div>
<div class='code'>
<div class="highlight"><pre>strength = alpha.sum(dim=-1)</pre></div>
@@ -180,7 +180,7 @@ <h2>Type II Maximum Likelihood Loss</h2>
<div class='section-link'>
<a href='#section-5'>#</a>
</div>
-<p>Losses $\mathcal{L}(\Theta) = \sum_{k=1}^K y_k \bigg( \log S - \log \color{cyan}{\alpha_k} \bigg)$</p>
+<p>Losses $\mathcal{L}(\Theta) = \sum_{k=1}^K y_k \bigg( \log S - \log \color{orange}{\alpha_k} \bigg)$</p>
</div>
<div class='code'>
<div class="highlight"><pre>loss = (target * (strength.log()[:, None] - alpha.log())).sum(dim=-1)</pre></div>
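<p>A toy worked example (values assumed for illustration): with $K = 3$, the second class
correct, and evidence $(0.5, 4.0, 0.2)$, we get $\alpha = (1.5, 5.0, 1.2)$ and $S = 7.7$,
so the loss is $\log 7.7 - \log 5.0 \approx 0.43$:</p>
<pre>
import torch

target = torch.tensor([[0., 1., 0.]])       # one-hot label, second class is correct
evidence = torch.tensor([[0.5, 4.0, 0.2]])  # assumed model outputs
alpha = evidence + 1.
strength = alpha.sum(dim=-1)
loss = (target * (strength.log()[:, None] - alpha.log())).sum(dim=-1)
print(loss)  # tensor([0.4318]) = log(7.7) - log(5.0)
</pre>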
@@ -217,11 +217,11 @@ <h2>Bayes Risk with Cross Entropy Loss</h2>
&= \int
\Big[ \sum_{k=1}^K -y_k \log p_k \Big]
-\frac{1}{B(\color{cyan}{\mathbf{\alpha}})}
-\prod_{k=1}^K p_k^{\color{cyan}{\alpha_k} - 1}
+\frac{1}{B(\color{orange}{\mathbf{\alpha}})}
+\prod_{k=1}^K p_k^{\color{orange}{\alpha_k} - 1}
d\mathbf{p} \\
-&= \sum_{k=1}^K y_k \bigg( \psi(S) - \psi(\color{cyan}{\alpha_k}) \bigg)
+&= \sum_{k=1}^K y_k \bigg( \psi(S) - \psi(\color{orange}{\alpha_k}) \bigg)
\end{align}</script>
</p>
<p>where $\psi(\cdot)$ is the digamma function.</p>
@@ -249,7 +249,7 @@ <h2>Bayes Risk with Cross Entropy Loss</h2>
<div class='section-link'>
<a href='#section-9'>#</a>
</div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
</div>
<div class='code'>
<div class="highlight"><pre>alpha = evidence + 1.</pre></div>
@@ -260,7 +260,7 @@ <h2>Bayes Risk with Cross Entropy Loss</h2>
<div class='section-link'>
<a href='#section-10'>#</a>
</div>
-<p>$S = \sum_{k=1}^K \color{cyan}{\alpha_k}$</p>
+<p>$S = \sum_{k=1}^K \color{orange}{\alpha_k}$</p>
</div>
<div class='code'>
<div class="highlight"><pre>strength = alpha.sum(dim=-1)</pre></div>
@@ -271,7 +271,7 @@ <h2>Bayes Risk with Cross Entropy Loss</h2>
<div class='section-link'>
<a href='#section-11'>#</a>
</div>
-<p>Losses $\mathcal{L}(\Theta) = \sum_{k=1}^K y_k \bigg( \psi(S) - \psi( \color{cyan}{\alpha_k} ) \bigg)$</p>
+<p>Losses $\mathcal{L}(\Theta) = \sum_{k=1}^K y_k \bigg( \psi(S) - \psi( \color{orange}{\alpha_k} ) \bigg)$</p>
</div>
<div class='code'>
<div class="highlight"><pre>loss = (target * (torch.digamma(strength)[:, None] - torch.digamma(alpha))).sum(dim=-1)</pre></div>
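<p>A small sanity check (toy values, shapes assumed to be [batch, K]): as evidence for the
correct class grows, this loss decreases:</p>
<pre>
import torch

target = torch.tensor([[1., 0., 0.]])
for e in [1., 10., 100.]:
    evidence = torch.tensor([[e, 1., 1.]])
    alpha = evidence + 1.
    strength = alpha.sum(dim=-1)
    loss = (target * (torch.digamma(strength)[:, None] - torch.digamma(alpha))).sum(dim=-1)
    print(e, loss.item())  # decreases monotonically in e
</pre>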
@@ -305,19 +305,19 @@ <h2>Bayes Risk with Squared Error Loss</h2>
&= \int
\Big[ \sum_{k=1}^K (y_k - p_k)^2 \Big]
-\frac{1}{B(\color{cyan}{\mathbf{\alpha}})}
-\prod_{k=1}^K p_k^{\color{cyan}{\alpha_k} - 1}
+\frac{1}{B(\color{orange}{\mathbf{\alpha}})}
+\prod_{k=1}^K p_k^{\color{orange}{\alpha_k} - 1}
d\mathbf{p} \\
&= \sum_{k=1}^K \mathbb{E} \Big[ y_k^2 - 2 y_k p_k + p_k^2 \Big] \\
&= \sum_{k=1}^K \Big( y_k^2 - 2 y_k \mathbb{E}[p_k] + \mathbb{E}[p_k^2] \Big)
\end{align}</script>
</p>
-<p>Where <script type="math/tex; mode=display">\mathbb{E}[p_k] = \hat{p}_k = \frac{\color{cyan}{\alpha_k}}{S}</script>
+<p>Where <script type="math/tex; mode=display">\mathbb{E}[p_k] = \hat{p}_k = \frac{\color{orange}{\alpha_k}}{S}</script>
is the expected probability when sampled from the Dirichlet distribution
and <script type="math/tex; mode=display">\mathbb{E}[p_k^2] = \mathbb{E}[p_k]^2 + \text{Var}(p_k)</script>
where
-<script type="math/tex; mode=display">\text{Var}(p_k) = \frac{\color{cyan}{\alpha_k}(S - \color{cyan}{\alpha_k})}{S^2 (S + 1)}
+<script type="math/tex; mode=display">\text{Var}(p_k) = \frac{\color{orange}{\alpha_k}(S - \color{orange}{\alpha_k})}{S^2 (S + 1)}
= \frac{\hat{p}_k (1 - \hat{p}_k)}{S + 1}</script>
is the variance.</p>
<p>This gives,
@@ -355,7 +355,7 @@ <h2>Bayes Risk with Squared Error Loss</h2>
<div class='section-link'>
<a href='#section-15'>#</a>
</div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
</div>
<div class='code'>
<div class="highlight"><pre>alpha = evidence + 1.</pre></div>
@@ -366,7 +366,7 @@ <h2>Bayes Risk with Squared Error Loss</h2>
<div class='section-link'>
<a href='#section-16'>#</a>
</div>
-<p>$S = \sum_{k=1}^K \color{cyan}{\alpha_k}$</p>
+<p>$S = \sum_{k=1}^K \color{orange}{\alpha_k}$</p>
</div>
<div class='code'>
<div class="highlight"><pre>strength = alpha.sum(dim=-1)</pre></div>
@@ -377,7 +377,7 @@ <h2>Bayes Risk with Squared Error Loss</h2>
<div class='section-link'>
<a href='#section-17'>#</a>
</div>
-<p>$\hat{p}_k = \frac{\color{cyan}{\alpha_k}}{S}$</p>
+<p>$\hat{p}_k = \frac{\color{orange}{\alpha_k}}{S}$</p>
</div>
<div class='code'>
<div class="highlight"><pre>p = alpha / strength[:, None]</pre></div>
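<p>The remaining lines of this loss are not shown in this diff; a sketch that follows
directly from the derivation above (expected squared error plus the variance term) would be:</p>
<pre>
# (y_k - p_hat_k)^2 + p_hat_k * (1 - p_hat_k) / (S + 1), summed over classes
err = (target - p) ** 2
var = p * (1. - p) / (strength[:, None] + 1.)
loss = (err + var).sum(dim=-1)
</pre>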
@@ -435,7 +435,7 @@ <h2>Bayes Risk with Squared Error Loss</h2>
<p><a id="KLDivergenceLoss"></a></p>
<h2>KL Divergence Regularization Loss</h2>
<p>This tries to shrink the total evidence to zero if the sample cannot be correctly classified.</p>
-<p>First we calculate $\tilde{\alpha}_k = y_k + (1 - y_k) \color{cyan}{\alpha_k}$, the
+<p>First we calculate $\tilde{\alpha}_k = y_k + (1 - y_k) \color{orange}{\alpha_k}$, the
Dirichlet parameters after removing the correct evidence.</p>
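<p>For reference alongside the derivation below, a minimal sketch (an assumed helper, not
the repository's code) of the closed-form KL divergence between
$D(\mathbf{p} \vert \mathbf{\tilde{\alpha}})$ and the uniform Dirichlet $D(\mathbf{p} \vert \mathbf{1})$:</p>
<pre>
import torch

def kl_to_uniform_dirichlet(alpha_tilde: torch.Tensor) -> torch.Tensor:
    # alpha_tilde has shape [batch, K]
    k = alpha_tilde.shape[-1]
    s = alpha_tilde.sum(dim=-1)
    # log B(1) / B(alpha_tilde) = lgamma(S) - sum_k lgamma(alpha_k) - lgamma(K)
    log_norm = (torch.lgamma(s)
                - torch.lgamma(alpha_tilde).sum(dim=-1)
                - torch.lgamma(torch.tensor(float(k))))
    # sum_k (alpha_k - 1) * (psi(alpha_k) - psi(S))
    digamma_term = ((alpha_tilde - 1) *
                    (torch.digamma(alpha_tilde) - torch.digamma(s)[:, None])).sum(dim=-1)
    return log_norm + digamma_term
</pre>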
<p>
<script type="math/tex; mode=display">\begin{align}
@@ -474,7 +474,7 @@ <h2>KL Divergence Regularization Loss</h2>
<div class='section-link'>
<a href='#section-24'>#</a>
</div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
</div>
<div class='code'>
<div class="highlight"><pre>alpha = evidence + 1.</pre></div>
@@ -497,7 +497,7 @@ <h2>KL Divergence Regularization Loss</h2>
<a href='#section-26'>#</a>
</div>
<p>Remove non-misleading evidence
-<script type="math/tex; mode=display">\tilde{\alpha}_k = y_k + (1 - y_k) \color{cyan}{\alpha_k}</script>
+<script type="math/tex; mode=display">\tilde{\alpha}_k = y_k + (1 - y_k) \color{orange}{\alpha_k}</script>
</p>
</div>
<div class='code'>
@@ -637,7 +637,7 @@ <h3>Track statistics</h3>
<div class='section-link'>
<a href='#section-37'>#</a>
</div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
</div>
<div class='code'>
<div class="highlight"><pre>alpha = evidence + 1.</pre></div>
@@ -648,7 +648,7 @@ <h3>Track statistics</h3>
<div class='section-link'>
<a href='#section-38'>#</a>
</div>
-<p>$S = \sum_{k=1}^K \color{cyan}{\alpha_k}$</p>
+<p>$S = \sum_{k=1}^K \color{orange}{\alpha_k}$</p>
</div>
<div class='code'>
<div class="highlight"><pre>strength = alpha.sum(dim=-1)</pre></div>
@@ -659,7 +659,7 @@ <h3>Track statistics</h3>
<div class='section-link'>
<a href='#section-39'>#</a>
</div>
-<p>$\hat{p}_k = \frac{\color{cyan}{\alpha_k}}{S}$</p>
+<p>$\hat{p}_k = \frac{\color{orange}{\alpha_k}}{S}$</p>
</div>
<div class='code'>
<div class="highlight"><pre>expected_probability = alpha / strength[:, None]</pre></div>
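<p>For completeness, the paper also defines the uncertainty mass $u = K / S$, so that the
belief masses $b_k = e_k / S$ and $u$ sum to one; it is not among the statistics tracked
here, but would be a one-liner:</p>
<pre>
uncertainty = alpha.shape[-1] / strength  # u = K / S
</pre>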