Commit 21b6187

docs fix

1 parent b660752 commit 21b6187
File tree

3 files changed: +63 −63 lines changed

docs/sitemap.xml

Lines changed: 1 addition & 1 deletion

@@ -204,7 +204,7 @@
 
 <url>
 <loc>https://nn.labml.ai/normalization/batch_norm/mnist.html</loc>
-<lastmod>2021-08-20T16:30:00+00:00</lastmod>
+<lastmod>2021-08-21T16:30:00+00:00</lastmod>
 <priority>1.00</priority>
 </url>
 
docs/uncertainty/evidence/index.html

Lines changed: 31 additions & 31 deletions
@@ -85,14 +85,14 @@ <h1>Evidential Deep Learning to Quantify Classification Uncertainty</h1>
 Paper uses term evidence as a measure of the amount of support
 collected from data in favor of a sample to be classified into a certain class.</p>
 <p>This corresponds to a <a href="https://en.wikipedia.org/wiki/Dirichlet_distribution">Dirichlet distribution</a>
-with parameters $\color{cyan}{\alpha_k} = e_k + 1$, and
-$\color{cyan}{\alpha_0} = S = \sum_{k=1}^K \color{cyan}{\alpha_k}$ is known as the Dirichlet strength.
-Dirichlet distribution $D(\mathbf{p} \vert \color{cyan}{\mathbf{\alpha}})$
+with parameters $\color{orange}{\alpha_k} = e_k + 1$, and
+$\color{orange}{\alpha_0} = S = \sum_{k=1}^K \color{orange}{\alpha_k}$ is known as the Dirichlet strength.
+Dirichlet distribution $D(\mathbf{p} \vert \color{orange}{\mathbf{\alpha}})$
 is a distribution over categorical distribution; i.e. you can sample class probabilities
 from a Dirichlet distribution.
-The expected probability for class $k$ is $\hat{p}_k = \frac{\color{cyan}{\alpha_k}}{S}$.</p>
+The expected probability for class $k$ is $\hat{p}_k = \frac{\color{orange}{\alpha_k}}{S}$.</p>
 <p>We get the model to output evidences
-<script type="math/tex; mode=display">\mathbf{e} = \color{cyan}{\mathbf{\alpha}} - 1 = f(\mathbf{x} | \Theta)</script>
+<script type="math/tex; mode=display">\mathbf{e} = \color{orange}{\mathbf{\alpha}} - 1 = f(\mathbf{x} | \Theta)</script>
 for a given input $\mathbf{x}$.
 We use a function such as
 <a href="https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html">ReLU</a> or a
@@ -116,7 +116,7 @@ <h1>Evidential Deep Learning to Quantify Classification Uncertainty</h1>
 </div>
 <p><a id="MaximumLikelihoodLoss"></a></p>
 <h2>Type II Maximum Likelihood Loss</h2>
-<p>The distribution D(\mathbf{p} \vert \color{cyan}{\mathbf{\alpha}}) is a prior on the likelihood
+<p>The distribution $D(\mathbf{p} \vert \color{orange}{\mathbf{\alpha}})$ is a prior on the likelihood
 $Multi(\mathbf{y} \vert p)$,
 and the negative log marginal likelihood is calculated by integrating over class probabilities
 $\mathbf{p}$.</p>
@@ -127,11 +127,11 @@ <h2>Type II Maximum Likelihood Loss</h2>
 &= -\log \Bigg(
 \int
 \prod_{k=1}^K p_k^{y_k}
-\frac{1}{B(\color{cyan}{\mathbf{\alpha}})}
-\prod_{k=1}^K p_k^{\color{cyan}{\alpha_k} - 1}
+\frac{1}{B(\color{orange}{\mathbf{\alpha}})}
+\prod_{k=1}^K p_k^{\color{orange}{\alpha_k} - 1}
 d\mathbf{p}
 \Bigg ) \\
-&= \sum_{k=1}^K y_k \bigg( \log S - \log \color{cyan}{\alpha_k} \bigg)
+&= \sum_{k=1}^K y_k \bigg( \log S - \log \color{orange}{\alpha_k} \bigg)
 \end{align}</script>
 </p>
 </div>
@@ -158,7 +158,7 @@ <h2>Type II Maximum Likelihood Loss</h2>
 <div class='section-link'>
 <a href='#section-3'>#</a>
 </div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">90</span> <span class="n">alpha</span> <span class="o">=</span> <span class="n">evidence</span> <span class="o">+</span> <span class="mf">1.</span></pre></div>
@@ -169,7 +169,7 @@ <h2>Type II Maximum Likelihood Loss</h2>
 <div class='section-link'>
 <a href='#section-4'>#</a>
 </div>
-<p>$S = \sum_{k=1}^K \color{cyan}{\alpha_k}$</p>
+<p>$S = \sum_{k=1}^K \color{orange}{\alpha_k}$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">92</span> <span class="n">strength</span> <span class="o">=</span> <span class="n">alpha</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
@@ -180,7 +180,7 @@ <h2>Type II Maximum Likelihood Loss</h2>
 <div class='section-link'>
 <a href='#section-5'>#</a>
 </div>
-<p>Losses $\mathcal{L}(\Theta) = \sum_{k=1}^K y_k \bigg( \log S - \log \color{cyan}{\alpha_k} \bigg)$</p>
+<p>Losses $\mathcal{L}(\Theta) = \sum_{k=1}^K y_k \bigg( \log S - \log \color{orange}{\alpha_k} \bigg)$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">95</span> <span class="n">loss</span> <span class="o">=</span> <span class="p">(</span><span class="n">target</span> <span class="o">*</span> <span class="p">(</span><span class="n">strength</span><span class="o">.</span><span class="n">log</span><span class="p">()[:,</span> <span class="kc">None</span><span class="p">]</span> <span class="o">-</span> <span class="n">alpha</span><span class="o">.</span><span class="n">log</span><span class="p">()))</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
@@ -217,11 +217,11 @@ <h2>Bayes Risk with Cross Entropy Loss</h2>
 &= -\log \Bigg(
 \int
 \Big[ \sum_{k=1}^K -y_k \log p_k \Big]
-\frac{1}{B(\color{cyan}{\mathbf{\alpha}})}
-\prod_{k=1}^K p_k^{\color{cyan}{\alpha_k} - 1}
+\frac{1}{B(\color{orange}{\mathbf{\alpha}})}
+\prod_{k=1}^K p_k^{\color{orange}{\alpha_k} - 1}
 d\mathbf{p}
 \Bigg ) \\
-&= \sum_{k=1}^K y_k \bigg( \psi(S) - \psi( \color{cyan}{\alpha_k} ) \bigg)
+&= \sum_{k=1}^K y_k \bigg( \psi(S) - \psi( \color{orange}{\alpha_k} ) \bigg)
 \end{align}</script>
 </p>
 <p>where $\psi(\cdot)$ is the $digamma$ function.</p>
@@ -249,7 +249,7 @@ <h2>Bayes Risk with Cross Entropy Loss</h2>
 <div class='section-link'>
 <a href='#section-9'>#</a>
 </div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">136</span> <span class="n">alpha</span> <span class="o">=</span> <span class="n">evidence</span> <span class="o">+</span> <span class="mf">1.</span></pre></div>
@@ -260,7 +260,7 @@ <h2>Bayes Risk with Cross Entropy Loss</h2>
 <div class='section-link'>
 <a href='#section-10'>#</a>
 </div>
-<p>$S = \sum_{k=1}^K \color{cyan}{\alpha_k}$</p>
+<p>$S = \sum_{k=1}^K \color{orange}{\alpha_k}$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">138</span> <span class="n">strength</span> <span class="o">=</span> <span class="n">alpha</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
@@ -271,7 +271,7 @@ <h2>Bayes Risk with Cross Entropy Loss</h2>
 <div class='section-link'>
 <a href='#section-11'>#</a>
 </div>
-<p>Losses $\mathcal{L}(\Theta) = \sum_{k=1}^K y_k \bigg( \psi(S) - \psi( \color{cyan}{\alpha_k} ) \bigg)$</p>
+<p>Losses $\mathcal{L}(\Theta) = \sum_{k=1}^K y_k \bigg( \psi(S) - \psi( \color{orange}{\alpha_k} ) \bigg)$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">141</span> <span class="n">loss</span> <span class="o">=</span> <span class="p">(</span><span class="n">target</span> <span class="o">*</span> <span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">digamma</span><span class="p">(</span><span class="n">strength</span><span class="p">)[:,</span> <span class="kc">None</span><span class="p">]</span> <span class="o">-</span> <span class="n">torch</span><span class="o">.</span><span class="n">digamma</span><span class="p">(</span><span class="n">alpha</span><span class="p">)))</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
@@ -305,19 +305,19 @@ <h2>Bayes Risk with Squared Error Loss</h2>
 &= -\log \Bigg(
 \int
 \Big[ \sum_{k=1}^K (y_k - p_k)^2 \Big]
-\frac{1}{B(\color{cyan}{\mathbf{\alpha}})}
-\prod_{k=1}^K p_k^{\color{cyan}{\alpha_k} - 1}
+\frac{1}{B(\color{orange}{\mathbf{\alpha}})}
+\prod_{k=1}^K p_k^{\color{orange}{\alpha_k} - 1}
 d\mathbf{p}
 \Bigg ) \\
 &= \sum_{k=1}^K \mathbb{E} \Big[ y_k^2 -2 y_k p_k + p_k^2 \Big] \\
 &= \sum_{k=1}^K \Big( y_k^2 -2 y_k \mathbb{E}[p_k] + \mathbb{E}[p_k^2] \Big)
 \end{align}</script>
 </p>
-<p>Where <script type="math/tex; mode=display">\mathbb{E}[p_k] = \hat{p}_k = \frac{\color{cyan}{\alpha_k}}{S}</script>
+<p>Where <script type="math/tex; mode=display">\mathbb{E}[p_k] = \hat{p}_k = \frac{\color{orange}{\alpha_k}}{S}</script>
 is the expected probability when sampled from the Dirichlet distribution
 and <script type="math/tex; mode=display">\mathbb{E}[p_k^2] = \mathbb{E}[p_k]^2 + \text{Var}(p_k)</script>
 where
-<script type="math/tex; mode=display">\text{Var}(p_k) = \frac{\color{cyan}{\alpha_k}(S - \color{cyan}{\alpha_k})}{S^2 (S + 1)}
+<script type="math/tex; mode=display">\text{Var}(p_k) = \frac{\color{orange}{\alpha_k}(S - \color{orange}{\alpha_k})}{S^2 (S + 1)}
 = \frac{\hat{p}_k(1 - \hat{p}_k)}{S + 1}</script>
 is the variance.</p>
 <p>This gives,
@@ -355,7 +355,7 @@ <h2>Bayes Risk with Squared Error Loss</h2>
 <div class='section-link'>
 <a href='#section-15'>#</a>
 </div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">197</span> <span class="n">alpha</span> <span class="o">=</span> <span class="n">evidence</span> <span class="o">+</span> <span class="mf">1.</span></pre></div>
@@ -366,7 +366,7 @@ <h2>Bayes Risk with Squared Error Loss</h2>
 <div class='section-link'>
 <a href='#section-16'>#</a>
 </div>
-<p>$S = \sum_{k=1}^K \color{cyan}{\alpha_k}$</p>
+<p>$S = \sum_{k=1}^K \color{orange}{\alpha_k}$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">199</span> <span class="n">strength</span> <span class="o">=</span> <span class="n">alpha</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
@@ -377,7 +377,7 @@ <h2>Bayes Risk with Squared Error Loss</h2>
 <div class='section-link'>
 <a href='#section-17'>#</a>
 </div>
-<p>$\hat{p}_k = \frac{\color{cyan}{\alpha_k}}{S}$</p>
+<p>$\hat{p}_k = \frac{\color{orange}{\alpha_k}}{S}$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">201</span> <span class="n">p</span> <span class="o">=</span> <span class="n">alpha</span> <span class="o">/</span> <span class="n">strength</span><span class="p">[:,</span> <span class="kc">None</span><span class="p">]</span></pre></div>
@@ -435,7 +435,7 @@ <h2>Bayes Risk with Squared Error Loss</h2>
 <p><a id="KLDivergenceLoss"></a></p>
 <h2>KL Divergence Regularization Loss</h2>
 <p>This tries to shrink the total evidence to zero if the sample cannot be correctly classified.</p>
-<p>First we calculate $\tilde{\alpha}_k = y_k + (1 - y_k) \color{cyan}{\alpha_k}$ the
+<p>First we calculate $\tilde{\alpha}_k = y_k + (1 - y_k) \color{orange}{\alpha_k}$ the
 Dirichlet parameters after remove the correct evidence.</p>
 <p>
 <script type="math/tex; mode=display">\begin{align}
@@ -474,7 +474,7 @@ <h2>KL Divergence Regularization Loss</h2>
 <div class='section-link'>
 <a href='#section-24'>#</a>
 </div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">244</span> <span class="n">alpha</span> <span class="o">=</span> <span class="n">evidence</span> <span class="o">+</span> <span class="mf">1.</span></pre></div>
@@ -497,7 +497,7 @@ <h2>KL Divergence Regularization Loss</h2>
 <a href='#section-26'>#</a>
 </div>
 <p>Remove non-misleading evidence
-<script type="math/tex; mode=display">\tilde{\alpha}_k = y_k + (1 - y_k) \color{cyan}{\alpha_k}</script>
+<script type="math/tex; mode=display">\tilde{\alpha}_k = y_k + (1 - y_k) \color{orange}{\alpha_k}</script>
 </p>
 </div>
 <div class='code'>
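
After removing the non-misleading evidence, the regularizer is the KL divergence from Dir(p | alpha~) to the uniform Dirichlet Dir(p | 1). A minimal sketch using the standard closed form for the KL divergence between Dirichlet distributions; the exact arrangement in the diffed file is not shown in this commit, so treat this as an assumption:

import torch

def kl_divergence_loss(evidence: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    alpha = evidence + 1.
    n_classes = evidence.shape[-1]
    # Remove non-misleading evidence: alpha~_k = y_k + (1 - y_k) * alpha_k
    alpha_tilde = target + (1 - target) * alpha
    strength_tilde = alpha_tilde.sum(dim=-1)
    # KL( Dir(alpha~) || Dir(1) ) in closed form; log Gamma(K) is the normalizer of Dir(1)
    kl = (torch.lgamma(strength_tilde)
          - torch.lgamma(torch.tensor(float(n_classes)))
          - torch.lgamma(alpha_tilde).sum(dim=-1)
          + ((alpha_tilde - 1)
             * (torch.digamma(alpha_tilde) - torch.digamma(strength_tilde)[:, None])).sum(dim=-1))
    return kl.mean()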
@@ -637,7 +637,7 @@ <h3>Track statistics</h3>
 <div class='section-link'>
 <a href='#section-37'>#</a>
 </div>
-<p>$\color{cyan}{\alpha_k} = e_k + 1$</p>
+<p>$\color{orange}{\alpha_k} = e_k + 1$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">296</span> <span class="n">alpha</span> <span class="o">=</span> <span class="n">evidence</span> <span class="o">+</span> <span class="mf">1.</span></pre></div>
@@ -648,7 +648,7 @@ <h3>Track statistics</h3>
 <div class='section-link'>
 <a href='#section-38'>#</a>
 </div>
-<p>$S = \sum_{k=1}^K \color{cyan}{\alpha_k}$</p>
+<p>$S = \sum_{k=1}^K \color{orange}{\alpha_k}$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">298</span> <span class="n">strength</span> <span class="o">=</span> <span class="n">alpha</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
@@ -659,7 +659,7 @@ <h3>Track statistics</h3>
 <div class='section-link'>
 <a href='#section-39'>#</a>
 </div>
-<p>$\hat{p}_k = \frac{\color{cyan}{\alpha_k}}{S}$</p>
+<p>$\hat{p}_k = \frac{\color{orange}{\alpha_k}}{S}$</p>
 </div>
 <div class='code'>
 <div class="highlight"><pre><span class="lineno">301</span> <span class="n">expected_probability</span> <span class="o">=</span> <span class="n">alpha</span> <span class="o">/</span> <span class="n">strength</span><span class="p">[:,</span> <span class="kc">None</span><span class="p">]</span></pre></div>
