
Commit e09ee89

authored
Transformer experiment logs (labmlai#130)
1 parent f726210 commit e09ee89


12 files changed (+383 additions, -372 deletions)


docs/activations/fta/experiment.html

Lines changed: 1 addition & 1 deletion
@@ -70,9 +70,9 @@
 <a href='#section-0'>#</a>
 </div>
 <h1><a href="index.html">Fuzzy Tiling Activation</a> Experiment</h1>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/activations/fta/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://www.comet.ml/labml/fta/69be11f83693407f82a86dcbb232bcfe?experiment-tab=chart&showOutliers=true&smoothing=0&transformY=smoothing&viewId=rlJOpXDGtL8zbkcX66R77P5me&xAxis=step"><img alt="Open In Comet" src="https://images.labml.ai/images/comet.svg?experiment=capsule_networks&file=model"></a></p>
 <p>Here we train a transformer that uses <a href="index.html">Fuzzy Tiling Activation</a> in the <a href="../../transformers/feed_forward.html">Feed-Forward Network</a>. We use it for a language model and train it on Tiny Shakespeare dataset for demonstration.</p>
 <p>However, this is probably not the ideal task for FTA, and we believe FTA is more suitable for modeling data with continuous variables.</p>
-<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/activations/fta/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://www.comet.ml/labml/fta/69be11f83693407f82a86dcbb232bcfe?experiment-tab=chart&showOutliers=true&smoothing=0&transformY=smoothing&viewId=rlJOpXDGtL8zbkcX66R77P5me&xAxis=step"><img alt="Open In Comet" src="https://images.labml.ai/images/comet.svg?experiment=capsule_networks&file=model"></a></p>
 
 </div>
 <div class='code'>

docs/activations/fta/index.html

Lines changed: 1 addition & 1 deletion
@@ -70,6 +70,7 @@
 <a href='#section-0'>#</a>
 </div>
 <h1>Fuzzy Tiling Activations (FTA)</h1>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/activations/fta/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://www.comet.ml/labml/fta/69be11f83693407f82a86dcbb232bcfe?experiment-tab=chart&showOutliers=true&smoothing=0&transformY=smoothing&viewId=rlJOpXDGtL8zbkcX66R77P5me&xAxis=step"><img alt="Open In Comet" src="https://images.labml.ai/images/comet.svg?experiment=capsule_networks&file=model"></a></p>
 <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://papers.labml.ai/paper/aca66d8edc8911eba3db37f65e372566">Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online</a>.</p>
 <p>Fuzzy tiling activations are a form of sparse activations based on binning.</p>
 <p>Binning is classification of a scalar value into a bin based on intervals. One problem with binning is that it gives zero gradients for most values (except at the boundary of bins). The other is that binning loses precision if the bin intervals are large.</p>
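The two problems with binning named in the paragraph above can be seen in a few lines of plain Python. This is a minimal sketch, not code from this commit or from labml_nn: the function names `hard_bins` and `finite_diff`, the range [-1, 1] and the bin width 0.5 are illustrative assumptions, and each bin is taken as the half-open interval [c_i, c_i + δ) for simplicity (the FTA formula in this file treats points exactly on a bin edge slightly differently).

```python
def hard_bins(z, lower=-1.0, upper=1.0, delta=0.5):
    """One-hot binning of a scalar z into k = (upper - lower) / delta bins.

    Assumption for this sketch: bin i is the half-open interval
    [c_i, c_i + delta) with c_i = lower + i * delta.
    """
    k = round((upper - lower) / delta)
    tiles = [lower + i * delta for i in range(k)]
    return [1.0 if c <= z < c + delta else 0.0 for c in tiles]


def finite_diff(f, z, h=1e-4):
    """Forward-difference estimate of d f(z) / dz for every output element."""
    a, b = f(z), f(z + h)
    return [(bi - ai) / h for ai, bi in zip(a, b)]


print(hard_bins(0.1))                # [0.0, 0.0, 1.0, 0.0]
print(hard_bins(0.4))                # [0.0, 0.0, 1.0, 0.0]  same code as 0.1
print(finite_diff(hard_bins, 0.1))   # [0.0, 0.0, 0.0, 0.0]  no gradient signal
```

Values 0.1 and 0.4 map to the same one-hot code, so any detail finer than the bin width is lost, and away from the bin edges a small change in z leaves the output unchanged, so the estimated gradient is zero.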
@@ -89,7 +90,6 @@ <h4>Fuzzy Tiling Activations</h4>
 <p>FTA uses this to create soft boundaries between bins.</p>
 <p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="mord coloredeq eqg" style=""><span class="mord" style=""><span class="mord mathnormal" style="">ϕ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.15139200000000003em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqn" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">η</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqs" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="mclose" style="">)</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord coloredeq eqp" style=""><span class="mord" style="">1</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.20001em;vertical-align:-0.35001em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em;">I</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.25833100000000003em;"><span style="top:-2.5500000000000003em;margin-left:-0.07847em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03588em;">η</span><span class="mpunct mtight">,</span><span class="mord mtight">+</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mord"><span class="delimsizing size1">(</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">max</span><span class="mopen">(</span><span class="mord coloredeq eqj" style=""><span class="mord mathbf" style="">c</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord coloredeq eqs" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eqo" style=""><span class="mord" style="">0</span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mop">max</span><span class="mopen">(</span><span class="mord coloredeq eqs" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.77777em;vertical-align:-0.08333em;"></span><span class="mord coloredeq eql" style=""><span class="mord mathnormal" style="margin-right:0.03785em">δ</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.20001em;vertical-align:-0.35001em;"></span><span class="mord coloredeq eqj" style=""><span class="mord mathbf" style="">c</span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eqo" style=""><span class="mord" style="">0</span></span><span class="mclose">)</span><span class="mord"><span class="delimsizing size1">)</span></span></span></span></span></span></p>
 <p><a href="experiment.html">Here&#x27;s a simple experiment</a> that uses FTA in a transformer.</p>
-<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/activations/fta/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://www.comet.ml/labml/fta/69be11f83693407f82a86dcbb232bcfe?experiment-tab=chart&showOutliers=true&smoothing=0&transformY=smoothing&viewId=rlJOpXDGtL8zbkcX66R77P5me&xAxis=step"><img alt="Open In Comet" src="https://images.labml.ai/images/comet.svg?experiment=capsule_networks&file=model"></a></p>
 
 </div>
 <div class='code'>
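The equation for φ_η(z) in the index.html diff can be evaluated tile by tile in plain Python to see the soft boundaries it creates. This is a minimal illustration under stated assumptions, not the labml_nn implementation (which operates on PyTorch tensors): `fuzzy_indicator` and `fta` are hypothetical names, the tiling range, δ and η are arbitrary, and the fuzzy indicator follows the FTA paper's definition I_{η,+}(x) = I_+(η - x)·x + I_+(x - η), with I_+(u) = 1 for u > 0 and 0 otherwise, which is not part of this diff.

```python
def fuzzy_indicator(x, eta):
    """Assumed fuzzy indicator: I_{eta,+}(x) = I_+(eta - x) * x + I_+(x - eta).

    Returns x itself when x < eta (a linear ramp that carries gradient),
    1.0 when x > eta, and 0.0 at x == eta exactly.
    """
    below = x if eta - x > 0 else 0.0
    above = 1.0 if x - eta > 0 else 0.0
    return below + above


def fta(z, lower=-1.0, upper=1.0, delta=0.5, eta=0.25):
    """phi_eta(z) = 1 - I_{eta,+}(max(c - z, 0) + max(z - delta - c, 0)),
    evaluated for every tile c_i = lower + i * delta (illustrative values)."""
    k = round((upper - lower) / delta)
    tiles = [lower + i * delta for i in range(k)]
    return [
        1.0 - fuzzy_indicator(max(c - z, 0.0) + max(z - delta - c, 0.0), eta)
        for c in tiles
    ]


print([round(v, 6) for v in fta(0.1)])   # [0.0, 0.9, 1.0, 0.0]
print(fta(0.1, eta=0.0))                 # [0.0, 0.0, 1.0, 0.0]  hard binning

# Forward-difference gradient of the second tile's output with respect to z
h = 1e-4
print(round((fta(0.1 + h)[1] - fta(0.1)[1]) / h, 6))  # -1.0
```

With η = 0 the sketch reduces to one-hot binning. With η = 0.25, the tile [-0.5, 0] whose upper edge lies within η of z = 0.1 gets a fractional value of 0.9 and a gradient of -1 with respect to z, so z receives a learning signal near bin boundaries instead of none.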
