
Commit 29c8fb8

minor edits
Signed-off-by: Chris Abraham <cjyabraham@gmail.com>
1 parent e885791 commit 29c8fb8

File tree: 1 file changed, +1 −3 lines changed


_posts/2024-11-13-llama-into-torchtune.md

Lines changed: 1 addition & 3 deletions
@@ -105,7 +105,7 @@ The idea of knowledge distillation is that a smaller model can achieve better pe
 </td>
 </tr>
 <tr>
-<td><a href="https://arxiv.org/pdf/2404.02657">AKL</a>
+<td>AKL
 </td>
 <td>24.4
 </td>
@@ -176,15 +176,13 @@ Below is a simplified example of how knowledge distillation differs from supervi
 <pre class="highlight">
 <code>
 model = llama3_2_1b()
-teacher_model = llama3_1_8b()
 ce_loss = CrossEntropyLoss()
 kd_loss = ForwardKLLoss()

 tokens, labels = batch["tokens"], batch["labels"]
 logits = model(tokens, ...)

 loss = ce_loss(logits, labels)
-
 loss.backward()

 </code>
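The second hunk edits only the supervised fine-tuning side of the post's SFT-vs-KD comparison. As a point of reference for the knowledge-distillation side it is contrasted with, below is a minimal runnable sketch in plain PyTorch: the toy model sizes, the manual forward-KL term, and the equal weighting of the two losses are illustrative assumptions, not the post's torchtune recipe or its ForwardKLLoss API.

import torch
import torch.nn.functional as F

# Toy stand-ins for the student (llama3_2_1b) and teacher (llama3_1_8b); sizes are arbitrary.
vocab_size, hidden = 32, 16
student = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, hidden),
    torch.nn.Linear(hidden, vocab_size),
)
teacher = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, hidden),
    torch.nn.Linear(hidden, vocab_size),
)

tokens = torch.randint(0, vocab_size, (2, 8))   # batch of token ids
labels = torch.randint(0, vocab_size, (2, 8))   # next-token targets

logits = student(tokens)                        # student forward pass
with torch.no_grad():                           # teacher is frozen
    teacher_logits = teacher(tokens)

# Cross-entropy against the ground-truth labels (the supervised fine-tuning term).
ce = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1))

# Forward KL between teacher and student token distributions (the distillation term).
kd = F.kl_div(
    F.log_softmax(logits, dim=-1),
    F.log_softmax(teacher_logits, dim=-1),
    log_target=True,
    reduction="batchmean",
)

# Equal weighting of the two terms is an illustrative choice, not the post's configuration.
loss = ce + kd
loss.backward()

The shape of the computation mirrors the post's pseudocode: a frozen teacher forward pass adds a distribution-matching term on top of the usual cross-entropy loss.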
