Commit

improve formatting
rasbt committed Sep 24, 2024
1 parent ff31b34 commit a6d8e93
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
@@ -39,10 +39,10 @@
"id": "aFmxTQbwCUMl"
},
"source": [
-    "- In this notebook, we convert the original GPT and GPT-2 architecture into a Llama 2 model step by step\n",
+    "- In this notebook, we convert the original GPT architecture into a Llama 2 model step by step (note that GPT and GPT-2 share the same architecture)\n",
"- Why not Llama 1 or Llama 3?\n",
-    " - The Llama 1 architecture is similar to Llama 2, except that Llama 2 has a larger context window (which is nice); the Llama 1 weights are not readily available and have more usage restrictions, so it makes more sense to focus on Llama 2\n",
-    " - Regarding Llama 3, I will share a separate notebook to convert Llama 2 to Llama 3 (there are only a few small additional changes)\n",
+    " - The Llama 1 architecture is similar to Llama 2, except that Llama 2 has a larger context window (which is nice); the Llama 1 weights are not readily available and have more usage restrictions, so it makes more sense to focus on Llama 2\n",
+    " - Regarding Llama 3, I will share a separate notebook to convert Llama 2 to Llama 3 (there are only a few small additional changes)\n",
"- The explanations are purposefully kept minimal in this notebook not to bloat it unnecessarily and focus on the main code\n",
"- For more information, please see the Llama 2 paper: [Llama 2: Open Foundation and Fine-Tuned Chat Models (2023)](https://arxiv.org/abs/2307.09288)"
]
@@ -143,7 +143,7 @@
"- LayerNorm normalizes inputs using mean and variance, while RMSNorm uses only the root mean square, which improves computational efficiency\n",
    "- The RMSNorm operation is as follows, where $x$ is the input, $\\gamma$ is a trainable parameter (vector), and $\\epsilon$ is a small constant to avoid zero-division errors:\n",
"\n",
-    "$$y = \\frac{x}{\\sqrt{\\text{RMS}[x^2]} + \\epsilon} * \\gamma$$\n",
+    "$$y_i = \\frac{x_i}{\\text{RMS}(x)} \\gamma_i, \\quad \\text{where} \\quad \\text{RMS}(x) = \\sqrt{\\epsilon + \\frac{1}{n} \\sum x_i^2}$$\n",
"\n",
"- For more details, please see the paper [Root Mean Square Layer Normalization (2019)](https://arxiv.org/abs/1910.07467)"
]
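The corrected RMSNorm formula above can be sketched in plain Python as follows. This is a minimal illustration of the equation, not the notebook's actual (PyTorch-based) implementation; the function name `rms_norm` and the example values are my own:

```python
import math

def rms_norm(x, gamma, eps=1e-5):
    # RMS(x) = sqrt(eps + (1/n) * sum(x_i^2)), with eps inside the root
    rms = math.sqrt(eps + sum(v * v for v in x) / len(x))
    # y_i = (x_i / RMS(x)) * gamma_i
    return [v / rms * g for v, g in zip(x, gamma)]

x = [1.0, 2.0, 3.0]
gamma = [1.0, 1.0, 1.0]  # trainable scale, initialized to ones
y = rms_norm(x, gamma)
# with gamma = 1, the mean of y_i^2 is approximately 1 after normalization
```

Note that, unlike LayerNorm, no mean is subtracted and no bias is added; only the root-mean-square statistic is used, which is what makes RMSNorm cheaper to compute.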
@@ -1565,7 +1565,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-   "version": "3.10.11"
+   "version": "3.11.4"
}
},
"nbformat": 4,
