fixed some typos

FilippoMB · FilippoMB · commit f99a1f82d360 · 2024-10-17T17:53:57.000+02:00
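
Note on the `Diffusers_library.ipynb` hunks: the guidance and scheduler cells touched below rely on two small formulas, the classifier-free guidance combination $\tilde z = \tilde z_x + \gamma (\tilde z_{x|y} - \tilde z_x)$ and the LMS input scaling $\tilde x_t / \sqrt{\sigma_t^2 + 1}$. The following is a minimal sketch of both, assuming plain Python lists in place of the torch tensors used in the notebook. The function names `classifier_free_guidance` and `scale_lms_input` are hypothetical and are not part of the notebook or of the Diffusers API.

```python
import math


def classifier_free_guidance(z_uncond, z_cond, guidance_scale):
    """Element-wise z = z_uncond + gamma * (z_cond - z_uncond).

    Hypothetical helper: the notebook applies the same formula to tensors.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(z_uncond, z_cond)]


def scale_lms_input(latent, sigma):
    """Scale each latent entry by 1 / sqrt(sigma^2 + 1).

    Hypothetical helper mirroring the formula quoted in the scheduler cell.
    """
    factor = 1.0 / math.sqrt(sigma ** 2 + 1.0)
    return [x * factor for x in latent]


# Toy values chosen only for illustration.
z_uncond = [0.0, 1.0, -2.0]
z_cond = [1.0, 1.0, 0.0]
guided = classifier_free_guidance(z_uncond, z_cond, guidance_scale=7.5)
scaled = scale_lms_input([3.0, -3.0], sigma=0.0)
```

With `guidance_scale=1.0` the combination reduces to the conditional prediction, and with `0.0` to the unconditional one, which is a quick sanity check on the sign convention.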
diff --git a/Diffusers_library.ipynb b/Diffusers_library.ipynb
@@ -293,12 +293,13 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# model"
+    "# model \n",
+    "model"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "17d9afc8",
+   "id": "e34fe6b5",
    "metadata": {
     "slideshow": {
      "slide_type": "subslide"
@@ -499,8 +500,8 @@
     }
    },
    "source": [
-    "- Run the ``from_config()`` method to load a configuration and instantiate a scheduler \n",
-    "- Note that for pipes and models we did something similar using ``from_pretrained()`` instead"
+    "- Run the ``from_pretrained()`` method to load a configuration and instantiate a scheduler \n",
+    "- Note that we did something similar to load pipes and models"
    ]
   },
   {
@@ -624,7 +625,7 @@
    ],
    "source": [
     "less_noisy_sample = scheduler.step(\n",
-    "    model_output=noisy_residual, timestep=13, sample=noisy_sample\n",
+    "    model_output=noisy_residual, timestep=12, sample=noisy_sample\n",
     ")[\"prev_sample\"]\n",
     "less_noisy_sample.shape"
    ]
@@ -1377,7 +1378,7 @@
     }
    },
    "source": [
-    "Tokenizer + test-encoder:\n",
+    "Tokenizer + text-encoder:\n",
     "\n",
     "- The **text-encoder** is responsible for transforming a text prompt into an embedding space that can be understood by the U-Net \n",
     "- It is usually a transformer-based encoder that maps a sequence of tokens (generated with a **tokenizer**) into a (large fixed size) text-embedding\n",
@@ -1585,7 +1586,7 @@
     }
    },
    "source": [
-    "Next, we load the *K-LMS* scheduler instead of the *PNDMScheduler* from the default pipeline"
+    "Next, we load an *LMS* scheduler instead of the *PNDMScheduler* from the default pipeline"
    ]
   },
   {
@@ -1742,6 +1743,7 @@
     "**Guidance**\n",
     "\n",
     "For classifier-free guidance, we need $\\tilde z = \\tilde z_x + \\gamma \\big( \\tilde z_{x|y} - \\tilde z_x \\big)$.\n",
+    "\n",
     "We need two forward passes: \n",
     "- one with the conditioned input (``text_embeddings``) to get $\\tilde z_{x|y}$ (i.e., the score function $\\nabla_x \\log p(x|y)$)\n",
     "- one with the unconditional embeddings (``uncond_embeddings``) to get $\\tilde z_x$ (i.e., the score function $\\nabla_x \\log p(x)$)\n",
@@ -1818,9 +1820,9 @@
    "source": [
     "**Scheduler**\n",
     "\n",
-    "- We initialize the *K-LMS* with the ``num_inference_steps`` hyperparameter \n",
+    "- We initialize the *LMS* scheduler with the ``num_inference_steps`` hyperparameter \n",
     "- The scheduler will compute the sigmas $\\sigma_t$ to be used during the denoising process\n",
-    "- *K-LMS* computes the next latent to be fed in the U-net as $\\tilde x_t = \\frac{\\tilde x_t}{\\sqrt{\\sigma_t^2 +1}}$"
+    "- *LMS* computes the next latent to be fed in the U-net as $\\tilde x_t = \\frac{\\tilde x_t}{\\sqrt{\\sigma_t^2 +1}}$"
    ]
   },
   {
@@ -2045,7 +2047,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.8"
+   "version": "3.12.7"
   },
   "toc": {
    "base_numbering": "1",
diff --git a/diffusion_from_scratch.ipynb b/diffusion_from_scratch.ipynb
@@ -79,7 +79,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "id": "91673856",
    "metadata": {
     "slideshow": {
@@ -111,7 +111,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 5,
    "id": "fa4ddb28",
    "metadata": {
     "slideshow": {
@@ -628,7 +628,7 @@
     "ax.plot(t1.numpy()[0], label='t=10')\n",
     "ax.plot(t2.numpy()[0], label='t=12')\n",
     "ax.plot(t3.numpy()[0], label='t=30')\n",
-    "plt.legend()"
+    "plt.legend();"
    ]
   },
   {
@@ -828,7 +828,7 @@
     "sqrt_recip_alphas = torch.sqrt(1.0 / alphas)\n",
     "alphas_cumprod_prev = F.pad(alphas_cumprod[:-1], (1, 0), value=1.0)\n",
     "posterior_variance = betas * (1. - alphas_cumprod_prev) / (1. - alphas_cumprod) # β_t\n",
-    "\n",
+    " \n",
     "@torch.no_grad()\n",
     "def p_sample(model, x, t, t_index):\n",
     "     \n",
@@ -841,7 +841,7 @@
     "    \n",
     "    # Use the NN to predict the mean\n",
     "    model_mean = sqrt_recip_alphas_t * (\n",
-    "        x - betas_t * model(x, t) / sqrt_one_minus_alphas_cumprod_t)\n",
+    "        x - betas_t * model(x, t) / sqrt_one_minus_alphas_cumprod_t)\n",
     "\n",
     "    # Draw the next sample\n",
     "    if t_index == 0:\n",
@@ -924,9 +924,9 @@
     }
    },
    "source": [
-    "Next, we define a function that applies some basic image preprocessing on-the-fly: random horizontal flips and rescaling  in the $[-1,1]$ range.\n",
+    "Next, we define some basic image preprocessing applied on-the-fly: random horizontal flips, conversion to tensor, and rescaling to the $[-1,1]$ range.\n",
     "\n",
-    "We use the ``with_transform`` functionality for that. "
+    "We use ``with_transform`` to apply the transformations to the elements in the dataset."
    ]
   },
   {
@@ -1064,12 +1064,13 @@
     "    for step, batch in enumerate(dataloader):\n",
     "      optimizer.zero_grad()\n",
     "\n",
+    "      # x0\n",
     "      batch_size = batch[\"pixel_values\"].shape[0]\n",
     "      batch = batch[\"pixel_values\"].to(device)\n",
     "\n",
     "      # sample t from U(0,T)\n",
     "      t = torch.randint(0, timesteps, (batch_size,), device=device).long()\n",
-    "\n",
+    "      \n",
     "      loss = p_losses(model, batch, t)\n",
     "\n",
     "      if step % 100 == 0:\n",
@@ -1148,7 +1149,7 @@
     "grid_img = torchvision.utils.make_grid(last_sample, nrow=16)\n",
     "%matplotlib inline\n",
     "plt.figure(figsize = (20,10))\n",
-    "plt.imshow(grid_img.permute(1, 2, 0).cpu().numpy(), cmap='gray')"
+    "plt.imshow(grid_img.permute(1, 2, 0).cpu().numpy(), cmap='gray');"
    ]
   },
   {
@@ -1454,7 +1455,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.8"
+   "version": "3.12.7"
   },
   "toc": {
    "base_numbering": 1,
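
Note on the `diffusion_from_scratch.ipynb` sampling hunk: the context lines around `p_sample` compute `sqrt_recip_alphas`, `alphas_cumprod_prev`, and `posterior_variance` from the betas, then form the model mean. The sketch below reproduces that arithmetic in pure Python so it can be checked without torch. The linear schedule and its endpoints (`1e-4`, `0.02`) are assumptions chosen for illustration and do not appear in this diff; the helper names are hypothetical.

```python
import math


def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule (requires timesteps >= 2)."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def posterior_terms(betas):
    """Mirror the notebook's precomputed quantities with plain lists."""
    alphas = [1.0 - b for b in betas]
    alphas_cumprod = []
    running = 1.0
    for a in alphas:
        running *= a
        alphas_cumprod.append(running)
    # Same effect as F.pad(alphas_cumprod[:-1], (1, 0), value=1.0).
    alphas_cumprod_prev = [1.0] + alphas_cumprod[:-1]
    posterior_variance = [
        b * (1.0 - prev) / (1.0 - cur)
        for b, prev, cur in zip(betas, alphas_cumprod_prev, alphas_cumprod)
    ]
    return alphas_cumprod, posterior_variance


def model_mean(x, eps, beta_t, alpha_cumprod_t):
    """Scalar version of the mean used in p_sample, with eps the predicted noise."""
    sqrt_recip_alpha_t = math.sqrt(1.0 / (1.0 - beta_t))
    return sqrt_recip_alpha_t * (x - beta_t * eps / math.sqrt(1.0 - alpha_cumprod_t))


betas = linear_beta_schedule(timesteps=10)
alphas_cumprod, posterior_variance = posterior_terms(betas)
```

The first posterior variance is zero because `alphas_cumprod_prev[0]` is padded with `1.0`, which is consistent with `p_sample` returning the mean without added noise when `t_index == 0`.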