|
237 | 237 | "\n",
|
238 | 238 | "### Estimating the Likelihood\n",
|
239 | 239 | "\n",
|
240 |
| - "A VAE defines a probabilistic model $p_\\theta(\\mathbf{x} | \\mathbf{z}) p(\\mathbf{z})$ and we are interested in maximizing the ability of the model to explain the dataset $\\mathcal{D} = \\{\\mathbf{x}_i\\}_{i=1, \\dots, N}$, hence we aim at obtaining the maximum probability $\\log p_\\theta(\\mathcal{D}) = \\sum_{i=1}^N \\log p_\\theta(\\mathbf{x}_i) = \\sum_{i=1}^N \\log \\int_\\mathbf{z} p_\\theta(\\mathbf{x}_i, \\mathbf{z}) d\\mathbf{z} $. Nonetheless the Evidence Lower Bound (ELBO) only provides a lower bound to this quantity. It is common practice to report the average marginal log likelihood: $\\log p_\\theta(\\mathcal{D}) / N$.\n", |
| 240 | + "A VAE defines a probabilistic model $p_\\theta(\\mathbf{x} | \\mathbf{z}) p(\\mathbf{z})$ and we are interested in maximizing the ability of the model to explain the dataset $\\mathcal{D} = \\{\\mathbf{x}_i\\}_{i=1, \\dots, N}$, hence we aim at obtaining the maximum probability $\\log p_\\theta(\\mathcal{D}) = \\sum_{i=1}^N \\log p_\\theta(\\mathbf{x}_i) = \\sum_{i=1}^N \\log \\int_\\mathbf{z} p_\\theta(\\mathbf{x}_i, \\mathbf{z}) d\\mathbf{z} $. However, as discussed previously, the log-likelihood is intractable (marginalization over $\\mathbf{z}$), hence we rely on the Evidence Lower Bound (ELBO) as a proxy, or a tighter bound such as the importance weighted bound (see at the end of the notebook). \n", |
| 241 | + "\n", |
| 242 | + "**NB** It is common practice to report the average marginal log likelihood $\\log p_\\theta(\\mathcal{D}) / N$ and not $\\log p_\\theta(\\mathcal{D})$ directly:\n", |
| 243 | + "\n", |
| 244 | + "$$\\frac{1}{N} \\log p_\\theta(\\mathcal{D}) = \\frac{1}{N} \\sum_i \\log p_\\theta(\\mathbf{x}_i) \\geq \\frac{1}{N} \\sum_i \\operatorname{ELBO}(\\mathbf{x}_i) \\ . $$\n", |
241 | 245 | "\n",
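 | + "Below is a minimal sketch of how such an estimate could be computed with the importance weighted bound. The `vae` interface used here (`posterior(x)`, `prior()`, `observation_model(z)` returning `torch.distributions`-style objects) is an assumption for illustration, not the notebook's exact API:\n",
 | + "\n",
 | + "```python\n",
 | + "import math\n",
 | + "import torch\n",
 | + "\n",
 | + "@torch.no_grad()\n",
 | + "def iw_log_px(x, vae, k=128):\n",
 | + "    # Importance weighted estimate of log p(x): a lower bound for finite k\n",
 | + "    # that tightens towards log p(x) as k grows (k=1 recovers the ELBO).\n",
 | + "    qz = vae.posterior(x)                                    # q_phi(z | x)\n",
 | + "    z = qz.rsample((k,))                                     # [k, batch, latent]\n",
 | + "    log_px_z = vae.observation_model(z).log_prob(x).sum(-1)  # log p(x | z)\n",
 | + "    log_pz = vae.prior().log_prob(z).sum(-1)                 # log p(z)\n",
 | + "    log_qz = qz.log_prob(z).sum(-1)                          # log q(z | x)\n",
 | + "    log_w = log_px_z + log_pz - log_qz                       # [k, batch]\n",
 | + "    return torch.logsumexp(log_w, dim=0) - math.log(k)\n",
 | + "```\n",
 | + "\n",
 | + "Averaging this quantity over the test set gives an estimate of the average marginal log likelihood $\\frac{1}{N} \\log p_\\theta(\\mathcal{D})$.\n",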
|
242 | 246 | "### Evaluation on Downstream Tasks\n",
|
243 | 247 | "\n",
|
|
825 | 829 | "\n",
|
826 | 830 | "### Exercise 1.\n",
|
827 | 831 | "\n",
|
828 |
| - "- Implement the class `ReparameterizedDiagonalGaussian` (`log_prob()` and `rsample()`).\n", |
829 |
| - "- Import the class `Bernoulli`\n", |
830 |
| - "- Implement the class `VariationalInference` (computation of the `elbo` and `beta_elbo`).\n", |
| 832 | + "1. Implement the class `ReparameterizedDiagonalGaussian` (`log_prob()` and `rsample()`).\n", |
| 833 | + "2. Import the class `Bernoulli`\n", |
| 834 | + "3. Implement the class `VariationalInference` (computation of the `elbo` and `beta_elbo`).\n", |
831 | 835 | "\n",
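 | + "As a starting point for step 1, here is a minimal sketch of the reparameterization trick; the attribute names (`mu`, `log_sigma`) are assumptions, not necessarily those of the notebook's class:\n",
 | + "\n",
 | + "```python\n",
 | + "import torch\n",
 | + "from torch.distributions import Normal\n",
 | + "\n",
 | + "class DiagonalGaussianSketch:\n",
 | + "    # Sketch of a reparameterized diagonal Gaussian, z = mu + sigma * eps.\n",
 | + "    def __init__(self, mu: torch.Tensor, log_sigma: torch.Tensor):\n",
 | + "        self.mu, self.sigma = mu, log_sigma.exp()\n",
 | + "\n",
 | + "    def rsample(self):\n",
 | + "        eps = torch.randn_like(self.mu)    # eps ~ N(0, I), no gradient flows into it\n",
 | + "        return self.mu + self.sigma * eps  # differentiable w.r.t. mu and sigma\n",
 | + "\n",
 | + "    def log_prob(self, z):\n",
 | + "        # Elementwise log-density log N(z | mu, sigma^2)\n",
 | + "        return Normal(self.mu, self.sigma).log_prob(z)\n",
 | + "```\n",
 | + "\n",
 | + "With these densities, one standard single-sample form of the (beta-)ELBO is `log_px_z - beta * (log_qz - log_pz)`, which is the kind of quantity `VariationalInference` is expected to compute.\n",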
|
832 | 836 | "### Exercise 2.\n",
|
833 | 837 | "\n",
|
834 |
| - "**Evaluating a VAE model**\n", |
| 838 | + "**Trainnig and Evaluating a VAE model**\n", |
835 | 839 | "\n",
|
836 |
| - "- What available metric can you use to estimate the marginal likelihood ($\\log p_\\theta(\\mathbf{x})$) ?\n", |
837 |
| - "- In the above plots, we display numerous model samples. If you had to pick one plot, which one would you pick to evaluate the quality of a VAE (i.e. using posterior samples $\\mathbf{z} \\sim q_\\phi(\\mathbf{z} | \\mathbf{x})$ or prior samples $\\mathbf{z} \\sim p(\\mathbf{z})$) ? Why?.\n", |
838 |
| - "- How could you exploit the VAE model for classification?\n", |
| 840 | + "1. Why do we use the reparameterization trick?\n", |
| 841 | + "2. What available metric can you use to estimate the marginal likelihood ($p_\\theta(\\mathbf{x})$) ?\n", |
| 842 | + "3. In the above plots, we display numerous model samples. If you had to pick one plot, which one would you pick to evaluate the quality of a VAE (i.e. using posterior samples $\\mathbf{z} \\sim q_\\phi(\\mathbf{z} | \\mathbf{x})$ or prior samples $\\mathbf{z} \\sim p(\\mathbf{z})$) ? Why?.\n", |
| 843 | + "4. How could you exploit the VAE model for classification?\n", |
839 | 844 | "\n",
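 | + "*Hint for question 4 (one common direction, sketched with an assumed `encode_mu(x)` returning the posterior mean of $q_\\phi(\\mathbf{z} | \\mathbf{x})$): reuse the learned latent representation as features for a simple classifier.*\n",
 | + "\n",
 | + "```python\n",
 | + "import torch\n",
 | + "from sklearn.linear_model import LogisticRegression\n",
 | + "\n",
 | + "def latent_features(encode_mu, loader):\n",
 | + "    # Encode every batch to its posterior mean and stack features/labels.\n",
 | + "    zs, ys = [], []\n",
 | + "    with torch.no_grad():\n",
 | + "        for x, y in loader:\n",
 | + "            zs.append(encode_mu(x))\n",
 | + "            ys.append(y)\n",
 | + "    return torch.cat(zs).numpy(), torch.cat(ys).numpy()\n",
 | + "\n",
 | + "# z_train, y_train = latent_features(encode_mu, train_loader)\n",
 | + "# clf = LogisticRegression(max_iter=1000).fit(z_train, y_train)\n",
 | + "```\n",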
|
840 | 845 | "**Answers**:\n",
|
841 | 846 | "\n",
|
|
845 | 850 | "\n",
|
846 | 851 | "**Experiment with the VAE model.**\n",
|
847 | 852 | "\n",
|
848 |
| - "- Experiment with the number of layers and activation functions in order to improve the reconstructions and latent representation. What solution did you find the best and why?\n", |
849 |
| - "- Try to increase the number of digit classes in the training set and analyze the learning curves, latent space and reconstructions. For which classes and why does the VAE fail in reconstructing? *HINT: Try the combination: `classes=[0, 1, 4, 9]`, to see how well VAE can separate these digits in the latent representation and reconstructions.*\n", |
850 |
| - "- Increase the number of units in the latent layer. Does it increase the models representational power and how can you see and explain this? How does this affect the quality of the reconstructions?\n", |
| 853 | + "1. Experiment with the number of layers and activation functions in order to improve the reconstructions and latent representation. What solution did you find the best and why?\n", |
| 854 | + "2. Try to increase the number of digit classes in the training set and analyze the learning curves, latent space and reconstructions. For which classes and why does the VAE fail in reconstructing? *HINT: Try the combination: `classes=[0, 1, 4, 9]`, to see how well VAE can separate these digits in the latent representation and reconstructions.*\n", |
| 855 | + "3. Increase the number of units in the latent layer. Does it increase the models representational power and how can you see and explain this? How does this affect the quality of the reconstructions?\n", |
851 | 856 | "\n",
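 | + "*For experiment 2, a minimal data-filtering sketch (the root path and helper name are assumptions, not the notebook's own loaders):*\n",
 | + "\n",
 | + "```python\n",
 | + "import torch\n",
 | + "from torchvision import datasets, transforms\n",
 | + "\n",
 | + "def filtered_mnist(classes, train=True):\n",
 | + "    # Restrict MNIST to a subset of digit classes, e.g. classes=[0, 1, 4, 9].\n",
 | + "    ds = datasets.MNIST(root=\"./data\", train=train, download=True,\n",
 | + "                        transform=transforms.ToTensor())\n",
 | + "    keep = torch.isin(ds.targets, torch.tensor(classes))\n",
 | + "    ds.data, ds.targets = ds.data[keep], ds.targets[keep]\n",
 | + "    return ds\n",
 | + "```\n",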
|
852 | 857 | "**Answers**:\n",
|
853 | 858 | "\n",
|
|