extended introduction texts in tutorials 6, 7, 8

CCPBioSim · Aug 30, 2024 · 2c7b69c
1 parent 7947ef5
Showing 4 changed files with 19 additions and 8 deletions.
diff --git a/6_Analysis_DR/6_Analysis_DR_part1.ipynb b/6_Analysis_DR/6_Analysis_DR_part1.ipynb
@@ -81,9 +81,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## 1. Introduction\n",
     "<a id='intro'></a>"
@@ -106,7 +104,14 @@
     "- You want a way to visualise your high dimensional data. \n",
     "- You want to analyse your data, but it is too high-dimensional.\n",
     "\n",
-    "The algorithms designed to carry out this task are an example of machine learning. In this tutorial we will look at Principal Components Analysis (PCA), time-lagged independent component analysis (tICA), and t-distributed Stochastic Neighbour Embedding (t-SNE). In a machine learning context, each dimension in data is called a **feature**, which together form a **feature space**. "
+    "The algorithms designed to carry out this task are an example of machine learning. In this tutorial we will look at Principal Components Analysis (PCA) and time-lagged independent component analysis (tICA)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In a machine learning context, the quantities used to represent the data are called **features**. Together, the features form a **feature space**. Choosing meaningful features to characterize a specific phenomenon is as important as the data processing carried out on them."
    ]
   },
   {
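
As an illustration of the dimensionality reduction idea introduced in this notebook, here is a minimal sketch of the first step of PCA written in plain Python. It is not the notebook's code: the toy data, the function name `first_principal_component`, and the use of power iteration to find the leading eigenvector of the covariance matrix are all assumptions made for this example.

```python
import math
import random


def first_principal_component(data, n_iter=200):
    """Return (means, direction, variance) for the leading principal component."""
    n, d = len(data), len(data[0])
    # Centre each feature on its mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the features (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along the direction v (v^T C v).
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return means, v, variance


# Hypothetical toy data: two correlated features lying close to the line y = 2x.
rng = random.Random(0)
data = []
for _ in range(200):
    t = rng.gauss(0.0, 1.0)
    data.append([t + rng.gauss(0.0, 0.05), 2.0 * t + rng.gauss(0.0, 0.05)])

means, direction, variance = first_principal_component(data)

# Project every 2-D point onto the first component: one coordinate per point.
projected = [sum((row[j] - means[j]) * direction[j] for j in range(2))
             for row in data]
print("direction:", direction)
print("variance along it:", variance)
```

The two original features are replaced by a single coordinate along the direction of largest variance, which is the sense in which PCA reduces dimensionality.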

diff --git a/6_Analysis_DR/6_Analysis_DR_part2.ipynb b/6_Analysis_DR/6_Analysis_DR_part2.ipynb
@@ -23,7 +23,7 @@
    "source": [
     "**Learning Objectives**:\n",
     "* Calculating and interpreting the PCA of one or multiple molecular dynamics trajectories\n",
-    "* Calculating TICA of a molecular dynamics trajectory"
+    "* Calculating tICA of a molecular dynamics trajectory"
    ]
   },
   {
@@ -111,7 +111,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To get started with this tutorial, let's import some packages."
+    "In this tutorial we will learn how to featurize molecular dynamics simulations, and carry out dimensionality reduction on these features.\n",
+    "To get started, let's import some packages."
    ]
   },
   {

diff --git a/7_Analysis_clustering/7_Analysis_clustering.ipynb b/7_Analysis_clustering/7_Analysis_clustering.ipynb
@@ -119,11 +119,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "\n",
     "Clustering is the grouping of data points which are similar to each other. It can be a powerful technique for identifying patterns in data. Clustering is known as an *unsupervised learning* technique, whereby no labelled example data needs to be provided for this task to be carried out. Applications of clustering include:\n",
     "- Looking for trends in data\n",
     "- Data compression: all data clustering around a point can be reduced to just that point (e.g., reducing colour depth of an image)\n",
-    "- Pattern recognition."
+    "- Pattern recognition.\n",
+    "\n",
+    "In this tutorial, we will look at three common clustering techniques (k-Means, DBSCAN, and Spectral Clustering) and apply them first to toy data, and then to a molecular dynamics simulation of alanine dipeptide."
    ]
   },
   {
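
To make the clustering idea concrete, the following is a minimal plain-Python sketch of k-Means (Lloyd's algorithm) on toy data. It is an assumption-laden illustration, not the notebook's implementation: the two Gaussian blobs, the helper names `squared_distance`, `assign` and `kmeans`, and the random initialisation from the data points are all choices made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points]


def kmeans(points, k, n_iter=100, seed=0):
    """Alternate assignment and centroid update until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


# Hypothetical toy data: two well-separated 2-D blobs of 50 points each.
rng = random.Random(1)
blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(50)]
points = blob_a + blob_b

labels, centroids = kmeans(points, k=2)
print("centroids:", centroids)
```

No labels are supplied to `kmeans`; the grouping emerges only from the distances between points, which is what makes the method unsupervised.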

diff --git a/8_Analysis_classification/8_Analysis_classification.ipynb b/8_Analysis_classification/8_Analysis_classification.ipynb
@@ -122,6 +122,10 @@
    "id": "196868f4",
    "metadata": {},
    "source": [
+    "Classification is the assignment of datapoints, described by their features, to categories. In its most typical form, classification is an example of *supervised learning*: a machine learning algorithm is shown a set of datapoints classified by an expert, and is then tasked with categorising a larger set of unclassified data.\n",
+    "\n",
+    "The typical classification protocol involves splitting pre-classified data into two parts: a *training set*, which is used to train the machine learning algorithm, and a *test set* (at times called a *validation set*), which is used to verify whether the model can correctly classify datapoints it was not exposed to during training. The test set is important for identifying cases of *overfitting*, a phenomenon whereby the model over-specialises in classifying the training data, sacrificing its ability to generalise.\n",
+    "\n",
     "To get started with this tutorial, let's import some packages."
    ]
   },
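
The training/test protocol described in the added text can be sketched as follows. This is a hedged illustration in plain Python rather than the notebook's code: the synthetic two-class data, the 70/30 split, and the choice of a 1-nearest-neighbour classifier (picked because it memorises its training set and so shows the gap between training and test accuracy) are all assumptions.

```python
import random


def make_data(n_per_class, rng):
    """Two overlapping 2-D Gaussian classes, returned as (point, label) pairs."""
    data = []
    for label, centre in enumerate([(0.0, 0.0), (1.5, 1.5)]):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((point, label))
    return data


def predict_1nn(train, point):
    """Return the label of the training point closest to `point`."""
    nearest = min(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, dataset):
    """Fraction of `dataset` classified correctly using `train` as reference."""
    correct = sum(predict_1nn(train, p) == y for p, y in dataset)
    return correct / len(dataset)


rng = random.Random(42)
data = make_data(100, rng)

# Shuffle, then hold out 30% of the pre-classified data as the test set.
rng.shuffle(data)
n_train = len(data) * 7 // 10
train_set, test_set = data[:n_train], data[n_train:]

train_acc = accuracy(train_set, train_set)  # each point is its own nearest neighbour
test_acc = accuracy(train_set, test_set)    # points never seen during training
print("training accuracy:", train_acc)
print("test accuracy:", test_acc)
```

A training accuracy much higher than the test accuracy is the signature of overfitting that the test set is meant to expose.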