From 2c7b69c5899fb1532673abe0e2b587dfe7238bab Mon Sep 17 00:00:00 2001
From: DEGIACOMI
Date: Fri, 30 Aug 2024 08:38:53 +0100
Subject: [PATCH] extended introduction texts in tutorials 6, 7, 8

---
 6_Analysis_DR/6_Analysis_DR_part1.ipynb            | 13 +++++++++----
 6_Analysis_DR/6_Analysis_DR_part2.ipynb            |  5 +++--
 7_Analysis_clustering/7_Analysis_clustering.ipynb  |  5 +++--
 .../8_Analysis_classification.ipynb                |  4 ++++
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/6_Analysis_DR/6_Analysis_DR_part1.ipynb b/6_Analysis_DR/6_Analysis_DR_part1.ipynb
index 88c61ad..5edc660 100644
--- a/6_Analysis_DR/6_Analysis_DR_part1.ipynb
+++ b/6_Analysis_DR/6_Analysis_DR_part1.ipynb
@@ -81,9 +81,7 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {
-    "jp-MarkdownHeadingCollapsed": true
-   },
+   "metadata": {},
    "source": [
     "## 1. Introduction\n",
     ""
@@ -106,7 +104,14 @@
     "- You want a way to visualise your high dimensional data. \n",
     "- You want to analyse your data, but it it too high-dimensional.\n",
     "\n",
-    "The algorithms designed to carry out this task are an example of machine learning. In this tutorial we will look at Principal Components Analysis (PCA), time-lagged independent component analysis (tICA), and t-tested Stocastic Neighbour Embedding (t-SNE). In a machine learning context, each dimension in data is called a **feature**, which together form a **feature space**. "
+    "The algorithms designed to carry out this task are an example of machine learning. In this tutorial we will look at Principal Components Analysis (PCA) and time-lagged independent component analysis (tICA)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In a machine learning context, we call **features** the quantities that represent the data. Together, the features form a **feature space**. The choice of meaningful features to characterise a specific phenomenon is as important as the data processing we carry out on them."
    ]
   },
   {
diff --git a/6_Analysis_DR/6_Analysis_DR_part2.ipynb b/6_Analysis_DR/6_Analysis_DR_part2.ipynb
index f86e603..c0cfce1 100644
--- a/6_Analysis_DR/6_Analysis_DR_part2.ipynb
+++ b/6_Analysis_DR/6_Analysis_DR_part2.ipynb
@@ -23,7 +23,7 @@
    "source": [
     "**Learning Objectives**:\n",
     "* Calculating and interpreting the PCA of one or multiple molecular dynamics trajectories\n",
-    "* Calculating TICA of a molecular dynamics trajectory"
+    "* Calculating tICA of a molecular dynamics trajectory"
    ]
   },
   {
@@ -111,7 +111,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To get started with this tutorial, let's importing some packages."
+    "In this tutorial we will learn how to featurize molecular dynamics simulations, and carry out dimensionality reduction on these features.\n",
+    "To get started, let's import some packages."
    ]
   },
   {
diff --git a/7_Analysis_clustering/7_Analysis_clustering.ipynb b/7_Analysis_clustering/7_Analysis_clustering.ipynb
index 15b62e6..0e513ef 100644
--- a/7_Analysis_clustering/7_Analysis_clustering.ipynb
+++ b/7_Analysis_clustering/7_Analysis_clustering.ipynb
@@ -119,11 +119,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "\n",
     "Clustering is the grouping of data points which are similar to each other. It can be a powerful technique for identifying patterns in data. Clustering is known as an *unsupervised learning* technique, whereby no example data needs to be provided for this task to be carried out. Application of clustering are:\n",
     "- Looking for trends in data\n",
     "- Data compression, all data clustering around a point can be reduced to just that point (e.g., reducing colour depth of an image)\n",
-    "- Pattern recognition."
+    "- Pattern recognition.\n",
+    "\n",
+    "In this tutorial, we will look at three common clustering techniques: k-Means, DBSCAN, and Spectral Clustering. We will apply them first to toy data, and then to a molecular dynamics simulation of alanine dipeptide."
    ]
   },
   {
diff --git a/8_Analysis_classification/8_Analysis_classification.ipynb b/8_Analysis_classification/8_Analysis_classification.ipynb
index 88445db..f3ba193 100644
--- a/8_Analysis_classification/8_Analysis_classification.ipynb
+++ b/8_Analysis_classification/8_Analysis_classification.ipynb
@@ -122,6 +122,10 @@
    "id": "196868f4",
    "metadata": {},
    "source": [
+    "Classification is the subdivision of datapoints into categories based on their features. In its most typical embodiment, classification is an example of *supervised learning*: a machine learning algorithm is shown a set of datapoints classified by an expert, and is then tasked with categorising a larger dataset of unclassified data.\n",
+    "\n",
+    "The typical classification protocol involves subdividing pre-classified data into two parts: a *training set*, which is used to train the machine learning algorithm, and a *test set* (at times called a *validation set*), which is used to verify whether the model can correctly classify datapoints it was not exposed to during training. The test set is important for identifying cases of *overfitting*, a phenomenon whereby the model over-specialises in classifying the training data, sacrificing its ability to generalise.\n",
+    "\n",
     "To get started with this tutorial, let's importing some packages."
    ]
   },
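
As a rough illustration of the training/test protocol described in the new introduction to tutorial 8 above, the sketch below splits pre-classified data into a training set and a test set, trains a classifier on the former, and compares the two accuracies to spot overfitting. It assumes scikit-learn is available; the synthetic dataset and the choice of classifier are placeholders for illustration, not the ones used in the tutorial notebooks.

```python
# Illustrative sketch only (not part of the patch above): the train/test
# protocol described in the classification introduction. scikit-learn is
# assumed; the synthetic dataset and classifier are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Pre-classified data: feature vectors X with expert-assigned labels y.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Subdivide the pre-classified data into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train the model on the training set only.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate on data the model was not exposed to during training.
# A training accuracy much higher than the test accuracy suggests overfitting.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```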