Commit

extended introduction texts in tutorials 6, 7, 8
DEGIACOMI committed Aug 30, 2024
1 parent 7947ef5 commit 2c7b69c
Showing 4 changed files with 19 additions and 8 deletions.
13 changes: 9 additions & 4 deletions 6_Analysis_DR/6_Analysis_DR_part1.ipynb
@@ -81,9 +81,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"metadata": {},
"source": [
"## 1. Introduction\n",
"<a id='intro'></a>"
@@ -106,7 +104,14 @@
"- You want a way to visualise your high-dimensional data. \n",
"- You want to analyse your data, but it is too high-dimensional.\n",
"\n",
"The algorithms designed to carry out this task are examples of machine learning. In this tutorial we will look at Principal Component Analysis (PCA), time-lagged independent component analysis (tICA), and t-distributed Stochastic Neighbour Embedding (t-SNE). In a machine learning context, each dimension in data is called a **feature**, which together form a **feature space**. "
"The algorithms designed to carry out this task are examples of machine learning. In this tutorial we will look at Principal Component Analysis (PCA) and time-lagged independent component analysis (tICA)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a machine learning context, the quantities used to represent the data are called **features**. Together, the features form a **feature space**. Choosing meaningful features to characterize a specific phenomenon is as important as the data processing carried out on them."
]
},
{
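The idea of reducing a feature space, as introduced in the text above, can be illustrated with a minimal PCA sketch. This is not code from the tutorial notebooks: the function name `first_principal_component`, the toy data, and the use of power iteration (instead of a full eigendecomposition from a numerical library) are all assumptions made for illustration.

```python
import math
import random


def first_principal_component(data, n_iter=200):
    """Return (feature means, unit vector of largest variance).

    Hypothetical helper: uses power iteration on the covariance matrix.
    """
    n, d = len(data), len(data[0])
    # Centre each feature on its mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the features (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


random.seed(0)
# Toy data with three features; the first two are strongly correlated.
toy = []
for _ in range(500):
    t = random.gauss(0.0, 3.0)
    toy.append([t + random.gauss(0.0, 0.1),
                t + random.gauss(0.0, 0.1),
                random.gauss(0.0, 0.1)])

means, pc1 = first_principal_component(toy)
# Project every point onto the first component: 3 features -> 1 coordinate.
projected = [sum((row[j] - means[j]) * pc1[j] for j in range(3)) for row in toy]
print("first principal component:", [round(x, 3) for x in pc1])
```

Because the first two toy features carry almost all of the variance, the first component should point roughly along their diagonal, and the single projected coordinate retains most of the information in the three original features.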
5 changes: 3 additions & 2 deletions 6_Analysis_DR/6_Analysis_DR_part2.ipynb
@@ -23,7 +23,7 @@
"source": [
"**Learning Objectives**:\n",
"* Calculating and interpreting the PCA of one or multiple molecular dynamics trajectories\n",
"* Calculating TICA of a molecular dynamics trajectory"
"* Calculating tICA of a molecular dynamics trajectory"
]
},
{
@@ -111,7 +111,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To get started with this tutorial, let's import some packages."
"In this tutorial we will learn how to featurize molecular dynamics simulations and carry out dimensionality reduction on these features.\n",
"To get started, let's import some packages."
]
},
{
5 changes: 3 additions & 2 deletions 7_Analysis_clustering/7_Analysis_clustering.ipynb
@@ -119,11 +119,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Clustering is the grouping of data points that are similar to each other. It can be a powerful technique for identifying patterns in data. Clustering is an *unsupervised learning* technique: no labelled example data need to be provided for the task to be carried out. Applications of clustering include:\n",
"- Looking for trends in data\n",
"- Data compression: all data clustering around a point can be reduced to just that point (e.g., reducing the colour depth of an image)\n",
"- Pattern recognition."
"- Pattern recognition.\n",
"\n",
"In this tutorial, we will look at three common clustering techniques (k-Means, DBSCAN, and Spectral Clustering), applying them first to toy data and then to a molecular dynamics simulation of alanine dipeptide."
]
},
{
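As a rough illustration of the clustering idea described above, here is a minimal k-Means sketch on toy data. It is not taken from the tutorial notebook: the helper names `sq_dist` and `kmeans`, the two-blob dataset, and the fixed iteration count are assumptions, and DBSCAN and Spectral Clustering are not covered.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50, seed=0):
    """Hypothetical minimal k-Means: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise the centroids on k randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels, centroids


rng = random.Random(42)
# Toy data: two well-separated 2D blobs of 100 points each.
blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
blob_b = [(rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)) for _ in range(100)]
points = blob_a + blob_b

labels, centroids = kmeans(points, k=2)
print("cluster sizes:", labels.count(0), labels.count(1))
```

No labels are supplied to `kmeans`, which is what makes the procedure unsupervised: the grouping emerges only from the distances between points.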
4 changes: 4 additions & 0 deletions 8_Analysis_classification/8_Analysis_classification.ipynb
@@ -122,6 +122,10 @@
"id": "196868f4",
"metadata": {},
"source": [
"Classification is the subdivision of features into categories. In its most typical form, classification is an example of *supervised learning*: a machine learning algorithm is shown a set of datapoints classified by an expert, and is then tasked with categorising a larger dataset of unclassified data.\n",
"\n",
"The typical classification protocol involves subdividing pre-classified data into two parts: a *training set*, which is used to train the machine learning algorithm, and a *test set* (at times called *validation set*), which is used to verify whether the model can correctly classify datapoints it was not exposed to during training. The test set is important for identifying cases of *overfitting*, a phenomenon whereby the model over-specialises in classifying the training data, sacrificing its ability to generalise.\n",
"\n",
"To get started with this tutorial, let's import some packages."
]
},
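The training/test protocol described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`train_test_split`, `fit_centroids`, `predict`, `accuracy`), the 25% test fraction, the toy two-class data, and the choice of a nearest-centroid classifier are all hypothetical and are not taken from the tutorial notebook.

```python
import random


def train_test_split(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and split pre-classified data into two parts."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([samples[i] for i in train_idx], [samples[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def fit_centroids(samples, labels):
    """'Train' a nearest-centroid classifier: one mean point per class."""
    centroids = {}
    for lab in set(labels):
        members = [s for s, l in zip(samples, labels) if l == lab]
        centroids[lab] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids


def predict(centroids, sample):
    """Assign the class whose centroid is closest to the sample."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(sample, centroids[lab])))


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted class matches the true one."""
    hits = sum(predict(centroids, s) == l for s, l in zip(samples, labels))
    return hits / len(samples)


rng = random.Random(1)
# Toy pre-classified data: two 2D Gaussian classes, 100 points each.
samples = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
           + [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(100)])
labels = ["A"] * 100 + ["B"] * 100

X_train, X_test, y_train, y_test = train_test_split(samples, labels)
model = fit_centroids(X_train, y_train)
train_acc = accuracy(model, X_train, y_train)
test_acc = accuracy(model, X_test, y_test)
print("train accuracy:", train_acc, "test accuracy:", test_acc)
```

Comparing the two accuracies is the point of the split: a training accuracy much higher than the test accuracy would suggest overfitting. A model this simple is unlikely to show it on such data, so the two values should be similar here.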
