| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Multi-output linear classification\n", |
| 8 | + "\n", |
| 9 | +    "In the previous notebook we saw a simple neural network regressor (linear regression) and classifier (logistic regression), each implemented as a single artificial neuron.\n", |
| 10 | + "\n", |
| 11 | +    "In this notebook we will implement a __multi-output single-layer perceptron__ to obtain a multi-label linear classification model." |
| 12 | + ] |
| 13 | + }, |
| 14 | + { |
| 15 | + "cell_type": "markdown", |
| 16 | + "metadata": {}, |
| 17 | + "source": [ |
| 18 | + "## Multi-label linear classification\n", |
| 19 | + "<img src=\"images/softmax_cropped.png\" width = \"300\" style=\"float: right;\">\n", |
| 20 | + "\n", |
| 21 | +    "A single-layer perceptron consists of multiple neurons organised in one layer. These neurons share the same input features, but each of them produces a different output. The outputs of the linear layer for each neuron $z_k=\\sum_j w_{jk}x_j$ are either passed through the same activation function $f(z)$, or, in the case of multi-label classification, through a shared **softmax** activation function:\n", |
| 22 | +    "$$ \\hat{p}_k=\\frac{e^{z_k}}{\\sum_{j=1}^{K}e^{z_j}}$$\n", |
| 23 | + "\n", |
| 24 | + "The linear multi-label classification is implemented with a **single linear layer**, with\n", |
| 25 | + "* the number of **inputs** equal to the number of **features**\n", |
| 26 | + "* the number of **outputs** equal to the number of **classes**\n", |
| 27 | + "\n", |
| 28 | +    "For example, if we want to predict no, moderate and severe heart failure from EF and GLS, we need two inputs and three outputs, as implemented in the cell below." |
| 29 | + ] |
| 30 | + }, |
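As a quick illustration of the softmax formula above, we can compute $\hat{p}_k$ directly from a vector of linear outputs (the logit values below are made up for illustration):

```python
import torch

# made-up linear outputs z_k for K = 3 classes
z = torch.tensor([2.0, 0.5, -1.0])

# softmax: p_k = exp(z_k) / sum_j exp(z_j)
p = torch.exp(z) / torch.exp(z).sum()

print(p)        # the largest z_k gets the largest probability
print(p.sum())  # the probabilities sum to 1
```

The result matches PyTorch's built-in `torch.softmax(z, dim=0)`.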
| 31 | + { |
| 32 | + "cell_type": "code", |
| 33 | + "execution_count": null, |
| 34 | + "metadata": {}, |
| 35 | + "outputs": [], |
| 36 | + "source": [ |
| 37 | + "import torch\n", |
| 38 | + "from torch import nn\n", |
| 39 | + "l = nn.Linear(2,3)\n", |
| 40 | + "print(l)" |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "markdown", |
| 45 | + "metadata": {}, |
| 46 | + "source": [ |
| 47 | +    "The loss will be set to **cross-entropy** using the built-in function `CrossEntropyLoss`. This function combines softmax with cross-entropy loss, so we do not need to implement the activation function in our network. For numerical stability, PyTorch implements this function as log-softmax followed by negative log-likelihood loss." |
| 48 | + ] |
| 49 | + }, |
| 50 | + { |
| 51 | + "cell_type": "code", |
| 52 | + "execution_count": null, |
| 53 | + "metadata": {}, |
| 54 | + "outputs": [], |
| 55 | + "source": [ |
| 56 | + "loss_function = nn.CrossEntropyLoss()" |
| 57 | + ] |
| 58 | + }, |
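A small sanity check, on made-up logits and labels, that `CrossEntropyLoss` is indeed log-softmax followed by negative log-likelihood:

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # made-up logits: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # made-up class labels

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(ce.item(), nll.item())  # the two losses agree
```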
| 59 | + { |
| 60 | + "cell_type": "markdown", |
| 61 | + "metadata": {}, |
| 62 | + "source": [ |
| 63 | + "## Exercise 2: Multi-label linear classifier\n", |
| 64 | + "\n", |
| 65 | +    "In this exercise we will implement a multi-label classifier in PyTorch to predict no, moderate and severe heart failure from EF and GLS. The code below loads and plots the data, and converts it into PyTorch tensors.\n", |
| 66 | + "\n", |
| 67 | +    "Note that the input features must be of type `float`, while the output labels must be of type `long`." |
| 68 | + ] |
| 69 | + }, |
| 70 | + { |
| 71 | + "cell_type": "code", |
| 72 | + "execution_count": null, |
| 73 | + "metadata": {}, |
| 74 | + "outputs": [], |
| 75 | + "source": [ |
| 76 | + "# only do this if you work on Google Colab\n", |
| 77 | + "# run the cell\n", |
| 78 | + "# then upload file 'heart_failure_data_complete.csv'\n", |
| 79 | + "\n", |
| 80 | + "from google.colab import files\n", |
| 81 | + "files.upload()" |
| 82 | + ] |
| 83 | + }, |
| 84 | + { |
| 85 | + "cell_type": "code", |
| 86 | + "execution_count": null, |
| 87 | + "metadata": {}, |
| 88 | + "outputs": [], |
| 89 | + "source": [ |
| 90 | + "import numpy as np\n", |
| 91 | + "import matplotlib.pyplot as plt\n", |
| 92 | + "import pandas as pd\n", |
| 93 | + "from sklearn.preprocessing import StandardScaler\n", |
| 94 | + "\n", |
| 95 | + "df = pd.read_csv('heart_failure_data_complete.csv')\n", |
| 96 | + "data = df.to_numpy()\n", |
| 97 | + "X = data[:,[1,2]]\n", |
| 98 | + "X = StandardScaler().fit_transform(X)\n", |
| 99 | + "y = data[:,0]\n", |
| 100 | + "\n", |
| 101 | + "def PlotData(X,y,fontsize=12):\n", |
| 102 | + " plt.plot(X[y==0,0],X[y==0,1],'bo',alpha=0.75,markeredgecolor='k',label = 'Healthy')\n", |
| 103 | + " plt.plot(X[y==1,0],X[y==1,1],'rd',alpha=0.75,markeredgecolor='k',label = 'moderate HF')\n", |
| 104 | + " plt.plot(X[y==2,0],X[y==2,1],'g^',alpha=0.75,markeredgecolor='k',label = 'severe HF')\n", |
| 105 | + " plt.title('Diagnosis of Heart Failure', fontsize = fontsize+2)\n", |
| 106 | + " plt.xlabel('EF', fontsize = fontsize)\n", |
| 107 | + " plt.ylabel('GLS', fontsize = fontsize)\n", |
| 108 | + " plt.legend(fontsize = fontsize-2)\n", |
| 109 | + "\n", |
| 110 | + "PlotData(X,y)\n", |
| 111 | + "\n", |
| 112 | + "# convert numpy array to tensor in shape of input size\n", |
| 113 | + "X = torch.from_numpy(X).float()\n", |
| 114 | + "y = torch.from_numpy(y).long()\n", |
| 115 | + "print('X: ', X.shape)\n", |
| 116 | + "print('y: ', y.shape)" |
| 117 | + ] |
| 118 | + }, |
| 119 | + { |
| 120 | + "cell_type": "markdown", |
| 121 | + "metadata": {}, |
| 122 | + "source": [ |
| 123 | + "Below is the function to plot the classification result. Run the code." |
| 124 | + ] |
| 125 | + }, |
| 126 | + { |
| 127 | + "cell_type": "code", |
| 128 | + "execution_count": null, |
| 129 | + "metadata": {}, |
| 130 | + "outputs": [], |
| 131 | + "source": [ |
| 132 | + "def PlotClassification(net,X,y,fontsize=12):\n", |
| 133 | + "\n", |
| 134 | +    "    # Create a 1D array of samples for each feature\n", |
| 135 | +    "    x1 = np.linspace(-2.5, 2, 1000)\n", |
| 136 | +    "    x2 = np.linspace(-3, 3.5, 1000)\n", |
| 137 | + " # Creates 2D arrays that hold the coordinates in 2D feature space\n", |
| 138 | + " x1, x2 = np.meshgrid(x1, x2) \n", |
| 139 | + " # Flatten x1 and x2 to 1D vector and concatenate into a feature matrix\n", |
| 140 | + " Feature_space = np.c_[x1.ravel(), x2.ravel()] \n", |
| 141 | + " \n", |
| 142 | + " # NEW: convert numpy to torch\n", |
| 143 | + " Feature_space = torch.from_numpy(Feature_space).float()\n", |
| 144 | + " # NEW: Predict output scores for the whole feature space \n", |
| 145 | + " output_scores = net(Feature_space)\n", |
| 146 | + " # NEW: Take maximum to get the labels\n", |
| 147 | + " _,y_pred=torch.max(output_scores, 1)\n", |
| 148 | + " # NEW: Convert to numpy\n", |
| 149 | + " y_pred = y_pred.numpy()\n", |
| 150 | + " \n", |
| 151 | +    "    # Reshape to 2D\n", |
| 152 | + " y_pred = y_pred.reshape(x1.shape)\n", |
| 153 | + " # Plot using contourf\n", |
| 154 | + " plt.contourf(x1, x2, y_pred, cmap = 'summer')\n", |
| 155 | + " \n", |
| 156 | + " # Plot data\n", |
| 157 | + " PlotData(X,y,fontsize)" |
| 158 | + ] |
| 159 | + }, |
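The meshgrid/ravel pattern used in `PlotClassification` can be seen on a tiny grid (the sizes here are made up to keep the printout small):

```python
import numpy as np

x1 = np.linspace(0, 1, 3)
x2 = np.linspace(0, 1, 2)
g1, g2 = np.meshgrid(x1, x2)         # 2D coordinate grids of shape (2, 3)
pts = np.c_[g1.ravel(), g2.ravel()]  # one (x1, x2) row per grid point
print(pts.shape)                     # (6, 2): 6 grid points, 2 features
```

After predicting a label for each row of `pts`, reshaping the predictions back to `g1.shape` gives the 2D label map that `contourf` expects.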
| 160 | + { |
| 161 | + "cell_type": "markdown", |
| 162 | + "metadata": {}, |
| 163 | + "source": [ |
| 164 | + "### Train test split\n", |
| 165 | +    "**Task 2.1:** First, split the data into a training set and a test set. For this we will use scikit-learn's `train_test_split`. Note that this function works on PyTorch tensors the same way as on NumPy arrays. Use 33% of the data for testing. Note the types of the split datasets." |
| 166 | + ] |
| 167 | + }, |
| 168 | + { |
| 169 | + "cell_type": "code", |
| 170 | + "execution_count": null, |
| 171 | + "metadata": {}, |
| 172 | + "outputs": [], |
| 173 | + "source": [ |
| 174 | + "from sklearn.model_selection import train_test_split\n", |
| 175 | + "X_train, X_test, y_train, y_test = None\n", |
| 176 | + "\n", |
| 177 | + "print('Test features type:', X_test.type())\n", |
| 178 | + "print('Test labels type:', y_test.type())\n", |
| 179 | + "print('Test labels:', y_test)" |
| 180 | + ] |
| 181 | + }, |
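To see that `train_test_split` preserves tensor types, here is a sketch on a few made-up tensors (the sizes and seed are arbitrary, not the solution for the real data):

```python
import torch
from sklearn.model_selection import train_test_split

X_demo = torch.randn(12, 2)          # made-up features: 12 samples, 2 features
y_demo = torch.randint(0, 3, (12,))  # made-up labels for 3 classes

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.33, random_state=0)
print(type(X_te))              # still a torch.Tensor, not a numpy array
print(X_tr.shape, X_te.shape)  # test size is rounded up: 8 train, 4 test
```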
| 182 | + { |
| 183 | + "cell_type": "markdown", |
| 184 | + "metadata": {}, |
| 185 | + "source": [ |
| 186 | + "### Create and train the network\n", |
| 187 | +    "**Task 2.2:** Fill in the code below to create and train a multi-label classification model in PyTorch. Make sure that the network is trained using only the training data." |
| 188 | + ] |
| 189 | + }, |
| 190 | + { |
| 191 | + "cell_type": "code", |
| 192 | + "execution_count": null, |
| 193 | + "metadata": {}, |
| 194 | + "outputs": [], |
| 195 | + "source": [ |
| 196 | + "# network architecture\n", |
| 197 | + "class MultiLabelClassifier(nn.Module):\n", |
| 198 | + " def __init__(self):\n", |
| 199 | + " super(MultiLabelClassifier, self).__init__()\n", |
| 200 | + " self.layer = None\n", |
| 201 | + "\n", |
| 202 | + " def forward(self, x):\n", |
| 203 | + " x = None \n", |
| 204 | + " return x\n", |
| 205 | + "\n", |
| 206 | + "# create model \n", |
| 207 | + "net = MultiLabelClassifier()\n", |
| 208 | + "\n", |
| 209 | + "# loss\n", |
| 210 | + "loss_function = None\n", |
| 211 | + "\n", |
| 212 | + "# optimiser\n", |
| 213 | + "optimizer = torch.optim.SGD(net.parameters(), lr=0.2)\n", |
| 214 | + "\n", |
| 215 | + "# train for 500 epochs\n", |
| 216 | + "epochs = 500\n", |
| 217 | + "for i in range(epochs):\n", |
| 218 | + " optimizer.zero_grad() \n", |
| 219 | + " prediction = None \n", |
| 220 | + " loss = loss_function(prediction, None) \n", |
| 221 | + " loss.backward() \n", |
| 222 | + " optimizer.step() \n", |
| 223 | + "\n", |
| 224 | + "# Plot result\n", |
| 225 | + "PlotClassification(net,None,None)" |
| 226 | + ] |
| 227 | + }, |
| 228 | + { |
| 229 | + "cell_type": "markdown", |
| 230 | + "metadata": {}, |
| 231 | + "source": [ |
| 232 | + "### Evaluate training accuracy\n", |
| 233 | + "\n", |
| 234 | +    "We will now show how to predict labels with this network. Because softmax and cross-entropy loss are combined in the loss function, for each sample the network returns the three outputs of the linear layer $z_0,z_1,z_2$ that correspond to the three classes. These outputs are referred to as **logits**. Let's test this for an individual feature vector $x=(0,0)$:" |
| 235 | + ] |
| 236 | + }, |
| 237 | + { |
| 238 | + "cell_type": "code", |
| 239 | + "execution_count": null, |
| 240 | + "metadata": {}, |
| 241 | + "outputs": [], |
| 242 | + "source": [ |
| 243 | + "# create a feature vector of correct shape and type\n", |
| 244 | + "x = torch.tensor((0,0)).reshape(1,2).float()\n", |
| 245 | + "# predict using forward pass\n", |
| 246 | + "z = net(x)\n", |
| 247 | + "# print logits\n", |
| 248 | + "print('Logits: ', z)" |
| 249 | + ] |
| 250 | + }, |
| 251 | + { |
| 252 | + "cell_type": "markdown", |
| 253 | + "metadata": {}, |
| 254 | + "source": [ |
| 255 | + "To find the label for this datapoint, we need to find which class returned the largest logit:" |
| 256 | + ] |
| 257 | + }, |
| 258 | + { |
| 259 | + "cell_type": "code", |
| 260 | + "execution_count": null, |
| 261 | + "metadata": {}, |
| 262 | + "outputs": [], |
| 263 | + "source": [ |
| 264 | + "y = torch.argmax(z, dim=1)\n", |
| 265 | + "print('Predicted label: ', y)" |
| 266 | + ] |
| 267 | + }, |
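If class probabilities are needed rather than just the label, the logits can be passed through `torch.softmax`; since softmax is monotonic, `argmax` over the probabilities gives the same label as `argmax` over the logits. A sketch on made-up logits:

```python
import torch

z = torch.tensor([[1.5, -0.3, 0.8]])  # made-up logits for one sample
p = torch.softmax(z, dim=1)           # probabilities summing to 1
print(p)
print(torch.argmax(p, dim=1))         # same label as argmax over the logits
```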
| 268 | + { |
| 269 | + "cell_type": "markdown", |
| 270 | + "metadata": {}, |
| 271 | + "source": [ |
| 272 | + "**Task 2.3:** Fill in the code to predict the labels for the training set" |
| 273 | + ] |
| 274 | + }, |
| 275 | + { |
| 276 | + "cell_type": "code", |
| 277 | + "execution_count": null, |
| 278 | + "metadata": {}, |
| 279 | + "outputs": [], |
| 280 | + "source": [ |
| 281 | + "# forward pass\n", |
| 282 | + "pred=None\n", |
| 283 | + "# find maximum\n", |
| 284 | + "y_pred_train = None\n", |
| 285 | + "print(y_pred_train)" |
| 286 | + ] |
| 287 | + }, |
| 288 | + { |
| 289 | + "cell_type": "markdown", |
| 290 | + "metadata": {}, |
| 291 | + "source": [ |
| 292 | +    "PyTorch does not offer functions for calculating performance measures, but we can use `accuracy_score` from scikit-learn." |
| 293 | + ] |
| 294 | + }, |
| 295 | + { |
| 296 | + "cell_type": "code", |
| 297 | + "execution_count": null, |
| 298 | + "metadata": {}, |
| 299 | + "outputs": [], |
| 300 | + "source": [ |
| 301 | + "from sklearn.metrics import accuracy_score\n", |
| 302 | + "print('Training accuracy: ', accuracy_score(y_train, y_pred_train))" |
| 303 | + ] |
| 304 | + }, |
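`accuracy_score` simply computes the fraction of matching labels; a quick check on made-up label vectors:

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2, 1]
y_hat  = [0, 1, 2, 0, 1]  # made-up predictions with one mistake
print(accuracy_score(y_true, y_hat))  # 4 out of 5 correct -> 0.8
```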
| 305 | + { |
| 306 | + "cell_type": "markdown", |
| 307 | + "metadata": {}, |
| 308 | + "source": [ |
| 309 | + "### Evaluate on test set\n", |
| 310 | + "\n", |
| 311 | +    "**Task 2.4:** To evaluate accuracy on the test set, implement the following:\n", |
| 312 | + "* predict the logits for the test set by running a forward pass through the network\n", |
| 313 | + "* convert logits to label\n", |
| 314 | + "* calculate test accuracy score\n", |
| 315 | + "* plot the classification result for the test set" |
| 316 | + ] |
| 317 | + }, |
| 318 | + { |
| 319 | + "cell_type": "code", |
| 320 | + "execution_count": null, |
| 321 | + "metadata": {}, |
| 322 | + "outputs": [], |
| 323 | + "source": [ |
| 324 | + "# forward pass\n", |
| 325 | + "pred=None\n", |
| 326 | + "\n", |
| 327 | + "# find maximum\n", |
| 328 | + "y_pred_test = None\n", |
| 329 | + "\n", |
| 330 | + "# calculate accuracy\n", |
| 331 | + "print('Test accuracy: ', None)\n", |
| 332 | + "\n", |
| 333 | + "# plot\n" |
| 334 | + ] |
| 335 | + } |
| 336 | + ], |
| 337 | + "metadata": { |
| 338 | + "kernelspec": { |
| 339 | + "display_name": "Python 3", |
| 340 | + "language": "python", |
| 341 | + "name": "python3" |
| 342 | + }, |
| 343 | + "language_info": { |
| 344 | + "codemirror_mode": { |
| 345 | + "name": "ipython", |
| 346 | + "version": 3 |
| 347 | + }, |
| 348 | + "file_extension": ".py", |
| 349 | + "mimetype": "text/x-python", |
| 350 | + "name": "python", |
| 351 | + "nbconvert_exporter": "python", |
| 352 | + "pygments_lexer": "ipython3", |
| 353 | + "version": "3.8.3" |
| 354 | + } |
| 355 | + }, |
| 356 | + "nbformat": 4, |
| 357 | + "nbformat_minor": 4 |
| 358 | +} |