Chapter 8

mariadeprez · mariadeprez · commit 75672dffd0f3 · 2023-06-22T13:36:55.000+01:00
diff --git a/Chapter 8 - Features/Notebooks/.ipynb_checkpoints/8.1 Feature Extraction-checkpoint.ipynb b/Chapter 8 - Features/Notebooks/.ipynb_checkpoints/8.1 Feature Extraction-checkpoint.ipynb
@@ -0,0 +1,352 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Feature extraction\n",
+    "## Detecting cancer from histopatological images\n",
+    "In this tutorial we will apply feature extractors to detect cancer in histopatological images of breast tissue. We will use selected images from the PatchCamelyon dataset https://github.com/basveeling/pcam.\n",
+    "<img src=\"pictures/pcam.jpg\" style=\"max-width:100%; width: 100%; max-width: none\">\n",
+    "\n",
+    "### Load the dataset\n",
+    "\n",
+    "Run the code below to load the dataset from the file `histological_data.npz`.\n",
+    "\n",
+    "*Note: Download the dataset from* https://gin.g-node.org/MachineLearningBiomedApplications/data *and place it in the folder `datasets`*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Load dataset from .npz file\n",
+    "data = np.load('datasets/histological_data.npz')\n",
+    "\n",
+    "# Train images and labels\n",
+    "X_train = data['X_train']\n",
+    "y_train = data['y_train'].astype('int')\n",
+    "\n",
+    "# Test images and labels\n",
+    "X_test  = data['X_test']\n",
+    "y_test  = data['y_test'].astype('int')\n",
+    "\n",
+    "# Print shapes here\n",
+    "print('Training data - images:', X_train.shape)\n",
+    "print('Training data - labels:',y_train.shape)\n",
+    "print('Test data - images:',X_test.shape)\n",
+    "print('Test data - labels:',y_test.shape)\n",
+    "print('Labels: ', np.unique(y_test))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Activity 1:** Answer the following questions:\n",
+    "* How many training samples we have?\n",
+    "* How many test samples we have?\n",
+    "* What is the dimension of each sample image?\n",
+    "* How many labels we have?\n",
+    "\n",
+    "**Answer:** "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's now plot a few example histopathological images. Note that label 1 means presence of cancerous cells."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "id_images = [4, 5, 6, 7]\n",
+    "\n",
+    "plt.figure(figsize=(15, 8))\n",
+    "for i in np.arange(0, 4):\n",
+    "    plt.subplot(1, 4, i+1)\n",
+    "    plt.imshow(X_train[id_images[i], :, :], cmap='gray')\n",
+    "    plt.title('label: ' + str(y_train[id_images[i]]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Cancer detection using texture descriptors\n",
+    "\n",
+    "We will now calculate the texture descriptors using **Grey-level co-ocurrence matrix (GLCM)**. The matrix can be calculated using `skimage` object `greycomatrix`.\n",
+    "\n",
+    "We will select one healthy and one cancerous sample image. The GLCM for the healthy sample has been generated and plotted for you. \n",
+    "\n",
+    "**Activity 2:** Do the same for the cancerous sample. Do the matrices look different? Can you think why?\n",
+    "\n",
+    "**Answer:** "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# example images\n",
+    "healthy = X_train[7, :, :] \n",
+    "cancer = X_train[5, :, :] \n",
+    "\n",
+    "# calculate and plot GLCM\n",
+    "from skimage.feature import greycomatrix\n",
+    "\n",
+    "plt.figure(figsize=(10,4))\n",
+    "\n",
+    "plt.subplot(121)\n",
+    "glcm_healthy = greycomatrix(np.round(healthy*63).astype('uint8'), [3], [0],64)\n",
+    "plt.imshow(glcm_healthy.reshape(64,64), cmap='gray')\n",
+    "plt.title('GLCM healthy')\n",
+    "\n",
+    "plt.subplot(122)\n",
+    "glcm_cancer = None\n",
+    "\n",
+    "_=plt.title('GLCM cancer')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we can calculate some statistical properties from the GLCM matrix. We can do that using `skimage` object `greycoprops`. Print out different statistical measures for the healthy and cancerous tissue:\n",
+    "* `'contrast'`\n",
+    "* `'dissimilarity'`\n",
+    "* `'homogeneity'`\n",
+    "* `'energy'`\n",
+    "* `'correlation'`\n",
+    "\n",
+    "**Activity 3:** Complete the code below to generate all five measures for both healthy and cancerous samples."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from skimage.feature import greycoprops\n",
+    "properties = ['contrast', 'dissimilarity']\n",
+    "\n",
+    "for p in properties:\n",
+    "    print(p+': ')\n",
+    "    print('  healthy: ', np.round(greycoprops(glcm_healthy, p)[0,0],2))\n",
+    "    print('  cancer: ', None)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercise 1\n",
+    "\n",
+    "In this exercise you will train a logistic regression classifier to detect cancer using GLCM features. Complete code below as follows:\n",
+    "* Extract two GLCM features of your choice. To do that, complete the function `getGLCMfeatures`. Feature extraction code is given.\n",
+    "* Fit the logistic regression model to the training data and calculate training performance using function `PerformanceMeasures`.\n",
+    "* Evaluate performance on the test data using function `PerformanceMeasures`.\n",
+    "* Amend features extracted in function `getGLCMfeatures` to achieve good performance of the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.preprocessing import StandardScaler\n",
+    "from sklearn.metrics import recall_score\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "\n",
+    "def getGLCMfeatures(im):\n",
+    "    im = np.round(im*63).astype('uint8')\n",
+    "    glcm = greycomatrix(im, [3], [0], 64)\n",
+    "    feature1 = greycoprops(glcm, None)[0, 0]\n",
+    "    feature2 = None\n",
+    "    return feature1, feature2\n",
+    "\n",
+    "def PerformanceMeasures(model,X,y): \n",
+    "\n",
+    "    accuracy = model.score(X,y)\n",
+    "    y_pred = model.predict(X)\n",
+    "    sensitivity = recall_score(y,y_pred)\n",
+    "    specificity = recall_score(y,y_pred,pos_label=0)\n",
+    "\n",
+    "    print('Accuracy: ', round(accuracy,2))\n",
+    "    print('Sensitivity: ', round(sensitivity,2))\n",
+    "    print('Specificity: ', round(specificity,2))\n",
+    "\n",
+    "# feature extraction\n",
+    "X_train_features = []\n",
+    "for im in X_train:\n",
+    "    X_train_features.append(getGLCMfeatures(im))\n",
+    "X_train_features = np.asarray(X_train_features)\n",
+    "scaler= StandardScaler()\n",
+    "X_train_features=scaler.fit_transform(X_train_features)\n",
+    "\n",
+    "# fit model\n",
+    "model = None\n",
+    "\n",
+    "print('Training perforance:')\n",
+    "\n",
+    "\n",
+    "# test\n",
+    "X_test_features  = []\n",
+    "for im in X_test:\n",
+    "    X_test_features.append(getGLCMfeatures(im))\n",
+    "X_test_features  = np.asarray(X_test_features)\n",
+    "X_test_features=scaler.fit_transform(X_test_features)\n",
+    "\n",
+    "print('Test performance:')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Cancer detection using localised feature descriptors\n",
+    "\n",
+    "Now we will try to train a classifier using the DAISY descriptor instead. First, let's extract the DAISY features from the histological images. \n",
+    "\n",
+    "\n",
+    "In the lectures we have seen a number of feature extractors that are available at `skimage`, including `daisy`. \n",
+    "\n",
+    "**Activity 4:** Run the code below to perform feature extraction using `skimage` object `daisy` and visualise your extracted features. \n",
+    "* Change the parameters `step` and `radius` to see how the daisy extractor changes.\n",
+    "* Set `step` to 60 and `radius` to 30. Then try to change the other parameters of the DAISY descriptor."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from skimage.feature import daisy\n",
+    "\n",
+    "# example feature extraction using daisy\n",
+    "features_daisy, visualisation_daisy = daisy(healthy, step=50, radius=20, rings=2, histograms=8, orientations=8, visualize=True)\n",
+    "plt.imshow(visualisation_daisy)\n",
+    "plt.title('Daisy')\n",
+    "# Extracted features\n",
+    "print('Feature vector shape daisy: ', features_daisy.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercise 2 (optional)\n",
+    "\n",
+    "Train a classifier to detect cancer in histological images using features extracted by DAISY descriptor.\n",
+    "* Complete the function `daisy_feature_extractor`. *Hint: Flatten the features after exraction.*\n",
+    "* Run the code below to extract the daisy features for training and test sets. This may take a while to run."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Feature extractor\n",
+    "def daisy_feature_extractor(image):    \n",
+    "    return None\n",
+    "\n",
+    "# Perform feature extraction for both training and test set\n",
+    "\n",
+    "X_train_features = []\n",
+    "X_test_features  = []\n",
+    "\n",
+    "# Go through all the images, perform feature extraction and then append them to the list\n",
+    "for img in X_train:\n",
+    "    X_train_features.append(daisy_feature_extractor(img))\n",
+    "for img in X_test:\n",
+    "    X_test_features.append(daisy_feature_extractor(img))\n",
+    "    \n",
+    "# Make the lists back into numpy arrays\n",
+    "X_train_features = np.asarray(X_train_features)\n",
+    "X_test_features  = np.asarray(X_test_features)\n",
+    "\n",
+    "# Print dimensions\n",
+    "print('Feature matrix train: ', X_train_features.shape)\n",
+    "print('Feature matrix test: ', X_test_features.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Train a random forest classifier to detect cancer\n",
+    "* Evaluate training and test performance"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "model = RandomForestClassifier(min_samples_leaf = 50) \n",
+    "\n",
+    "\n",
+    "print('Training performance:')\n",
+    "\n",
+    "\n",
+    "print('Test performance:')\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Compare the performance to GLSM features"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Answer:** "
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/Chapter 8 - Features/Notebooks/8.1 Feature Extraction.ipynb b/Chapter 8 - Features/Notebooks/8.1 Feature Extraction.ipynb
@@ -11,7 +11,9 @@
     "\n",
     "### Load the dataset\n",
     "\n",
-    "Run the code below to load the dataset from the file `histological_data.npz`."
+    "Run the code below to load the dataset from the file `histological_data.npz`.\n",
+    "\n",
+    "*Note: Download the dataset from* https://gin.g-node.org/MachineLearningBiomedApplications/data *and place it in the folder `datasets`*"
    ]
   },
   {

Original file line number	Diff line number	Diff line change
`@@ -11,7 +11,9 @@`
`11`	`11`	`"\n",`
`12`	`12`	`"### Load the dataset\n",`
`13`	`13`	`"\n",`
`14`		- "Run the code below to load the dataset from the file `histological_data.npz`."
	`14`	+ "Run the code below to load the dataset from the file `histological_data.npz`.\n",
	`15`	`+ "\n",`
	`16`	+ "Note: Download the dataset from https://gin.g-node.org/MachineLearningBiomedApplications/data and place it in the folder `datasets`"
`15`	`17`	`]`
`16`	`18`	`},`
`17`	`19`	`{`