Commit 75672df

committed
Chapter 8
1 parent 120c5e9 commit 75672df

2 files changed

+355
-1
lines changed

Lines changed: 352 additions & 0 deletions
@@ -0,0 +1,352 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Feature extraction\n",
8+
"## Detecting cancer from histopathological images\n",
9+
"In this tutorial we will apply feature extractors to detect cancer in histopathological images of breast tissue. We will use selected images from the PatchCamelyon dataset https://github.com/basveeling/pcam.\n",
10+
"<img src=\"pictures/pcam.jpg\" style=\"max-width:100%; width: 100%; max-width: none\">\n",
11+
"\n",
12+
"### Load the dataset\n",
13+
"\n",
14+
"Run the code below to load the dataset from the file `histological_data.npz`.\n",
15+
"\n",
16+
"*Note: Download the dataset from* https://gin.g-node.org/MachineLearningBiomedApplications/data *and place it in the folder `datasets`*"
17+
]
18+
},
19+
{
20+
"cell_type": "code",
21+
"execution_count": null,
22+
"metadata": {},
23+
"outputs": [],
24+
"source": [
25+
"import numpy as np\n",
26+
"\n",
27+
"# Load dataset from .npz file\n",
28+
"data = np.load('datasets/histological_data.npz')\n",
29+
"\n",
30+
"# Train images and labels\n",
31+
"X_train = data['X_train']\n",
32+
"y_train = data['y_train'].astype('int')\n",
33+
"\n",
34+
"# Test images and labels\n",
35+
"X_test = data['X_test']\n",
36+
"y_test = data['y_test'].astype('int')\n",
37+
"\n",
38+
"# Print shapes here\n",
39+
"print('Training data - images:', X_train.shape)\n",
40+
"print('Training data - labels:',y_train.shape)\n",
41+
"print('Test data - images:',X_test.shape)\n",
42+
"print('Test data - labels:',y_test.shape)\n",
43+
"print('Labels: ', np.unique(y_test))"
44+
]
45+
},
46+
{
47+
"cell_type": "markdown",
48+
"metadata": {},
49+
"source": [
50+
"**Activity 1:** Answer the following questions:\n",
51+
"* How many training samples do we have?\n",
52+
"* How many test samples do we have?\n",
53+
"* What is the dimension of each sample image?\n",
54+
"* How many labels do we have?\n",
55+
"\n",
56+
"**Answer:** "
57+
]
58+
},
59+
{
60+
"cell_type": "markdown",
61+
"metadata": {},
62+
"source": [
63+
"Let's now plot a few example histopathological images. Note that label 1 means presence of cancerous cells."
64+
]
65+
},
66+
{
67+
"cell_type": "code",
68+
"execution_count": null,
69+
"metadata": {},
70+
"outputs": [],
71+
"source": [
72+
"import matplotlib.pyplot as plt\n",
73+
"\n",
74+
"id_images = [4, 5, 6, 7]\n",
75+
"\n",
76+
"plt.figure(figsize=(15, 8))\n",
77+
"for i in np.arange(0, 4):\n",
78+
" plt.subplot(1, 4, i+1)\n",
79+
" plt.imshow(X_train[id_images[i], :, :], cmap='gray')\n",
80+
" plt.title('label: ' + str(y_train[id_images[i]]))"
81+
]
82+
},
83+
{
84+
"cell_type": "markdown",
85+
"metadata": {},
86+
"source": [
87+
"# Cancer detection using texture descriptors\n",
88+
"\n",
89+
"We will now calculate the texture descriptors using the **Grey-level co-occurrence matrix (GLCM)**. The matrix can be calculated using the `skimage` function `greycomatrix`.\n",
90+
"\n",
91+
"We will select one healthy and one cancerous sample image. The GLCM for the healthy sample has been generated and plotted for you. \n",
92+
"\n",
93+
"**Activity 2:** Do the same for the cancerous sample. Do the matrices look different? Can you think of why?\n",
94+
"\n",
95+
"**Answer:** "
96+
]
97+
},
98+
{
99+
"cell_type": "code",
100+
"execution_count": null,
101+
"metadata": {},
102+
"outputs": [],
103+
"source": [
104+
"# example images\n",
105+
"healthy = X_train[7, :, :] \n",
106+
"cancer = X_train[5, :, :] \n",
107+
"\n",
108+
"# calculate and plot GLCM\n",
109+
"from skimage.feature import greycomatrix\n",
110+
"\n",
111+
"plt.figure(figsize=(10,4))\n",
112+
"\n",
113+
"plt.subplot(121)\n",
114+
"glcm_healthy = greycomatrix(np.round(healthy*63).astype('uint8'), [3], [0],64)\n",
115+
"plt.imshow(glcm_healthy.reshape(64,64), cmap='gray')\n",
116+
"plt.title('GLCM healthy')\n",
117+
"\n",
118+
"plt.subplot(122)\n",
119+
"glcm_cancer = None\n",
120+
"\n",
121+
"_=plt.title('GLCM cancer')"
122+
]
123+
},
124+
{
125+
"cell_type": "markdown",
126+
"metadata": {},
127+
"source": [
128+
"Now we can calculate some statistical properties from the GLCM. We can do that using the `skimage` function `greycoprops`. Print out different statistical measures for the healthy and cancerous tissue:\n",
129+
"* `'contrast'`\n",
130+
"* `'dissimilarity'`\n",
131+
"* `'homogeneity'`\n",
132+
"* `'energy'`\n",
133+
"* `'correlation'`\n",
134+
"\n",
135+
"**Activity 3:** Complete the code below to generate all five measures for both healthy and cancerous samples."
136+
]
137+
},
138+
{
139+
"cell_type": "code",
140+
"execution_count": null,
141+
"metadata": {},
142+
"outputs": [],
143+
"source": [
144+
"from skimage.feature import greycoprops\n",
145+
"properties = ['contrast', 'dissimilarity']\n",
146+
"\n",
147+
"for p in properties:\n",
148+
" print(p+': ')\n",
149+
" print(' healthy: ', np.round(greycoprops(glcm_healthy, p)[0,0],2))\n",
150+
" print(' cancer: ', None)\n"
151+
]
152+
},
153+
{
154+
"cell_type": "markdown",
155+
"metadata": {},
156+
"source": [
157+
"## Exercise 1\n",
158+
"\n",
159+
"In this exercise you will train a logistic regression classifier to detect cancer using GLCM features. Complete the code below as follows:\n",
160+
"* Extract two GLCM features of your choice. To do that, complete the function `getGLCMfeatures`. Feature extraction code is given.\n",
161+
"* Fit the logistic regression model to the training data and calculate training performance using function `PerformanceMeasures`.\n",
162+
"* Evaluate performance on the test data using function `PerformanceMeasures`.\n",
163+
"* Amend features extracted in function `getGLCMfeatures` to achieve good performance of the model."
164+
]
165+
},
166+
{
167+
"cell_type": "code",
168+
"execution_count": null,
169+
"metadata": {},
170+
"outputs": [],
171+
"source": [
172+
"from sklearn.preprocessing import StandardScaler\n",
173+
"from sklearn.metrics import recall_score\n",
174+
"from sklearn.linear_model import LogisticRegression\n",
175+
"\n",
176+
"def getGLCMfeatures(im):\n",
177+
" im = np.round(im*63).astype('uint8')\n",
178+
" glcm = greycomatrix(im, [3], [0], 64)\n",
179+
" feature1 = greycoprops(glcm, None)[0, 0]\n",
180+
" feature2 = None\n",
181+
" return feature1, feature2\n",
182+
"\n",
183+
"def PerformanceMeasures(model,X,y): \n",
184+
"\n",
185+
" accuracy = model.score(X,y)\n",
186+
" y_pred = model.predict(X)\n",
187+
" sensitivity = recall_score(y,y_pred)\n",
188+
" specificity = recall_score(y,y_pred,pos_label=0)\n",
189+
"\n",
190+
" print('Accuracy: ', round(accuracy,2))\n",
191+
" print('Sensitivity: ', round(sensitivity,2))\n",
192+
" print('Specificity: ', round(specificity,2))\n",
193+
"\n",
194+
"# feature extraction\n",
195+
"X_train_features = []\n",
196+
"for im in X_train:\n",
197+
" X_train_features.append(getGLCMfeatures(im))\n",
198+
"X_train_features = np.asarray(X_train_features)\n",
199+
"scaler= StandardScaler()\n",
200+
"X_train_features=scaler.fit_transform(X_train_features)\n",
201+
"\n",
202+
"# fit model\n",
203+
"model = None\n",
204+
"\n",
205+
"print('Training performance:')\n",
206+
"\n",
207+
"\n",
208+
"# test\n",
209+
"X_test_features = []\n",
210+
"for im in X_test:\n",
211+
" X_test_features.append(getGLCMfeatures(im))\n",
212+
"X_test_features = np.asarray(X_test_features)\n",
213+
"X_test_features=scaler.transform(X_test_features)\n",
214+
"\n",
215+
"print('Test performance:')\n"
216+
]
217+
},
218+
{
219+
"cell_type": "markdown",
220+
"metadata": {},
221+
"source": [
222+
"# Cancer detection using localised feature descriptors\n",
223+
"\n",
224+
"Now we will try to train a classifier using the DAISY descriptor instead. First, let's extract the DAISY features from the histological images. \n",
225+
"\n",
226+
"\n",
227+
"In the lectures we have seen a number of feature extractors that are available in `skimage`, including `daisy`. \n",
228+
"\n",
229+
"**Activity 4:** Run the code below to perform feature extraction using the `skimage` function `daisy` and visualise your extracted features. \n",
230+
"* Change the parameters `step` and `radius` to see how the daisy extractor changes.\n",
231+
"* Set `step` to 60 and `radius` to 30. Then try to change the other parameters of the DAISY descriptor."
232+
]
233+
},
234+
{
235+
"cell_type": "code",
236+
"execution_count": null,
237+
"metadata": {},
238+
"outputs": [],
239+
"source": [
240+
"from skimage.feature import daisy\n",
241+
"\n",
242+
"# example feature extraction using daisy\n",
243+
"features_daisy, visualisation_daisy = daisy(healthy, step=50, radius=20, rings=2, histograms=8, orientations=8, visualize=True)\n",
244+
"plt.imshow(visualisation_daisy)\n",
245+
"plt.title('Daisy')\n",
246+
"# Extracted features\n",
247+
"print('Feature vector shape daisy: ', features_daisy.shape)"
248+
]
249+
},
250+
{
251+
"cell_type": "markdown",
252+
"metadata": {},
253+
"source": [
254+
"## Exercise 2 (optional)\n",
255+
"\n",
256+
"Train a classifier to detect cancer in histological images using features extracted by DAISY descriptor.\n",
257+
"* Complete the function `daisy_feature_extractor`. *Hint: Flatten the features after extraction.*\n",
258+
"* Run the code below to extract the daisy features for training and test sets. This may take a while to run."
259+
]
260+
},
261+
{
262+
"cell_type": "code",
263+
"execution_count": null,
264+
"metadata": {},
265+
"outputs": [],
266+
"source": [
267+
"# Feature extractor\n",
268+
"def daisy_feature_extractor(image): \n",
269+
" return None\n",
270+
"\n",
271+
"# Perform feature extraction for both training and test set\n",
272+
"\n",
273+
"X_train_features = []\n",
274+
"X_test_features = []\n",
275+
"\n",
276+
"# Go through all the images, perform feature extraction and then append them to the list\n",
277+
"for img in X_train:\n",
278+
" X_train_features.append(daisy_feature_extractor(img))\n",
279+
"for img in X_test:\n",
280+
" X_test_features.append(daisy_feature_extractor(img))\n",
281+
" \n",
282+
"# Make the lists back into numpy arrays\n",
283+
"X_train_features = np.asarray(X_train_features)\n",
284+
"X_test_features = np.asarray(X_test_features)\n",
285+
"\n",
286+
"# Print dimensions\n",
287+
"print('Feature matrix train: ', X_train_features.shape)\n",
288+
"print('Feature matrix test: ', X_test_features.shape)"
289+
]
290+
},
291+
{
292+
"cell_type": "markdown",
293+
"metadata": {},
294+
"source": [
295+
"* Train a random forest classifier to detect cancer\n",
296+
"* Evaluate training and test performance"
297+
]
298+
},
299+
{
300+
"cell_type": "code",
301+
"execution_count": null,
302+
"metadata": {},
303+
"outputs": [],
304+
"source": [
305+
"from sklearn.ensemble import RandomForestClassifier\n",
306+
"model = RandomForestClassifier(min_samples_leaf = 50) \n",
307+
"\n",
308+
"\n",
309+
"print('Training performance:')\n",
310+
"\n",
311+
"\n",
312+
"print('Test performance:')\n",
313+
"\n"
314+
]
315+
},
316+
{
317+
"cell_type": "markdown",
318+
"metadata": {},
319+
"source": [
320+
"* Compare the performance to GLCM features"
321+
]
322+
},
323+
{
324+
"cell_type": "markdown",
325+
"metadata": {},
326+
"source": [
327+
"**Answer:** "
328+
]
329+
}
330+
],
331+
"metadata": {
332+
"kernelspec": {
333+
"display_name": "Python 3",
334+
"language": "python",
335+
"name": "python3"
336+
},
337+
"language_info": {
338+
"codemirror_mode": {
339+
"name": "ipython",
340+
"version": 3
341+
},
342+
"file_extension": ".py",
343+
"mimetype": "text/x-python",
344+
"name": "python",
345+
"nbconvert_exporter": "python",
346+
"pygments_lexer": "ipython3",
347+
"version": "3.8.3"
348+
}
349+
},
350+
"nbformat": 4,
351+
"nbformat_minor": 4
352+
}
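
For readers of this diff, the co-occurrence matrix that the notebook obtains from `greycomatrix(im, [3], [0], 64)` can be illustrated with a minimal NumPy sketch. This is not the `skimage` implementation; the helper names `glcm_horizontal` and `contrast` and the toy image are hypothetical, and the sketch only covers angle 0 (pairs along the same row) without symmetry or normalisation inside the counting step.

```python
import numpy as np

def glcm_horizontal(image, distance, levels):
    """Count grey-level pairs (i, j) for pixels `distance` columns apart
    on the same row (angle 0). Hypothetical helper, not the skimage API."""
    glcm = np.zeros((levels, levels), dtype=np.int64)
    left = image[:, :-distance].ravel()   # reference pixels
    right = image[:, distance:].ravel()   # neighbours to the right
    np.add.at(glcm, (left, right), 1)     # accumulate pair counts
    return glcm

def contrast(glcm):
    """GLCM contrast: sum of P[i, j] * (i - j)**2 over the normalised matrix."""
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# Toy 4x4 image with 4 grey levels (illustrative data only)
toy = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])

glcm = glcm_horizontal(toy, distance=1, levels=4)
print(glcm)
print('contrast:', contrast(glcm))
```

A smooth texture concentrates counts near the diagonal and gives low contrast, while a texture with many abrupt intensity changes spreads counts away from the diagonal, which is what Activity 2 asks the reader to look for.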

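
Two ideas from Exercise 1 can also be sketched without `sklearn`: test features are standardised with the mean and standard deviation estimated on the training set, and sensitivity and specificity are the recall of class 1 and class 0 respectively. The helpers `fit_scaler`, `apply_scaler` and `sensitivity_specificity` and all the numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def fit_scaler(X):
    """Estimate per-feature mean and (population) standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0          # avoid division by zero for constant features
    return mean, std

def apply_scaler(X, mean, std):
    """Standardise X with previously estimated statistics."""
    return (X - mean) / std

def sensitivity_specificity(y_true, y_pred):
    """Recall of the positive class (1) and of the negative class (0)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative feature matrices (two features per sample)
train = np.array([[1.0, 10.0], [3.0, 30.0]])
test = np.array([[2.0, 20.0], [4.0, 40.0]])

mean, std = fit_scaler(train)                 # statistics come from training data only
test_scaled = apply_scaler(test, mean, std)   # reused unchanged on the test data
print(test_scaled)

# Illustrative labels and predictions
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print('Sensitivity:', round(sens, 2), 'Specificity:', round(spec, 2))
```

Refitting the scaler on the test set would give the test features a different mean and scale from the ones the model was trained on, so the reported test performance would no longer reflect the fitted model.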
Chapter 8 - Features/Notebooks/8.1 Feature Extraction.ipynb

Lines changed: 3 additions & 1 deletion
@@ -11,7 +11,9 @@
11 11
"\n",
12 12
"### Load the dataset\n",
13 13
"\n",
14-
"Run the code below to load the dataset from the file `histological_data.npz`."
14+
"Run the code below to load the dataset from the file `histological_data.npz`.\n",
15+
"\n",
16+
"*Note: Download the dataset from* https://gin.g-node.org/MachineLearningBiomedApplications/data *and place it in the folder `datasets`*"
15 17
]
16 18
},
17 19
{
