Commit f4a4565

Add notebooks for hands-on sessions
1 parent 8757d31 commit f4a4565

22 files changed: 11,898 additions, 0 deletions

hands-on/010_underfitting_overfitting_complete.ipynb

Lines changed: 647 additions & 0 deletions
Large diffs are not rendered by default.

hands-on/010_underfitting_overfitting_courageous.ipynb

Lines changed: 445 additions & 0 deletions
Large diffs are not rendered by default.

hands-on/010_underfitting_overfitting_lazy.ipynb

Lines changed: 446 additions & 0 deletions
Large diffs are not rendered by default.

hands-on/020_mnist_data_exploration_complete.ipynb

Lines changed: 355 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# MNIST: learning to recognize handwritten digits"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Dataset exploration"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before starting a machine learning or data science task, it is always useful to familiarize yourself with the dataset and its context."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Required imports"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from collections import Counter\n",
+    "from keras.datasets import mnist\n",
+    "import matplotlib.pyplot as plt\n",
+    "%matplotlib inline\n",
+    "import numpy as np"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Obtaining the dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Keras' `datasets` module provides a loader for the MNIST dataset used in this notebook. Download the training and test sets."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "(x_train, y_train), (x_test, y_test) = mnist.load_data()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Dimensions and types"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Determine the shape and type of the training and the test set."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The training set has 60,000 examples, the test set 10,000. Each input is a 28 $\\times$ 28 matrix of unsigned 8-bit integers, and each output is a single unsigned 8-bit integer."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data semantics"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Each input is a scanned grayscale image of a handwritten digit, and the output is the corresponding integer. Visualize the first few training images and check their labels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rows = 5\n",
+    "cols = 7\n",
+    "figure, axes = plt.subplots(rows, cols, figsize=(5, 3))\n",
+    "plt.subplots_adjust(wspace=0.1, hspace=0.1)\n",
+    "for img_nr in range(rows * cols):\n",
+    "    row = img_nr // cols\n",
+    "    col = img_nr % cols\n",
+    "    axes[row, col].get_xaxis().set_visible(False)\n",
+    "    axes[row, col].get_yaxis().set_visible(False)\n",
+    "    axes[row, col].imshow(x_train[img_nr], cmap='gray')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y_train[:rows * cols].reshape(rows, cols)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "So this proves that I'm certainly not the only one cursed with bad handwriting."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data distribution"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "An important question is whether all digits are represented in the training and test sets, and how they are distributed. A skewed distribution may affect the accuracy of the trained model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Although some digits, such as 1, are overrepresented and others, such as 5, are underrepresented, the distribution is reasonably uniform, so it is likely that no special care needs to be taken."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
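
The "Dimensions and types" section of the notebook asks for the shape and type of each set but contains no code cell for it. A minimal sketch of one way to do the check follows. The helper `describe` and the stand-in arrays `x_demo` and `y_demo` are hypothetical names, not part of the notebook; the stand-ins only mimic the layout the notebook describes (28 × 28 `uint8` images, `uint8` labels) so the snippet runs without downloading anything. In the notebook itself, one would pass `x_train`, `y_train`, `x_test` and `y_test` instead.

```python
import numpy as np


def describe(name, array):
    """Return a one-line summary of a NumPy array's shape and dtype."""
    return f"{name}: shape={array.shape}, dtype={array.dtype}"


# Stand-in data (an assumption for illustration): six blank 28 x 28 images
# and six labels, using the dtypes the notebook reports for MNIST.
x_demo = np.zeros((6, 28, 28), dtype=np.uint8)
y_demo = np.zeros(6, dtype=np.uint8)

print(describe("x_demo", x_demo))
print(describe("y_demo", y_demo))
```

The first axis of each array is the number of examples, so for the real data it should match the 60,000 and 10,000 stated in the notebook.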

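
The "Data distribution" section of the notebook draws a conclusion about label frequencies without showing how they were obtained. The sketch below is one possible approach using `collections.Counter`, which the notebook already imports. The function names `label_distribution` and `missing_digits` and the toy list `y_demo` are hypothetical and exist only for illustration; the same functions should also accept a NumPy label array such as `y_train` or `y_test`.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: fraction of examples}, ordered by label."""
    counts = Counter(int(label) for label in labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


def missing_digits(labels):
    """Return the digits 0-9 that never occur in labels."""
    return set(range(10)) - {int(label) for label in labels}


# Toy labels (an assumption for illustration, not MNIST data).
y_demo = [0, 1, 1, 5, 1, 0]

for digit, fraction in label_distribution(y_demo).items():
    print(f"{digit}: {fraction:.3f}")
print("missing:", sorted(missing_digits(y_demo)))
```

Comparing each fraction with the uniform value of 0.1 gives a quick sense of which digits are over- or underrepresented, and an empty result from `missing_digits` confirms that every digit is present.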