| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "4ff5888c", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "Sascha Spors,\n", |
| 9 | + "Professorship Signal Theory and Digital Signal Processing,\n", |
| 10 | + "Institute of Communications Engineering (INT),\n", |
| 11 | + "Faculty of Computer Science and Electrical Engineering (IEF),\n", |
| 12 | + "University of Rostock,\n", |
| 13 | + "Germany\n", |
| 14 | + "\n", |
| 15 | + "# Data Driven Audio Signal Processing - A Tutorial with Computational Examples\n", |
| 16 | + "\n", |
| 17 | + "Winter Semester 2022/23 (Master Course #24512)\n", |
| 18 | + "\n", |
| 19 | + "- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture\n", |
| 20 | + "- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise\n", |
| 21 | + "\n", |
| 22 | + "Feel free to contact lecturer frank.schultz@uni-rostock.de" |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "markdown", |
| 27 | + "id": "a199b3df", |
| 28 | + "metadata": {}, |
| 29 | + "source": [ |
| 30 | + "# XOR with Two-Layer Non-Linear Model\n", |
| 31 | + "\n", |
| 32 | + "- we use TensorFlow and the Keras API\n", |
| 33 | + "- we follow the discussion in the highly recommended textbook by I. Goodfellow, Y. Bengio, A. Courville, \"Deep Learning\", MIT Press, 2016, ch. 6.1" |
| 34 | + ] |
| 35 | + }, |
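The two-layer architecture is motivated by a classical fact discussed in Goodfellow et al., ch. 6.1: XOR is not linearly separable, so no single linear threshold unit can represent it. As a quick, non-exhaustive sketch (not part of the original notebook; the grid range and resolution are illustrative choices), a brute-force search over a single unit `step(w1*x1 + w2*x2 + b)` finds no parameter setting that reproduces the XOR table:

```python
import itertools
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

# brute-force search over a coarse weight/bias grid: does any single
# linear threshold unit  step(w1*x1 + w2*x2 + b)  reproduce XOR?
grid = np.linspace(-2, 2, 41)
found = False
for w1, w2, b in itertools.product(grid, repeat=3):
    y_hat = (X @ np.array([w1, w2]) + b >= 0).astype(float)
    if np.array_equal(y_hat, y):
        found = True
        break
print('linear unit solving XOR found:', found)  # -> False
```

The grid search is only an illustration, but the result holds in general: summing the four required inequalities for the XOR targets yields a contradiction, which is why at least one hidden layer with a non-linear activation is needed.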
| 36 | + { |
| 37 | + "cell_type": "code", |
| 38 | + "execution_count": null, |
| 39 | + "id": "80a46046", |
| 40 | + "metadata": {}, |
| 41 | + "outputs": [], |
| 42 | + "source": [ |
| 43 | + "import matplotlib.pyplot as plt\n", |
| 44 | + "import numpy as np\n", |
| 45 | + "import tensorflow as tf\n", |
| 46 | + "import tensorflow.keras as keras\n", |
| 47 | + "import tensorflow.keras.backend as K\n", |
| 48 | + "\n", |
| 49 | + "print('TF version', tf.__version__, # we used 2.10.0\n", |
| 50 | + " '\\nKeras version', keras.__version__) # we used 2.10.0\n", |
| 51 | + "\n", |
| 52 | + "tf.keras.backend.set_floatx('float64') # we could use double precision" |
| 53 | + ] |
| 54 | + }, |
| 55 | + { |
| 56 | + "cell_type": "code", |
| 57 | + "execution_count": null, |
| 58 | + "id": "9d77047c", |
| 59 | + "metadata": {}, |
| 60 | + "outputs": [], |
| 61 | + "source": [ |
| 62 | + "verbose = 1 # print training status" |
| 63 | + ] |
| 64 | + }, |
| 65 | + { |
| 66 | + "cell_type": "code", |
| 67 | + "execution_count": null, |
| 68 | + "id": "b9c96247", |
| 69 | + "metadata": {}, |
| 70 | + "outputs": [], |
| 71 | + "source": [ |
| 72 | + "# data set consists of the 4 conditions for the XOR logical table\n", |
| 73 | + "X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])\n", |
| 74 | + "print('X\\n', X)\n", |
| 75 | + "y = np.array([[0.], [1.], [1.], [0.]])\n", |
| 76 | + "print('y\\n', y)" |
| 77 | + ] |
| 78 | + }, |
| 79 | + { |
| 80 | + "cell_type": "code", |
| 81 | + "execution_count": null, |
| 82 | + "id": "3ad27d96", |
| 83 | + "metadata": {}, |
| 84 | + "outputs": [], |
| 85 | + "source": [ |
| 86 | + "# a simple XOR non-linear model with two layers is known from the textbook\n", |
| 87 | + "# I. Goodfellow, Y. Bengio, A. Courville, Deep Learning. MIT Press, 2016, ch 6.1\n", |
| 88 | + "# the model parameters are given in the book and it is stated that these\n", |
| 89 | + "# belong to the global minimum for the mean squared error loss function\n", |
| 90 | + "\n", |
| 91 | + "# layer 1 with relu activation and the weights/bias:\n", |
| 92 | + "wl1 = np.array([[1, 1], [1, 1]])\n", |
| 93 | + "bl1 = np.array([[0], [-1]])\n", |
| 94 | + "# layer 2 with linear activation and the weights/bias:\n", |
| 95 | + "wl2 = np.array([[1], [-2]])\n", |
| 96 | + "bl2 = np.array([[0]])\n", |
| 97 | + "\n", |
| 98 | + "# we can compute model predictions for the data in X\n", |
| 99 | + "# layer 1 with two perceptrons: apply weights / bias\n", |
| 100 | + "z1 = wl1.T @ X.T + bl1 # transpose input to be TF compatible\n", |
| 101 | + "# layer 1 with two perceptrons: apply relu activation\n", |
| 102 | + "z1[z1 < 0] = 0\n", |
| 103 | + "# layer 2 with one perceptron: apply weights / bias\n", |
| 104 | + "z2 = wl2.T @ z1 + bl2\n", |
| 105 | + "# layer 2 with one perceptron: apply linear activation\n", |
| 106 | + "y_pred = z2.T # transpose output to be TF compatible\n", |
| 107 | + "print(y_pred)\n", |
| 108 | + "print(y == y_pred) # check true and predicted data" |
| 109 | + ] |
| 110 | + }, |
| 111 | + { |
| 112 | + "cell_type": "markdown", |
| 113 | + "id": "b8294af6", |
| 114 | + "metadata": {}, |
| 115 | + "source": [ |
| 116 | + "## TensorFlow Model\n", |
| 117 | + "\n", |
| 118 | + "The model is actually not easy to train to the global minimum: it is unusual to treat a binary classification problem with MSE loss and a linear output activation, a combination that is rather typical for regression tasks.\n", |
| 119 | + "\n", |
| 120 | + "So we actually expect two numbers, 0 and 1, as model output. The linear activation, however, yields real numbers, which in the optimum case are exactly 0 and 1; for sub-optimally trained models they might be very close to 0 and 1, or even completely 'wrong'. So the model needs to be trained precisely to the weights given above to exhibit the intended binary classification characteristics.\n", |
| 121 | + "\n", |
| 122 | + "This is a nice toy example of what model training can (and cannot) do on a rather simple problem. We should spend some time to really understand how the model output is calculated, i.e. how the model prediction works. Once we have got this, we are ready to work with larger models." |
| 123 | + ] |
| 124 | + }, |
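To see concretely why the raw model output must be thresholded, here is a small NumPy-only sketch (independent of TensorFlow; the perturbation `offset = 1e-3` is an illustrative choice, not from the original notebook). It evaluates the textbook forward pass with slightly perturbed weights: the linear outputs are only close to 0 and 1, the MSE is small but non-zero, and a hard threshold at 0.5 still recovers the XOR labels:

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# textbook optimum weights, perturbed by a small illustrative offset
offset = 1e-3
wl1 = np.array([[1., 1.], [1., 1.]]) + offset
bl1 = np.array([[0.], [-1.]]) + offset
wl2 = np.array([[1.], [-2.]]) + offset
bl2 = np.array([[0.]])

z1 = np.maximum(wl1.T @ X.T + bl1, 0)   # layer 1: affine + relu
y_pred = (wl2.T @ z1 + bl2).T           # layer 2: affine, linear activation

mse = np.mean((y - y_pred)**2)
labels = (y_pred >= 0.5).astype(float)  # threshold real-valued outputs at 0.5
print('MSE:', mse)                      # small but non-zero
print('labels match XOR:', np.array_equal(labels, y))
```

This mirrors the point above: near the optimum the classification is still correct after thresholding, even though the real-valued outputs (and hence the MSE loss) are not exactly at the global minimum.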
| 125 | + { |
| 126 | + "cell_type": "code", |
| 127 | + "execution_count": null, |
| 128 | + "id": "1c792c68", |
| 129 | + "metadata": {}, |
| 130 | + "outputs": [], |
| 131 | + "source": [ |
| 132 | + "epochs = 2**8\n", |
| 133 | + "batch_size = X.shape[0]" |
| 134 | + ] |
| 135 | + }, |
| 136 | + { |
| 137 | + "cell_type": "code", |
| 138 | + "execution_count": null, |
| 139 | + "id": "842503d9", |
| 140 | + "metadata": {}, |
| 141 | + "outputs": [], |
| 142 | + "source": [ |
| 143 | + "optimizer = keras.optimizers.Adam()\n", |
| 144 | + "loss = keras.losses.MeanSquaredError()\n", |
| 145 | + "metrics = [keras.metrics.MeanSquaredError()]\n", |
| 146 | + "model = keras.Sequential()\n", |
| 147 | + "model.add(keras.Input(shape=(2,)))\n", |
| 148 | + "model.add(keras.layers.Dense(2, activation='relu'))\n", |
| 149 | + "model.add(keras.layers.Dense(1, activation='linear'))\n", |
| 150 | + "model.compile(optimizer=optimizer, loss=loss, metrics=metrics)\n", |
| 151 | + "tw = np.sum([K.count_params(w) for w in model.trainable_weights])\n", |
| 152 | + "print('\\ntrainable_weights:', tw, '\\n')" |
| 153 | + ] |
| 154 | + }, |
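As a cross-check on the `trainable_weights` count printed above (a hand calculation, not part of the original notebook): each `Dense` layer contributes `in_dim * units` kernel weights plus `units` biases, so the 2-unit hidden layer has 6 parameters and the 1-unit output layer has 3:

```python
def dense_params(in_dim, units):
    # kernel weights (in_dim x units) plus one bias per unit
    return in_dim * units + units

total = dense_params(2, 2) + dense_params(2, 1)  # hidden layer + output layer
print('trainable parameters:', total)  # -> 9, matching the Keras count
```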
| 155 | + { |
| 156 | + "cell_type": "markdown", |
| 157 | + "id": "bc525703", |
| 158 | + "metadata": {}, |
| 159 | + "source": [ |
| 160 | + "We could initialize the model parameters close to (e.g. `offset=1e-2`) or even exactly at (`offset=0`) the optimum parameters given above. To use this, set `if True:` in the cell below.\n", |
| 161 | + "\n", |
| 162 | + "With a robust gradient descent method such as Adam, training should then get close to, or remain at, the optimum parameters." |
| 163 | + ] |
| 164 | + }, |
| 165 | + { |
| 166 | + "cell_type": "code", |
| 167 | + "execution_count": null, |
| 168 | + "id": "a0198cd7", |
| 169 | + "metadata": {}, |
| 170 | + "outputs": [], |
| 171 | + "source": [ |
| 172 | + "wl1 = np.array([[1, 1], [1, 1]])\n", |
| 173 | + "bl1 = np.array([0, -1])\n", |
| 174 | + "wl2 = np.array([[1], [-2]])\n", |
| 175 | + "bl2 = np.array([0])\n", |
| 176 | + "if True:\n", |
| 177 | + " offset = 1e-5\n", |
| 178 | + " model.set_weights([wl1+offset, bl1+offset, wl2+offset, bl2])\n", |
| 179 | + "model.get_weights()" |
| 180 | + ] |
| 181 | + }, |
| 182 | + { |
| 183 | + "cell_type": "markdown", |
| 184 | + "id": "4f4e4f44", |
| 185 | + "metadata": {}, |
| 186 | + "source": [ |
| 187 | + "### Train / Fit the Model" |
| 188 | + ] |
| 189 | + }, |
| 190 | + { |
| 191 | + "cell_type": "code", |
| 192 | + "execution_count": null, |
| 193 | + "id": "43d50b98", |
| 194 | + "metadata": {}, |
| 195 | + "outputs": [], |
| 196 | + "source": [ |
| 197 | + "model.fit(X, y,\n", |
| 198 | + " epochs=epochs,\n", |
| 199 | + " batch_size=batch_size,\n", |
| 200 | + " verbose=verbose)" |
| 201 | + ] |
| 202 | + }, |
| 203 | + { |
| 204 | + "cell_type": "code", |
| 205 | + "execution_count": null, |
| 206 | + "id": "dd901f4b", |
| 207 | + "metadata": {}, |
| 208 | + "outputs": [], |
| 209 | + "source": [ |
| 210 | + "print(model.summary())\n", |
| 211 | + "print('model weights\\n', model.get_weights())" |
| 212 | + ] |
| 213 | + }, |
| 214 | + { |
| 215 | + "cell_type": "code", |
| 216 | + "execution_count": null, |
| 217 | + "id": "24158b77", |
| 218 | + "metadata": {}, |
| 219 | + "outputs": [], |
| 220 | + "source": [ |
| 221 | + "results = model.evaluate(X, y,\n", |
| 222 | + " batch_size=X.shape[0],\n", |
| 223 | + " verbose=False)\n", |
| 224 | + "y_pred = model.predict(X)" |
| 225 | + ] |
| 226 | + }, |
| 227 | + { |
| 228 | + "cell_type": "code", |
| 229 | + "execution_count": null, |
| 230 | + "id": "b17e2fdb", |
| 231 | + "metadata": {}, |
| 232 | + "outputs": [], |
| 233 | + "source": [ |
| 234 | + "print(model.loss(y, y_pred))" |
| 235 | + ] |
| 236 | + }, |
| 237 | + { |
| 238 | + "cell_type": "code", |
| 239 | + "execution_count": null, |
| 240 | + "id": "b7edb9f1", |
| 241 | + "metadata": {}, |
| 242 | + "outputs": [], |
| 243 | + "source": [ |
| 244 | + "def predict_class(y):\n", |
| 245 | + "    # in-place hard threshold at 0.5: real-valued model output -> class labels 0/1\n", |
| 246 | + "    y[y < 0.5], y[y >= 0.5] = 0, 1" |
| 246 | + ] |
| 247 | + }, |
| 248 | + { |
| 249 | + "cell_type": "code", |
| 250 | + "execution_count": null, |
| 251 | + "id": "7a59cc9e", |
| 252 | + "metadata": {}, |
| 253 | + "outputs": [], |
| 254 | + "source": [ |
| 255 | + "print('real-valued model output\\n', y_pred)\n", |
| 256 | + "predict_class(y_pred) # real-valued output -> classification (0,1) output\n", |
| 257 | + "print('classification output\\n', y_pred)\n", |
| 258 | + "print('check true vs. predicted:\\n', y == y_pred)" |
| 259 | + ] |
| 260 | + }, |
| 261 | + { |
| 262 | + "cell_type": "markdown", |
| 263 | + "id": "651b1eff", |
| 264 | + "metadata": {}, |
| 265 | + "source": [ |
| 266 | + "## Copyright\n", |
| 267 | + "\n", |
| 268 | + "- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)\n", |
| 269 | + "- feel free to use the notebooks for your own purposes\n", |
| 270 | + "- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)\n", |
| 271 | + "- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)\n", |
| 272 | + "- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year." |
| 273 | + ] |
| 274 | + } |
| 275 | + ], |
| 276 | + "metadata": { |
| 277 | + "kernelspec": { |
| 278 | + "display_name": "myddasp", |
| 279 | + "language": "python", |
| 280 | + "name": "myddasp" |
| 281 | + }, |
| 282 | + "language_info": { |
| 283 | + "codemirror_mode": { |
| 284 | + "name": "ipython", |
| 285 | + "version": 3 |
| 286 | + }, |
| 287 | + "file_extension": ".py", |
| 288 | + "mimetype": "text/x-python", |
| 289 | + "name": "python", |
| 290 | + "nbconvert_exporter": "python", |
| 291 | + "pygments_lexer": "ipython3", |
| 292 | + "version": "3.10.6" |
| 293 | + } |
| 294 | + }, |
| 295 | + "nbformat": 4, |
| 296 | + "nbformat_minor": 5 |
| 297 | +} |