
Commit af6dc60

committed
softmax regression model, minor changes to other models
1 parent 34de448 commit af6dc60

4 files changed: +542 −29 lines changed

logistic_regression.ipynb

+47 −6
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Logistic Regression in plain Python"
+"## Logistic Regression in plain Python"
 ]
 },
 {
@@ -24,10 +24,11 @@
 "- it has a real-valued bias $b$\n",
 "- it uses a sigmoid function as its activation function\n",
 "\n",
-"Different to linear regression, logistic regression has no closed form solution. But the cost function is convex, so we can train the model using gradient descent. In fact, **gradient descent** (or any other optimization algorithm) is guaranteed to find the global minimum (if the learning rate is small enough and enough training iterations are used). \n",
-"* * *\n",
-"Training a logistic regression model has different steps.\n",
+"Unlike linear regression, logistic regression has no closed-form solution. But the cost function is convex, so we can train the model using gradient descent. In fact, **gradient descent** (or any other optimization algorithm) is guaranteed to find the global minimum (if the learning rate is small enough and enough training iterations are used).\n",
+"\n",
+"Training a logistic regression model involves several steps. In the beginning (step 0) the parameters are initialized. The other steps are repeated for a specified number of training iterations or until the parameters converge.\n",
 "\n",
+"* * * \n",
 "**Step 0: ** Initialize the weight vector and bias with zeros (or small random values).\n",
 "* * *\n",
 "\n",
@@ -53,7 +54,7 @@
 "\n",
 "$ \\frac{\\partial J}{\\partial w_j} = \\frac{1}{m}\\sum_{i=1}^m\\left[\\hat{y}^{(i)}-y^{(i)}\\right]\\,x_j^{(i)}$\n",
 "\n",
-"For the bias, the input $x_j^{(i)}$ will be given 1.\n",
+"For the bias, the input $x_j^{(i)}$ is set to 1.\n",
 "* * *\n",
 "\n",
 "** Step 5: ** Update the weights and bias\n",
@@ -83,6 +84,13 @@
 "% matplotlib inline"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Dataset"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 3,
@@ -139,6 +147,13 @@
 "print('Shape y_test: ', y_test.shape)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Logistic regression model"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 50,
@@ -165,7 +180,7 @@
 " \"\"\"\n",
 " Computes the sigmoid function\n",
 " \"\"\"\n",
-" return 1/ (1 + np.exp(-a))\n",
+" return 1 / (1 + np.exp(-a))\n",
 "\n",
 "def train(X, y_true, w, b, n_iter, learning_rate):\n",
 " \"\"\"\n",
@@ -204,6 +219,13 @@
 " return np.array(y_predict_labels)[:, np.newaxis]"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Training"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 52,
@@ -242,6 +264,13 @@
 "plt.show()"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Testing the model"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 53,
@@ -283,6 +312,18 @@
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.6.4"
+},
+"toc": {
+"nav_menu": {},
+"number_sections": true,
+"sideBar": true,
+"skip_h1_title": false,
+"title_cell": "Table of Contents",
+"title_sidebar": "Contents",
+"toc_cell": false,
+"toc_position": {},
+"toc_section_display": true,
+"toc_window_display": false
 }
 },
 "nbformat": 4,

perceptron.ipynb

+55 −5
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Perceptron algorithm in plain Python\n",
+"## Perceptron algorithm in plain Python\n",
 "\n",
 "The perceptron is a simple supervised machine learning algorithm and one of the earliest **neural network** architectures. It was introduced by Rosenblatt in the late 1950s. A perceptron represents a **binary linear classifier** that maps a set of training examples (of $d$ dimensional input vectors) onto binary output values using a $d-1$ dimensional hyperplane.\n",
 "\n",
@@ -73,6 +73,13 @@
 "% matplotlib inline"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Dataset"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 49,
@@ -118,10 +125,17 @@
 "y_true = y[:, np.newaxis]\n",
 "\n",
 "X_train, X_test, y_train, y_test = train_test_split(X, y_true)\n",
-"print('Shape X_train: ', X_train.shape)\n",
-"print('Shape y_train: ', y_train.shape)\n",
-"print('Shape X_test: ', X_test.shape)\n",
-"print('Shape y_test: ', y_test.shape)"
+"print(f'Shape X_train: {X_train.shape}')\n",
+"print(f'Shape y_train: {y_train.shape}')\n",
+"print(f'Shape X_test: {X_test.shape}')\n",
+"print(f'Shape y_test: {y_test.shape}')"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Perceptron model class"
 ]
 },
 {
@@ -169,6 +183,18 @@
 " return self.step_function(a)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"ExecuteTime": {
+"end_time": "2018-03-09T14:34:26.829303Z",
+"start_time": "2018-03-09T14:34:26.824652Z"
+}
+},
+"source": [
+"## Initialization, training and testing"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 52,
@@ -194,6 +220,18 @@
 "print(\"test accuracy: {} %\".format(100 - np.mean(np.abs(y_p_test - y_test)) * 100))"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"ExecuteTime": {
+"end_time": "2018-03-09T14:35:14.630757Z",
+"start_time": "2018-03-09T14:35:14.626460Z"
+}
+},
+"source": [
+"## Visualize decision boundary"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 53,
@@ -266,6 +304,18 @@
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.6.4"
+},
+"toc": {
+"nav_menu": {},
+"number_sections": true,
+"sideBar": true,
+"skip_h1_title": false,
+"title_cell": "Table of Contents",
+"title_sidebar": "Contents",
+"toc_cell": false,
+"toc_position": {},
+"toc_section_display": true,
+"toc_window_display": false
 }
 },
 "nbformat": 4,

simple_neural_net.py.ipynb

+26 −18
@@ -191,10 +191,10 @@
 "# Split the data into a training and test set\n",
 "X_train, X_test, y_train, y_test = train_test_split(X, y)\n",
 "\n",
-"print('Shape X_train: ', X_train.shape)\n",
-"print('Shape y_train: ', y_train.shape)\n",
-"print('Shape X_test: ', X_test.shape)\n",
-"print('Shape y_test: ', y_test.shape)"
+"print(f'Shape X_train: {X_train.shape}')\n",
+"print(f'Shape y_train: {y_train.shape}')\n",
+"print(f'Shape X_test: {X_test.shape}')\n",
+"print(f'Shape y_test: {y_test.shape}')"
 ]
 },
 {
@@ -219,21 +219,18 @@
 " self.n_outputs = n_outputs\n",
 " self.hidden = n_hidden\n",
 "\n",
-" # Initialie weight matrices and bias vectors\n",
+" # Initialize weight matrices and bias vectors\n",
 " self.W_h = np.random.randn(self.n_inputs, self.hidden)\n",
 " self.b_h = np.zeros((1, self.hidden))\n",
 " self.W_o = np.random.randn(self.hidden, self.n_outputs)\n",
 " self.b_o = np.zeros((1, self.n_outputs))\n",
 "\n",
 " def sigmoid(self, a):\n",
-" \"\"\"\n",
-" Applies the sigmoid function to the given input and returns the result\n",
-" \"\"\"\n",
 " return 1 / (1 + np.exp(-a))\n",
 "\n",
 " def forward_pass(self, X):\n",
 " \"\"\"\n",
-" Propagates the given input forward through the net.\n",
+" Propagates the given input X forward through the net.\n",
 "\n",
 " Returns:\n",
 " A_h: matrix with activations of all hidden neurons for all input examples\n",
@@ -307,12 +304,10 @@
 " self.b_o = self.b_o - eta * gradients[\"db_o\"]\n",
 " self.b_h = self.b_h - eta * gradients[\"db_h\"]\n",
 "\n",
-" def train(self, X, y, n_iters, eta):\n",
+" def train(self, X, y, n_iters=500, eta=0.3):\n",
 " \"\"\"\n",
 " Trains the neural net on the given input data\n",
 " \"\"\"\n",
-" assert eta is not None, 'learning rate must be provided'\n",
-" assert n_iters is not None, 'number of iterations must be provided'\n",
 " assert X is not None, 'dataset must be provided'\n",
 " assert y is not None, 'target values must be provided'\n",
 "\n",
@@ -324,7 +319,7 @@
 " gradients = self.backward_pass(X, y, n_samples, outputs)\n",
 "\n",
 " if i % 100 == 0:\n",
-" print('Cost at iteration {}: {}'.format(i, np.round(cost, 4)))\n",
+" print(f'Cost at iteration {i}: {np.round(cost, 4)}')\n",
 "\n",
 " self.update_weights(gradients, eta)\n",
 "\n",
@@ -388,10 +383,10 @@
 "source": [
 "nn = NeuralNet(n_inputs=2, n_hidden=6, n_outputs=1)\n",
 "print(\"Shape of weight matrices and bias vectors:\")\n",
-"print('W_h shape: ', nn.W_h.shape)\n",
-"print('b_h shape: ', nn.b_h.shape)\n",
-"print('W_o shape: ', nn.W_o.shape)\n",
-"print('b_o shape: ', nn.b_o.shape)\n",
+"print(f'W_h shape: {nn.W_h.shape}')\n",
+"print(f'b_h shape: {nn.b_h.shape}')\n",
+"print(f'W_o shape: {nn.W_o.shape}')\n",
+"print(f'b_o shape: {nn.b_o.shape}')\n",
 "print()\n",
 "\n",
 "print(\"Training:\")\n",
@@ -421,7 +416,7 @@
 "source": [
 "n_test_samples, _ = X_test.shape\n",
 "y_predict = nn.predict(X_test)\n",
-"print(\"Classification accuracy on test set: {}\".format(np.sum(y_predict == y_test)/n_test_samples))"
+"print(f\"Classification accuracy on test set: {np.sum(y_predict == y_test)/n_test_samples}\")"
 ]
 },
 {
@@ -492,6 +487,7 @@
 }
 ],
 "metadata": {
+"anaconda-cloud": {},
 "kernelspec": {
 "display_name": "Python [conda root]",
 "language": "python",
@@ -508,6 +504,18 @@
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.6.4"
+},
+"toc": {
+"nav_menu": {},
+"number_sections": true,
+"sideBar": true,
+"skip_h1_title": false,
+"title_cell": "Table of Contents",
+"title_sidebar": "Contents",
+"toc_cell": false,
+"toc_position": {},
+"toc_section_display": true,
+"toc_window_display": false
 }
 },
 "nbformat": 4,
