diff --git a/tensorflow_models.ipynb b/tensorflow_models.ipynb new file mode 100644 index 0000000..07eee62 --- /dev/null +++ b/tensorflow_models.ipynb @@ -0,0 +1,497 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Modifying and expanding the included TensorFlow modules" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "__author__ = \"Christopher Potts\"\n", + "__version__ = \"CS224u, Stanford, Spring 2018 term\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Contents\n", + "\n", + "0. [Overview](#Overview)\n", + "0. [Set-up](#Set-up)\n", + "0. [Basic experiments to illustrate the models](#Basic-experiments-to-illustrate-the-models)\n", + "0. [A basic softmax classifier](#A-basic-softmax-classifier)\n", + "0. [Softmax with a better optimizer](#Softmax-with-a-better-optimizer)\n", + "0. [Softmax with L2 regularization](#Softmax-with-L2-regularization)\n", + "0. [Shallow neural network with Dropout](#Shallow-neural-network-with-Dropout)\n", + "0. [A bidirectional RNN Classifier](#A-bidirectional-RNN-Classifier)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "This repository contains a small framework for defining models in TensorFlow. I hope the classes are easily extended. The goal of this notebook is to illustrate a few such extensions to try to convey the overall design. \n", + "\n", + "The class structure for the relevant files looks like this:\n", + "\n", + "* `tf_model_base.TfModelBase`\n", + " * `tf_shallow_neural_classifier.TfShallowNeuralClassifier`\n", + " * `tf_autoencoder.TfAutoencoder`\n", + " * `tf_rnn_classifier.TfRNNClassifier`\n", + " \n", + "To define a new subclass of `TfModelBase`, you need only fill in `build_graph`, `train_dict`, and `test_dict`. The first defines the model's core computation graph, and the other two tell the class how to handle incoming data during training and testing.\n", + "\n", + "Incidentally, the pure NumPy classes \n", + "\n", + "* `nn_model_base.NNModelBase`\n", + " * `rnn_classifier.RNNClassifier`\n", + " * `tree_nn.TreeNN` \n", + " \n", + "have a very similar design, and so they should be just as extendable. However, you have to write your own backpropagation methods for them, so they are more challenging in that respect." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set-up" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Applications/anaconda/envs/nlu/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. 
In future, it will be treated as `np.float64 == np.dtype(float).type`.\n", + " from ._conv import register_converters as _register_converters\n" + ] + } + ], + "source": [ + "from sklearn.datasets import make_classification\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.metrics import classification_report\n", + "import tensorflow as tf\n", + "from tf_model_base import TfModelBase\n", + "from tf_rnn_classifier import TfRNNClassifier\n", + "from tf_shallow_neural_classifier import TfShallowNeuralClassifier" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic experiments to illustrate the models\n", + "\n", + "The following code is here just to facilitate testing. It's not part of the framework." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "def sklearn_evaluation(X, y, mod, random_state=None):\n", + " \"\"\"No frills random train/test split evaluations.\"\"\"\n", + " X_train, X_test, y_train, y_test = train_test_split(\n", + " X, y, test_size=0.33, random_state=random_state)\n", + " mod.fit(X_train, y_train, X_test=X_test, y_test=y_test)\n", + " predictions = mod.predict(X_test)\n", + " print(classification_report(y_test, predictions))\n", + " \n", + "\n", + "def artificial_evaluation(mod, random_state=42):\n", + " \"\"\"sklearn random classification dataset generation, \n", + " designed to be challenging.\"\"\"\n", + " X, y = make_classification(\n", + " n_samples=1000, \n", + " n_features=100, \n", + " n_informative=75, \n", + " n_redundant=5, \n", + " n_classes=3, \n", + " random_state=random_state)\n", + " sklearn_evaluation(\n", + " X, y, mod, \n", + " random_state=random_state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A basic softmax classifier\n", + "\n", + "A simple extension of `TfModelBase` is a softmax classifier: \n", + "\n", + "$$y = \\textbf{softmax}(xW + b)$$\n", + "\n", + "Really all we need to do is define the parameters and computation graph. \n", + "\n", + "__Note__: `self.model` has to be used to define the final output, because functions in `TfModelBase` assume this." 
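+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check of the formula above, here is a plain-NumPy version run on made-up values. This cell is not part of the framework; `x`, `W`, and `b` are random toy arrays used only for illustration:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "def softmax(z):\n",
+    "    # Subtract the max for numerical stability; softmax is shift-invariant.\n",
+    "    exps = np.exp(z - z.max(axis=-1, keepdims=True))\n",
+    "    return exps / exps.sum(axis=-1, keepdims=True)\n",
+    "\n",
+    "# Toy values: 2 examples, 4 input features, 3 classes.\n",
+    "x = np.random.randn(2, 4)\n",
+    "W = np.random.randn(4, 3)\n",
+    "b = np.zeros(3)\n",
+    "\n",
+    "y_hat = softmax(x.dot(W) + b)\n",
+    "print(y_hat)              # each row is a probability distribution\n",
+    "print(y_hat.sum(axis=1))  # rows sum to 1"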
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class TfSoftmaxClassifier(TfModelBase):\n",
+    "    def build_graph(self):\n",
+    "        # Input and output placeholders\n",
+    "        self.inputs = tf.placeholder(\n",
+    "            tf.float32, shape=[None, self.input_dim])\n",
+    "        self.outputs = tf.placeholder(\n",
+    "            tf.float32, shape=[None, self.output_dim])\n",
+    "        \n",
+    "        # Parameters:\n",
+    "        self.W = tf.Variable(\n",
+    "            tf.zeros([self.input_dim, self.output_dim]))\n",
+    "        self.b = tf.Variable(\n",
+    "            tf.zeros([self.output_dim]))\n",
+    "        \n",
+    "        # The graph: \n",
+    "        self.model = tf.matmul(self.inputs, self.W) + self.b\n",
+    "        \n",
+    "    def train_dict(self, X, y):\n",
+    "        return {self.inputs: X, self.outputs: y}\n",
+    "    \n",
+    "    def test_dict(self, X):\n",
+    "        return {self.inputs: X} "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Iteration 100: loss: 0.5050749778747559"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "             precision    recall  f1-score   support\n",
+      "\n",
+      "          0       0.71      0.73      0.72       107\n",
+      "          1       0.76      0.66      0.71       118\n",
+      "          2       0.61      0.68      0.64       105\n",
+      "\n",
+      "avg / total       0.69      0.69      0.69       330\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "artificial_evaluation(TfSoftmaxClassifier(max_iter=100))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Softmax with a better optimizer\n",
+    "\n",
+    "In `TfModelBase`, the `get_optimizer` method returns a `tf.train.GradientDescentOptimizer`. To change this in `TfSoftmaxClassifier`, you can define a very small subclass.\n",
+    "\n",
+    "__Note__: `self.eta` and `self.cost` are set by the base class. The first is a keyword parameter, and the second is an attribute that gets set inside `fit`, as the return value of `get_cost_function`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class TfSoftmaxClassifierWithAdaGrad(TfSoftmaxClassifier):\n",
+    "    \n",
+    "    def get_optimizer(self):\n",
+    "        return tf.train.AdagradOptimizer(self.eta).minimize(self.cost)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Iteration 100: loss: 0.4801039993762975"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "             precision    recall  f1-score   support\n",
+      "\n",
+      "          0       0.70      0.74      0.72       107\n",
+      "          1       0.79      0.69      0.74       118\n",
+      "          2       0.63      0.68      0.65       105\n",
+      "\n",
+      "avg / total       0.71      0.70      0.70       330\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "artificial_evaluation(TfSoftmaxClassifierWithAdaGrad(max_iter=100))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Softmax with L2 regularization\n",
+    "\n",
+    "It is very easy in TensorFlow to add L2 regularization to the cost function. You really just write it down the way it appears in textbooks!"
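+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Concretely, writing $\\lambda = 1/C$ for the regularization strength, the cost defined in the next cell amounts to\n",
+    "\n",
+    "$$J(W, b) = \\frac{1}{m}\\sum_{i=1}^{m} \\textbf{xent}\\big(y_{i}, \\textbf{softmax}(x_{i}W + b)\\big) + \\frac{\\lambda}{2}\\|W\\|_{2}^{2}$$\n",
+    "\n",
+    "where the factor of $\\frac{1}{2}$ comes from TensorFlow's convention that `tf.nn.l2_loss(W)` computes $\\frac{1}{2}\\|W\\|_{2}^{2}$."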
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class TfSoftmaxClassifierL2(TfSoftmaxClassifier):\n",
+    "    def __init__(self, C=1.0, **kwargs):\n",
+    "        \"\"\"`C` is the inverse regularization strength.\"\"\"\n",
+    "        self.C = 1.0 / C\n",
+    "        super(TfSoftmaxClassifierL2, self).__init__(**kwargs)\n",
+    "        \n",
+    "    def get_cost_function(self, **kwargs):\n",
+    "        reg = self.C * tf.nn.l2_loss(self.W)\n",
+    "        cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(\n",
+    "            logits=self.model, labels=self.outputs)\n",
+    "        return tf.reduce_mean(reg + cross_entropy) "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Iteration 100: loss: 0.5413660407066345"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "             precision    recall  f1-score   support\n",
+      "\n",
+      "          0       0.71      0.72      0.71       107\n",
+      "          1       0.76      0.66      0.71       118\n",
+      "          2       0.61      0.69      0.65       105\n",
+      "\n",
+      "avg / total       0.69      0.69      0.69       330\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "artificial_evaluation(TfSoftmaxClassifierL2(C=4, max_iter=100))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Shallow neural network with Dropout\n",
+    "\n",
+    "In this case, we extend `TfShallowNeuralClassifier` (a subclass of `TfModelBase`) with an additional __dropout layer__.\n",
+    "\n",
+    "[Dropout](http://jmlr.org/papers/v15/srivastava14a.html) is another form of regularization for neural networks: during each pass, a random selection of the target layer's dimensions is masked, to try to encourage other dimensions to bear some of the weight, and to avoid correlations between dimensions that could lead to over-fitting. \n",
+    "\n",
+    "Here's [a funny tweet about dropout](https://twitter.com/Smerity/status/980175898119778304) that is surprisingly good at getting the point across."
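+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To make the masking concrete, here is a small NumPy illustration of what `tf.nn.dropout` does during training. It is not part of the framework and the values are made up: each dimension is kept with probability `keep_prob`, and the surviving dimensions are scaled up by `1 / keep_prob` so that the layer's expected value is unchanged."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.RandomState(42)\n",
+    "\n",
+    "h = np.array([0.5, 1.0, 2.0, 3.0])  # a made-up hidden layer\n",
+    "keep_prob = 0.8\n",
+    "\n",
+    "# Keep each dimension with probability `keep_prob` ...\n",
+    "mask = rng.binomial(1, keep_prob, size=h.shape)\n",
+    "# ... and rescale the survivors so the expected value is unchanged:\n",
+    "h_dropped = (h * mask) / keep_prob\n",
+    "\n",
+    "print(mask)\n",
+    "print(h_dropped)"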
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class TfShallowNeuralClassifierWithDropout(TfShallowNeuralClassifier):\n",
+    "    def __init__(self, hidden_dim=50, keep_prob=0.8, **kwargs):\n",
+    "        self.hidden_dim = hidden_dim\n",
+    "        self.keep_prob = keep_prob\n",
+    "        super(TfShallowNeuralClassifierWithDropout, self).__init__(**kwargs) \n",
+    "        \n",
+    "    def build_graph(self):\n",
+    "        # All the parameters of `TfShallowNeuralClassifier`:\n",
+    "        self.define_parameters()\n",
+    "        \n",
+    "        # Same hidden layer:\n",
+    "        self.hidden = tf.nn.relu(\n",
+    "            tf.matmul(self.inputs, self.W_xh) + self.b_h)\n",
+    "        \n",
+    "        # Drop-out on the hidden layer:\n",
+    "        self.tf_keep_prob = tf.placeholder(tf.float32)\n",
+    "        dropout_layer = tf.nn.dropout(self.hidden, self.tf_keep_prob)\n",
+    "        \n",
+    "        # `dropout_layer` instead of `hidden` to define full model:\n",
+    "        self.model = tf.matmul(dropout_layer, self.W_hy) + self.b_y \n",
+    "        \n",
+    "    def train_dict(self, X, y):\n",
+    "        return {self.inputs: X, self.outputs: y, \n",
+    "                self.tf_keep_prob: self.keep_prob}\n",
+    "    \n",
+    "    def test_dict(self, X):\n",
+    "        # No dropout at test-time, hence `self.tf_keep_prob: 1.0`:\n",
+    "        return {self.inputs: X, self.tf_keep_prob: 1.0}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Iteration 1000: loss: 0.16015665233135223"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "             precision    recall  f1-score   support\n",
+      "\n",
+      "          0       0.77      0.74      0.75       107\n",
+      "          1       0.81      0.79      0.80       118\n",
+      "          2       0.69      0.73      0.71       105\n",
+      "\n",
+      "avg / total       0.76      0.75      0.76       330\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "artificial_evaluation(TfShallowNeuralClassifierWithDropout(max_iter=1000))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## A bidirectional RNN Classifier\n",
+    "\n",
+    "As a final example, let's change `TfRNNClassifier` into a bidirectional model that makes its softmax prediction based on the concatenation of the two final states that it computes. Here, we just need to redefine `build_graph` (and it's actually the same as the base class up to `self.cell`, where the two designs diverge)."
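+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Concretely, if the batch size is $n$, the hidden dimensionality is $h$, and there are $k$ classes, then the forward and backward final (output) states each have shape $n \\times h$. Their concatenation\n",
+    "\n",
+    "$$\\text{last} = [\\text{last}_{fw} ; \\text{last}_{bw}]$$\n",
+    "\n",
+    "has shape $n \\times 2h$, which is why `feat_dim` below is `hidden_dim * 2`, and the classifier layer on top of it is the familiar $y = \\textbf{softmax}(\\text{last} \\cdot W_{hy} + b_{y})$ with $W_{hy}$ of shape $2h \\times k$."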
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "class TfBidirectionalRNNClassifier(TfRNNClassifier):\n", + " \n", + " def build_graph(self):\n", + " self._define_embedding()\n", + "\n", + " self.inputs = tf.placeholder(\n", + " tf.int32, [None, self.max_length])\n", + "\n", + " self.ex_lengths = tf.placeholder(tf.int32, [None])\n", + "\n", + " # Outputs as usual:\n", + " self.outputs = tf.placeholder(\n", + " tf.float32, shape=[None, self.output_dim])\n", + "\n", + " # This converts the inputs to a list of lists of dense vector\n", + " # representations:\n", + " self.feats = tf.nn.embedding_lookup(\n", + " self.embedding, self.inputs)\n", + "\n", + " # Same cell structure as the base class, but we have\n", + " # forward and backward versions:\n", + " self.cell_fw = tf.nn.rnn_cell.LSTMCell(\n", + " self.hidden_dim, activation=self.hidden_activation)\n", + " \n", + " self.cell_bw = tf.nn.rnn_cell.LSTMCell(\n", + " self.hidden_dim, activation=self.hidden_activation)\n", + "\n", + " # Run the RNN:\n", + " outputs, finals = tf.nn.bidirectional_dynamic_rnn(\n", + " self.cell_fw,\n", + " self.cell_bw,\n", + " self.feats,\n", + " dtype=tf.float32,\n", + " sequence_length=self.ex_lengths)\n", + " \n", + " # finals is a pair of `LSTMStateTuple` objects, which are themselves\n", + " # pairs of Tensors (x, y), where y is the output state, according to\n", + " # https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMStateTuple\n", + " # Thus, we want the second member of these pairs:\n", + " last_fw, last_bw = finals \n", + " last_fw, last_bw = last_fw[1], last_bw[1]\n", + " \n", + " last = tf.concat((last_fw, last_bw), axis=1)\n", + " \n", + " self.feat_dim = self.hidden_dim * 2 \n", + "\n", + " # Softmax classifier on the final hidden state:\n", + " self.W_hy = self.weight_init(\n", + " self.feat_dim, self.output_dim, 'W_hy')\n", + " self.b_y = self.bias_init(self.output_dim, 'b_y')\n", + " self.model = tf.matmul(last, self.W_hy) + self.b_y " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.4" + }, + "widgets": { + "state": {}, + "version": "1.1.2" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}