Typo corrections; prose improvements in the SST notebooks
cgpotts committed Apr 5, 2018
1 parent e6b6ea1 commit 6cb45a6
Showing 4 changed files with 44 additions and 49 deletions.
10 changes: 5 additions & 5 deletions sst_01_overview.ipynb
@@ -138,9 +138,9 @@
"\n",
"* Make sure your environment includes all the requirements for [the cs224u repository](https://github.com/cgpotts/cs224u).\n",
"\n",
"* Download [the train/dev/test Stanford Sentiment Treebank distribution](http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip), unzip it, and put the resulting folder in the same directory as this notebook. It will be called `trees`. (If you want to put it somewhere else, change sst_home below.)\n",
"* Download [the train/dev/test Stanford Sentiment Treebank distribution](http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip), unzip it, and put the resulting folder in the same directory as this notebook. It will be called `trees`.\n",
"\n",
"* Make sure you still have the `vsmdata` directory and its contents. ([Here's a link in case you need to redownload it.](http://web.stanford.edu/class/cs224u/data/vsmdata.zip)) In addition, you might want the [the Wikipedia 2014 + Gigaword 5 distribution of the pretrained GloVe vectors](http://nlp.stanford.edu/data/glove.6B.zip). This might already be in `vsmdata`, depending on what kind of work you did as part of the VSM unit."
"* Make sure you still have the `vsmdata` directory and its contents. ([Here's a link in case you need to redownload it.](http://web.stanford.edu/class/cs224u/data/vsmdata.zip)) In addition, you might want [the Wikipedia 2014 + Gigaword 5 distribution of the pretrained GloVe vectors](http://nlp.stanford.edu/data/glove.6B.zip). This might already be in `vsmdata`, depending on what kind of work you did as part of the VSM unit."
]
},
{
@@ -169,7 +169,7 @@
" * `2` is a neutral label.\n",
" * `3` and `4` are positive labels. \n",
"\n",
"* Our readers are iteratorrs that yield `(tree, label)` pairs, where `tree` is an [NLTK Tree](http://www.nltk.org/_modules/nltk/tree.html) instance and `score` is a string."
"* Our readers are iterators that yield `(tree, label)` pairs, where `tree` is an [NLTK Tree](http://www.nltk.org/_modules/nltk/tree.html) instance and `score` is a string."
]
},
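For reference, a minimal sketch of how the readers described in the hunk above might be used. This is illustrative only, not part of the commit; the exact reader signatures (including whether `sst_home` is passed in) are assumptions, so check `sst.py` in the cs224u repository.

```python
import sst

sst_home = 'trees'  # assumption: the unzipped SST distribution, as described above

for tree, label in sst.train_reader(sst_home):
    print(label)           # a string in {'0', '1', '2', '3', '4'}
    print(tree.leaves())   # NLTK Tree API: the tokens of the example
    break
```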
{
@@ -304,7 +304,7 @@
"\n",
"* We've deliberately ignored `test` readers. We urge you not to use the `test` set until and unless you are running experiments for a final project or similar. Overuse of test-sets corrupts them, since even subtle lessons learned from those runs can be incorporated back into model-building efforts.\n",
"\n",
"* We actually have mixed feelings about the overuse of `dev` that might result from working with these notebooks! We've tried to encourage using just splits of the training data for assessment most of the time, with only occasionally use of `dev`. This will give you a clearer picture of how you will ultimately do on test; over-use of `dev` can lead to over-fitting on that particular dataset with a resulting loss of performance of `test`."
"* We actually have mixed feelings about the overuse of `dev` that might result from working with these notebooks! We've tried to encourage using just splits of the training data for assessment most of the time, with only occasionally use of `dev`. This will give you a clearer picture of how you will ultimately do on `test`; over-use of `dev` can lead to over-fitting on that particular dataset with a resulting loss of performance of `test`."
]
},
{
@@ -336,7 +336,7 @@
"\n",
"A related note: the above shows that the __fine-grained sentiment task__ for the SST is particularly punishing as usually formulated, since it ignores the partial-order structure in the categories completely. As a result, mistaking `'0'` for `'1'` is as bad as mistaking `'0'` for `'4'`, though the first error is clearly less severe than the second.\n",
"\n",
"The functions `sst.binary_class_func` and `sst.ternary_class_func` will convert the labels for you. Let's now use them to study the label distributions."
"The functions `sst.binary_class_func` and `sst.ternary_class_func` will convert the labels for you, and recommended usage is to use them as the `class_func` keyword argument to `train_reader` and `dev_reader`; examples below."
]
},
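A hedged sketch of the recommended usage just described: passing `sst.binary_class_func` or `sst.ternary_class_func` as the `class_func` keyword argument and inspecting the resulting label distribution. The keyword name comes from the prose above; everything else (e.g., whether the data directory is passed positionally) is an assumption.

```python
from collections import Counter
import sst

sst_home = 'trees'  # assumption: location of the unzipped SST distribution

# Label distribution under the ternary relabeling
print(Counter(label for _, label in
              sst.train_reader(sst_home, class_func=sst.ternary_class_func)))
```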
{
16 changes: 7 additions & 9 deletions sst_02_hand_built_features.ipynb
@@ -59,7 +59,7 @@
"source": [
"## Overview\n",
"\n",
"* The focus of this notebook is __building feature representations__ for use with (mostly linear) classifiers (though you're encouraged to try out some non-linear ones as well!)\n",
"* The focus of this notebook is __building feature representations__ for use with (mostly linear) classifiers (though you're encouraged to try out some non-linear ones as well!).\n",
"\n",
"* The core characteristics of the feature functions we'll build here:\n",
" * They represent examples in __very large, very sparse feature spaces__.\n",
@@ -329,7 +329,7 @@
"\n",
"* Initialize $\\mathbf{w} = \\mathbf{0}$\n",
"* Repeat $T$ times:\n",
" * for each $(x,y)$ in $\\mathcal{D}$ (in random order):\n",
" * for each $(x,y) \\in \\mathcal{D}$ (in random order):\n",
" * $\\tilde{y} = \\text{argmax}_{y'\\in \\mathcal{Y}} \\mathbf{Score}_{\\textbf{w}, \\phi}(x,y') + \\mathbf{cost}(y,y')$\n",
" * $\\mathbf{w} = \\mathbf{w} + \\eta(\\phi(x,y) - \\phi(x,\\tilde{y}))$\n",
" \n",
@@ -361,8 +361,7 @@
" Parameters\n",
" ----------\n",
" X : 2d np.array\n",
" The matrix of features, one example per row.\n",
" \n",
" The matrix of features, one example per row. \n",
" y : list\n",
" The list of labels for rows in `X`.\n",
" \n",
@@ -406,7 +405,6 @@
" ----------\n",
" X : 2d np.array\n",
" The matrix of features, one example per row.\n",
" \n",
" y : list\n",
" The list of labels for rows in `X`.\n",
" \n",
@@ -448,7 +446,7 @@
"source": [
"## Experiments\n",
"\n",
"We now have all the pieces needed to run experiments. And we're going to want to run a lot of experiments, trying out different feature functions, taking different perspectives on the data and labels, and using different models. \n",
"We now have all the pieces needed to run experiments. And __we're going to want to run a lot of experiments__, trying out different feature functions, taking different perspectives on the data and labels, and using different models. \n",
"\n",
"To make that process efficient and regimented, `sst` contains a function `experiment`. All it does is pull together these pieces and use them for training and assessment. It's complicated, but the flexibility will turn out to be an asset."
]
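A hedged sketch of how the `experiment` function described above is typically invoked. The positional arguments follow the prose (a feature function and a model-fitting function); the keyword names are guesses at the API, and the bag-of-words `unigrams_phi` below is a simple stand-in for the notebook's own feature function.

```python
from collections import Counter
import sst

def unigrams_phi(tree):
    """Bag-of-words feature function (a stand-in for the notebook's version)."""
    return Counter(tree.leaves())

_ = sst.experiment(
    unigrams_phi,
    fit_maxent_classifier,              # fitting function defined earlier in the notebook
    class_func=sst.binary_class_func,   # how to map the 5 raw labels
    assess_reader=None)                 # None: assess on a random split of train
```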
@@ -643,7 +641,7 @@
"y &= \\textbf{softmax}(xW_{xy} + b_{y})\n",
"\\end{align*}$$\n",
"\n",
"this model inserts a hidden layer with a non-linear activation applied to it:\n",
"the shallow neural network inserts a hidden layer with a non-linear activation applied to it:\n",
"\n",
"$$\\begin{align*}\n",
"h &= \\tanh(xW_{xh} + b_{h}) \\\\\n",
@@ -747,7 +745,7 @@
"source": [
"### Example using LogisticRegression\n",
"\n",
"Here's a fairly full-featured use of the above for the `LogisisticRegression` model family:"
"Here's a fairly full-featured use of the above for the `LogisticRegression` model family:"
]
},
{
@@ -839,7 +837,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The models written for this course are also compatible with this framework. They [\"duck type\"](https://en.wikipedia.org/wiki/Duck_typing) the sklearn models by having methods `fit`, `predict`, `get_params`, and `set_params`, and an attribute `params`."
"The models written for this course are also compatible with this framework. They [\"duck type\"](https://en.wikipedia.org/wiki/Duck_typing) the `sklearn` models by having methods `fit`, `predict`, `get_params`, and `set_params`, and an attribute `params`."
]
},
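An illustrative sketch (not course code) of the duck-typing contract described above: any object exposing these methods and the `params` attribute can stand in for an sklearn model in the experiment framework.

```python
class MajorityBaseline:
    """Toy classifier that always predicts the most frequent training label."""

    def __init__(self):
        self.params = ['label']   # attribute expected by the framework
        self.label = None

    def fit(self, X, y):
        y = list(y)
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in range(len(X))]

    def get_params(self, deep=True):
        return {'label': self.label}

    def set_params(self, **params):
        for key, val in params.items():
            setattr(self, key, val)
        return self
```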
{
65 changes: 31 additions & 34 deletions sst_03_neural_networks.ipynb
@@ -23,15 +23,15 @@
"source": [
"## Contents\n",
"\n",
"0. [Overview of this unit](#Overview-of-this-unit)\n",
"0. [Overview](#Overview)\n",
"0. [Set-up](#Set-up)\n",
"0. [Distributed representations as features](#Distributed-representations-as-features)\n",
" 0. [GloVe inputs](#GloVe-inputs)\n",
" 0. [IMDB representations](#IMDB-representations)\n",
" 0. [Remarks on this approach](#Remarks-on-this-approach)\n",
"0. [RNN classifiers](#RNN-classifiers)\n",
" 0. [RNN dataset preparation](#RNN-dataset-preparation)\n",
" 0. [Vocabulary for embedding](#Vocabulary-for-embedding)\n",
" 0. [Vocabulary for the embedding](#Vocabulary-for-the-embedding)\n",
" 0. [Pure NumPy RNN implementation](#Pure-NumPy-RNN-implementation)\n",
" 0. [TensorFlow implementation](#TensorFlow-implementation)\n",
"0. [Tree-structured neural networks](#Tree-structured-neural-networks)\n",
@@ -44,7 +44,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview of this unit"
"## Overview\n",
"\n",
"This notebook defines and explores __recurrent neural network (RNN) classifiers__ and __tree-structured neural network (TreeNN) classifiers__ for the Stanford Sentiment Treebank. \n",
"\n",
"These approaches make their predictions based on comprehensive representations of the examples: \n",
"\n",
"* For the RNN, each word is modeled, as are its sequential relationships to the other words.\n",
"* For the TreeNN, the entire parsed structure of the sentence is modeled.\n",
"\n",
"Both models contrast with the ones explored in [the previous notebook](sst_02_hand_built_features.ipynb), which make predictions based on more partial, potentially idiosyncratic information extracted from the examples."
]
},
{
@@ -104,14 +113,9 @@
"source": [
"## Distributed representations as features\n",
"\n",
"As a first step in the direction of neural networks for sentiment, we can connect with our previous unit on distributed representations. Arguably, more than any specific model architecture, this is the major innovation of deep learning: __rather than designing feature functions by hand, we use dense, distributed representations, often derived from unsupervised models__."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our model will just be `LogisticRegression`, and we'll continue with the experiment framework from the previous notebook. Here's is `fit_maxent_classifier` again:"
"As a first step in the direction of neural networks for sentiment, we can connect with our previous unit on distributed representations. Arguably, more than any specific model architecture, this is the major innovation of deep learning: __rather than designing feature functions by hand, we use dense, distributed representations, often derived from unsupervised models__.\n",
"\n",
"Our model will just be `LogisticRegression`, and we'll continue with the experiment framework from the previous notebook. Here is `fit_maxent_classifier` again:"
]
},
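A guess at the shape of the `fit_maxent_classifier` function referenced above (the actual version lives in the previous notebook); the hyperparameters here are illustrative, not the course's exact settings.

```python
from sklearn.linear_model import LogisticRegression

def fit_maxent_classifier(X, y):
    """Fit a softmax (maxent) classifier on feature matrix X and label list y."""
    mod = LogisticRegression(fit_intercept=True)
    mod.fit(X, y)
    return mod
```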
{
@@ -152,7 +156,7 @@
"outputs": [],
"source": [
"def vsm_leaves_phi(tree, lookup, np_func=np.sum):\n",
" \"\"\"Represent tree as a combination of the vector of its words.\n",
" \"\"\"Represent `tree` as a combination of the vector of its words.\n",
" \n",
" Parameters\n",
" ----------\n",
@@ -170,10 +174,10 @@
" -------\n",
" np.array, dimension `X.shape[1]`\n",
" \n",
" \"\"\"\n",
" dim = len(next(iter(lookup.values()))) \n",
" allvecs = np.array([lookup[w] for w in tree.leaves() if w in lookup])\n",
" \"\"\" \n",
" allvecs = np.array([lookup[w] for w in tree.leaves() if w in lookup]) \n",
" if len(allvecs) == 0:\n",
" dim = len(next(iter(lookup.values())))\n",
" feats = np.zeros(dim)\n",
" else: \n",
" feats = np_func(allvecs, axis=0) \n",
@@ -308,9 +312,9 @@
"source": [
"### Remarks on this approach\n",
"\n",
"* Recall that our `ungrams_phi`created feature representations with over 16K dimensions and got about 0.77.\n",
"* Recall that our `ungrams_phi` created feature representations with over 16K dimensions and got about 0.77.\n",
"\n",
"* The above models have only 50 dimensions and come close in terms of performance. In many ways, it's striking that we can get a model that is pretty competitive with so few dimensions.\n",
"* The above models have only 50 dimensions and come close in terms of performance. In many ways, it's striking that we can get a model that is competitive with so few dimensions.\n",
"\n",
"* The promise of the Mittens model of [Dingwall and Potts 2018](https://arxiv.org/abs/1803.09901) is that we can use GloVe itself to update the general purpose information in the 'glove.6B' vectors with specialized information from one of these IMDB count matrices. That might be worth trying; the `mittens` package already implements this!\n",
"\n",
@@ -323,7 +327,7 @@
"source": [
"## RNN classifiers\n",
"\n",
"A recurrent neural network (RNN) is any deep learning model that process its inputs sequentially. There are many variations on this theme. The one that we use here is a __RNN classifier__.\n",
"A recurrent neural network (RNN) is any deep learning model that process its inputs sequentially. There are many variations on this theme. The one that we use here is an __RNN classifier__.\n",
"\n",
"<img src=\"fig/rnn_classifier.png\" width=800 />\n",
"\n",
@@ -334,7 +338,7 @@
"y &= \\textbf{softmax}(h_{n}W_{hy} + b)\n",
"\\end{align*}$$\n",
"\n",
"where $1 \\leqslant t \\leqslant n$. As indicated in the above diagram, the sequence of hidden states is padded with an initial state $h_{0}$ In our implementations , this is always an all $0$ vector, but it can be initialized in more sophisticated ways (some of which we will explore in our unit on natural language inference).\n",
"where $1 \\leqslant t \\leqslant n$. As indicated in the above diagram, the sequence of hidden states is padded with an initial state $h_{0}$ In our implementations, this is always an all $0$ vector, but it can be initialized in more sophisticated ways (some of which we will explore in our unit on natural language inference).\n",
"\n",
"This is a potential gain over our sum-the-word-vectors baseline, in that it processes each word independently, and in the context of those that came before it. Thus, not only is this sensitive to word order, but the hidden representation give us the potential to encode how the preceding context for a word affects its interpretation.\n",
"\n",
@@ -363,7 +367,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, each member of `X_rnn_train` train is a list of lists of words. Here's a look at the start of the first:"
"Each member of `X_rnn_train` is a list of lists of words. Here's a look at the start of the first:"
]
},
{
@@ -433,7 +437,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Vocabulary for embedding\n",
"### Vocabulary for the embedding\n",
"\n",
"The first delicate issue we need to address is the vocabulary for our model:\n",
"\n",
@@ -445,14 +449,7 @@
"\n",
"* At the same time, we might want to collapse infrequent tokens into `$UNK` to make optimization easier.\n",
"\n",
"In `sst`, the function `get_vocab` implements these strategies:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can extract the training vocab and use it for the model embedding, secure in the knowledge that we will be able to process tokens outside of this set (by mapping them to `$UNK`)."
"In `sst`, the function `get_vocab` implements these strategies. Now we can extract the training vocab and use it for the model embedding, secure in the knowledge that we will be able to process tokens outside of this set (by mapping them to `$UNK`)."
]
},
{
@@ -485,7 +482,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This frankly seems too big. Let's restrict to just 3000 words:"
"This frankly seems too big relative to our dataset size. Let's restrict to just 3000 words:"
]
},
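A hedged sketch of the vocabulary step described above. Treating `n_words` as a cap on vocabulary size is an assumption about `get_vocab`'s signature (check `sst.py`); `X_rnn_train` is the list of token lists built earlier in the notebook.

```python
import sst

sst_full_train_vocab = sst.get_vocab(X_rnn_train)            # unrestricted vocabulary
sst_train_vocab = sst.get_vocab(X_rnn_train, n_words=3000)   # restricted to 3000 words + $UNK
print(len(sst_full_train_vocab), len(sst_train_vocab))
```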
{
@@ -503,7 +500,7 @@
"source": [
"### Pure NumPy RNN implementation\n",
"\n",
"The first implementation we'll look at is a pure NumPy implementation of exactly the model depicted above. This implementation is a bit slow and might not be all that effective, but it is useful to have available in case one really wants to inspect the fine details of how these models process examples."
"The first implementation we'll look at is a pure NumPy implementation of exactly the model depicted above. This implementation is a bit slow and might not be all that effective, but it is useful to have available in case one really wants to inspect the details of how these models process examples."
]
},
{
@@ -578,7 +575,7 @@
"\n",
"The included TensorFlow implementation is much faster and more configurable. Its only downside is that it requires the user to specify a maximum length for all incoming sequences: \n",
"\n",
"* Examples that are shorter than this maximum are padded (and the implementation ignores those dimensions)\n",
"* Examples that are shorter than this maximum are padded (and the implementation ignores those dimensions).\n",
"* Examples that are longer than this maximum are clipped from the start (on the assumption that later words in the sentences will tend to be more informative).\n",
"\n",
"The function `utils.sequence_length_report` will help you make informed decisions:"
@@ -616,7 +613,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The class `TfRNNClassifier` takes a parameter for specifying this maximum length. It has many others as well:\n",
"The class `TfRNNClassifier` takes a parameter for specifying this maximum length. It has many other parameters as well:\n",
" \n",
"* `hidden_activation`: the activation function for the hidden layers (default: `tf.nn.tanh`).\n",
"* `cell_class`: which TensorFlow cell-type to use: \n",
@@ -1011,7 +1008,7 @@
"source": [
"### Pure NumPy TreeNN implementation\n",
"\n",
"`TreeNN` is a pure NumPy implementation of this model. It should be regarded as a baseline for models of this form. The original SST paper includes evaluations of a wide range of these models."
"`TreeNN` is a pure NumPy implementation of this model. It should be regarded as a baseline for models of this form. The original SST paper includes evaluations of a wide range of models in this family."
]
},
{
2 changes: 1 addition & 1 deletion vsm_01_distributional.ipynb
@@ -1383,7 +1383,7 @@
"\n",
"1. Run \n",
"\n",
" ```amod = pd.read_csv(os.path.join(data_home, 'gigawordnyt-advmod-matrix.csv.gzip'), index_col=0)``` \n",
" ```amod = pd.read_csv(os.path.join(data_home, 'gigawordnyt-advmod-matrix.csv.gz'), index_col=0)``` \n",
" \n",
" to read in an adjective $\\times$ adverb matrix derived from the Gigaword corpus. Each cell contains the number of times that the modifier phrase __ADV ADJ__ appeared in Gigaword as given by dependency parses of the data. __ADJ__ is the row value and __ADV__ is the column value. Using the above techniques and measures, try to get a feel for what can be done with this matrix.\n",
"\n",
