Typo corrections; prose improvements in the SST notebooks
cgpotts committed Apr 5, 2018
1 parent e6b6ea1 commit 6cb45a6
Showing 4 changed files with 44 additions and 49 deletions.
10 changes: 5 additions & 5 deletions sst_01_overview.ipynb
@@ -138,9 +138,9 @@
"\n",
"* Make sure your environment includes all the requirements for [the cs224u repository](https://github.com/cgpotts/cs224u).\n",
"\n",
"* Download [the train/dev/test Stanford Sentiment Treebank distribution](http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip), unzip it, and put the resulting folder in the same directory as this notebook. It will be called `trees`. (If you want to put it somewhere else, change sst_home below.)\n",
"* Download [the train/dev/test Stanford Sentiment Treebank distribution](http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip), unzip it, and put the resulting folder in the same directory as this notebook. It will be called `trees`.\n",
"\n",
"* Make sure you still have the `vsmdata` directory and its contents. ([Here's a link in case you need to redownload it.](http://web.stanford.edu/class/cs224u/data/vsmdata.zip)) In addition, you might want the [the Wikipedia 2014 + Gigaword 5 distribution of the pretrained GloVe vectors](http://nlp.stanford.edu/data/glove.6B.zip). This might already be in `vsmdata`, depending on what kind of work you did as part of the VSM unit."
"* Make sure you still have the `vsmdata` directory and its contents. ([Here's a link in case you need to redownload it.](http://web.stanford.edu/class/cs224u/data/vsmdata.zip)) In addition, you might want [the Wikipedia 2014 + Gigaword 5 distribution of the pretrained GloVe vectors](http://nlp.stanford.edu/data/glove.6B.zip). This might already be in `vsmdata`, depending on what kind of work you did as part of the VSM unit."
]
},
{
@@ -169,7 +169,7 @@
" * `2` is a neutral label.\n",
" * `3` and `4` are positive labels. \n",
"\n",
"* Our readers are iteratorrs that yield `(tree, label)` pairs, where `tree` is an [NLTK Tree](http://www.nltk.org/_modules/nltk/tree.html) instance and `score` is a string."
"* Our readers are iterators that yield `(tree, label)` pairs, where `tree` is an [NLTK Tree](http://www.nltk.org/_modules/nltk/tree.html) instance and `score` is a string."
]
},
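For reference, a minimal sketch of how the readers described in the hunk above might be used. This is illustrative only, not part of the commit; the exact reader signatures (including whether `sst_home` is passed in) are assumptions, so check `sst.py` in the cs224u repository.

```python
import sst

sst_home = 'trees'  # assumption: the unzipped SST distribution, as described above

for tree, label in sst.train_reader(sst_home):
    print(label)           # a string in {'0', '1', '2', '3', '4'}
    print(tree.leaves())   # NLTK Tree API: the tokens of the example
    break
```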
{
@@ -304,7 +304,7 @@
"\n",
"* We've deliberately ignored `test` readers. We urge you not to use the `test` set until and unless you are running experiments for a final project or similar. Overuse of test-sets corrupts them, since even subtle lessons learned from those runs can be incorporated back into model-building efforts.\n",
"\n",
"* We actually have mixed feelings about the overuse of `dev` that might result from working with these notebooks! We've tried to encourage using just splits of the training data for assessment most of the time, with only occasionally use of `dev`. This will give you a clearer picture of how you will ultimately do on test; over-use of `dev` can lead to over-fitting on that particular dataset with a resulting loss of performance of `test`."
"* We actually have mixed feelings about the overuse of `dev` that might result from working with these notebooks! We've tried to encourage using just splits of the training data for assessment most of the time, with only occasionally use of `dev`. This will give you a clearer picture of how you will ultimately do on `test`; over-use of `dev` can lead to over-fitting on that particular dataset with a resulting loss of performance of `test`."
]
},
{
@@ -336,7 +336,7 @@
"\n",
"A related note: the above shows that the __fine-grained sentiment task__ for the SST is particularly punishing as usually formulated, since it ignores the partial-order structure in the categories completely. As a result, mistaking `'0'` for `'1'` is as bad as mistaking `'0'` for `'4'`, though the first error is clearly less severe than the second.\n",
"\n",
"The functions `sst.binary_class_func` and `sst.ternary_class_func` will convert the labels for you. Let's now use them to study the label distributions."
"The functions `sst.binary_class_func` and `sst.ternary_class_func` will convert the labels for you, and recommended usage is to use them as the `class_func` keyword argument to `train_reader` and `dev_reader`; examples below."
]
},
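A hedged sketch of the recommended usage just described: passing `sst.binary_class_func` or `sst.ternary_class_func` as the `class_func` keyword argument and inspecting the resulting label distribution. The keyword name comes from the prose above; everything else (e.g., whether the data directory is passed positionally) is an assumption.

```python
from collections import Counter
import sst

sst_home = 'trees'  # assumption: location of the unzipped SST distribution

# Label distribution under the ternary relabeling
print(Counter(label for _, label in
              sst.train_reader(sst_home, class_func=sst.ternary_class_func)))
```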
{
16 changes: 7 additions & 9 deletions sst_02_hand_built_features.ipynb
@@ -59,7 +59,7 @@
"source": [
"## Overview\n",
"\n",
"* The focus of this notebook is __building feature representations__ for use with (mostly linear) classifiers (though you're encouraged to try out some non-linear ones as well!)\n",
"* The focus of this notebook is __building feature representations__ for use with (mostly linear) classifiers (though you're encouraged to try out some non-linear ones as well!).\n",
"\n",
"* The core characteristics of the feature functions we'll build here:\n",
" * They represent examples in __very large, very sparse feature spaces__.\n",
@@ -329,7 +329,7 @@
"\n",
"* Initialize $\\mathbf{w} = \\mathbf{0}$\n",
"* Repeat $T$ times:\n",
" * for each $(x,y)$ in $\\mathcal{D}$ (in random order):\n",
" * for each $(x,y) \\in \\mathcal{D}$ (in random order):\n",
" * $\\tilde{y} = \\text{argmax}_{y'\\in \\mathcal{Y}} \\mathbf{Score}_{\\textbf{w}, \\phi}(x,y') + \\mathbf{cost}(y,y')$\n",
" * $\\mathbf{w} = \\mathbf{w} + \\eta(\\phi(x,y) - \\phi(x,\\tilde{y}))$\n",
" \n",
@@ -361,8 +361,7 @@
" Parameters\n",
" ----------\n",
" X : 2d np.array\n",
" The matrix of features, one example per row.\n",
" \n",
" The matrix of features, one example per row. \n",
" y : list\n",
" The list of labels for rows in `X`.\n",
" \n",
@@ -406,7 +405,6 @@
" ----------\n",
" X : 2d np.array\n",
" The matrix of features, one example per row.\n",
" \n",
" y : list\n",
" The list of labels for rows in `X`.\n",
" \n",
@@ -448,7 +446,7 @@
"source": [
"## Experiments\n",
"\n",
"We now have all the pieces needed to run experiments. And we're going to want to run a lot of experiments, trying out different feature functions, taking different perspectives on the data and labels, and using different models. \n",
"We now have all the pieces needed to run experiments. And __we're going to want to run a lot of experiments__, trying out different feature functions, taking different perspectives on the data and labels, and using different models. \n",
"\n",
"To make that process efficient and regimented, `sst` contains a function `experiment`. All it does is pull together these pieces and use them for training and assessment. It's complicated, but the flexibility will turn out to be an asset."
]
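A hedged sketch of how the `experiment` function described above is typically invoked. The positional arguments follow the prose (a feature function and a model-fitting function); the keyword names are guesses at the API, and the bag-of-words `unigrams_phi` below is a simple stand-in for the notebook's own feature function.

```python
from collections import Counter
import sst

def unigrams_phi(tree):
    """Bag-of-words feature function (a stand-in for the notebook's version)."""
    return Counter(tree.leaves())

_ = sst.experiment(
    unigrams_phi,
    fit_maxent_classifier,              # fitting function defined earlier in the notebook
    class_func=sst.binary_class_func,   # how to map the 5 raw labels
    assess_reader=None)                 # None: assess on a random split of train
```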
@@ -643,7 +641,7 @@
"y &= \\textbf{softmax}(xW_{xy} + b_{y})\n",
"\\end{align*}$$\n",
"\n",
"this model inserts a hidden layer with a non-linear activation applied to it:\n",
"the shallow neural network inserts a hidden layer with a non-linear activation applied to it:\n",
"\n",
"$$\\begin{align*}\n",
"h &= \\tanh(xW_{xh} + b_{h}) \\\\\n",
@@ -747,7 +745,7 @@
"source": [
"### Example using LogisticRegression\n",
"\n",
"Here's a fairly full-featured use of the above for the `LogisisticRegression` model family:"
"Here's a fairly full-featured use of the above for the `LogisticRegression` model family:"
]
},
{
@@ -839,7 +837,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The models written for this course are also compatible with this framework. They [\"duck type\"](https://en.wikipedia.org/wiki/Duck_typing) the sklearn models by having methods `fit`, `predict`, `get_params`, and `set_params`, and an attribute `params`."
"The models written for this course are also compatible with this framework. They [\"duck type\"](https://en.wikipedia.org/wiki/Duck_typing) the `sklearn` models by having methods `fit`, `predict`, `get_params`, and `set_params`, and an attribute `params`."
]
},
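An illustrative sketch (not course code) of the duck-typing contract described above: any object exposing these methods and the `params` attribute can stand in for an sklearn model in the experiment framework.

```python
class MajorityBaseline:
    """Toy classifier that always predicts the most frequent training label."""

    def __init__(self):
        self.params = ['label']   # attribute expected by the framework
        self.label = None

    def fit(self, X, y):
        y = list(y)
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in range(len(X))]

    def get_params(self, deep=True):
        return {'label': self.label}

    def set_params(self, **params):
        for key, val in params.items():
            setattr(self, key, val)
        return self
```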
{
65 changes: 31 additions & 34 deletions sst_03_neural_networks.ipynb
@@ -23,15 +23,15 @@
"source": [
"## Contents\n",
"\n",
"0. [Overview of this unit](#Overview-of-this-unit)\n",
"0. [Overview](#Overview)\n",
"0. [Set-up](#Set-up)\n",
"0. [Distributed representations as features](#Distributed-representations-as-features)\n",
" 0. [GloVe inputs](#GloVe-inputs)\n",
" 0. [IMDB representations](#IMDB-representations)\n",
" 0. [Remarks on this approach](#Remarks-on-this-approach)\n",
"0. [RNN classifiers](#RNN-classifiers)\n",
" 0. [RNN dataset preparation](#RNN-dataset-preparation)\n",
" 0. [Vocabulary for embedding](#Vocabulary-for-embedding)\n",
" 0. [Vocabulary for the embedding](#Vocabulary-for-the-embedding)\n",
" 0. [Pure NumPy RNN implementation](#Pure-NumPy-RNN-implementation)\n",
" 0. [TensorFlow implementation](#TensorFlow-implementation)\n",
"0. [Tree-structured neural networks](#Tree-structured-neural-networks)\n",
@@ -44,7 +44,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview of this unit"
"## Overview\n",
"\n",
"This notebook defines and explores __recurrent neural network (RNN) classifiers__ and __tree-structured neural network (TreeNN) classifiers__ for the Stanford Sentiment Treebank. \n",
"\n",
"These approaches make their predictions based on comprehensive representations of the examples: \n",
"\n",
"* For the RNN, each word is modeled, as are its sequential relationships to the other words.\n",
"* For the TreeNN, the entire parsed structure of the sentence is modeled.\n",
"\n",
"Both models contrast with the ones explored in [the previous notebook](sst_02_hand_built_features.ipynb), which make predictions based on more partial, potentially idiosyncratic information extracted from the examples."
]
},
{
@@ -104,14 +113,9 @@
"source": [
"## Distributed representations as features\n",
"\n",
"As a first step in the direction of neural networks for sentiment, we can connect with our previous unit on distributed representations. Arguably, more than any specific model architecture, this is the major innovation of deep learning: __rather than designing feature functions by hand, we use dense, distributed representations, often derived from unsupervised models__."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our model will just be `LogisticRegression`, and we'll continue with the experiment framework from the previous notebook. Here's is `fit_maxent_classifier` again:"
"As a first step in the direction of neural networks for sentiment, we can connect with our previous unit on distributed representations. Arguably, more than any specific model architecture, this is the major innovation of deep learning: __rather than designing feature functions by hand, we use dense, distributed representations, often derived from unsupervised models__.\n",
"\n",
"Our model will just be `LogisticRegression`, and we'll continue with the experiment framework from the previous notebook. Here is `fit_maxent_classifier` again:"
]
},
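A guess at the shape of the `fit_maxent_classifier` function referenced above (the actual version lives in the previous notebook); the hyperparameters here are illustrative, not the course's exact settings.

```python
from sklearn.linear_model import LogisticRegression

def fit_maxent_classifier(X, y):
    """Fit a softmax (maxent) classifier on feature matrix X and label list y."""
    mod = LogisticRegression(fit_intercept=True)
    mod.fit(X, y)
    return mod
```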
{
@@ -152,7 +156,7 @@
"outputs": [],
"source": [
"def vsm_leaves_phi(tree, lookup, np_func=np.sum):\n",
" \"\"\"Represent tree as a combination of the vector of its words.\n",
" \"\"\"Represent `tree` as a combination of the vector of its words.\n",
" \n",
" Parameters\n",
" ----------\n",
@@ -170,10 +174,10 @@
" -------\n",
" np.array, dimension `X.shape[1]`\n",
" \n",
" \"\"\"\n",
" dim = len(next(iter(lookup.values()))) \n",
" allvecs = np.array([lookup[w] for w in tree.leaves() if w in lookup])\n",
" \"\"\" \n",
" allvecs = np.array([lookup[w] for w in tree.leaves() if w in lookup]) \n",
" if len(allvecs) == 0:\n",
" dim = len(next(iter(lookup.values())))\n",
" feats = np.zeros(dim)\n",
" else: \n",
" feats = np_func(allvecs, axis=0) \n",
@@ -308,9 +312,9 @@
"source": [
"### Remarks on this approach\n",
"\n",
"* Recall that our `ungrams_phi`created feature representations with over 16K dimensions and got about 0.77.\n",
"* Recall that our `ungrams_phi` created feature representations with over 16K dimensions and got about 0.77.\n",
"\n",
"* The above models have only 50 dimensions and come close in terms of performance. In many ways, it's striking that we can get a model that is pretty competitive with so few dimensions.\n",
"* The above models have only 50 dimensions and come close in terms of performance. In many ways, it's striking that we can get a model that is competitive with so few dimensions.\n",
"\n",
"* The promise of the Mittens model of [Dingwall and Potts 2018](https://arxiv.org/abs/1803.09901) is that we can use GloVe itself to update the general purpose information in the 'glove.6B' vectors with specialized information from one of these IMDB count matrices. That might be worth trying; the `mittens` package already implements this!\n",
"\n",
@@ -323,7 +327,7 @@
"source": [
"## RNN classifiers\n",
"\n",
"A recurrent neural network (RNN) is any deep learning model that process its inputs sequentially. There are many variations on this theme. The one that we use here is a __RNN classifier__.\n",
"A recurrent neural network (RNN) is any deep learning model that process its inputs sequentially. There are many variations on this theme. The one that we use here is an __RNN classifier__.\n",
"\n",
"<img src=\"fig/rnn_classifier.png\" width=800 />\n",
"\n",
@@ -334,7 +338,7 @@
"y &= \\textbf{softmax}(h_{n}W_{hy} + b)\n",
"\\end{align*}$$\n",
"\n",
"where $1 \\leqslant t \\leqslant n$. As indicated in the above diagram, the sequence of hidden states is padded with an initial state $h_{0}$ In our implementations , this is always an all $0$ vector, but it can be initialized in more sophisticated ways (some of which we will explore in our unit on natural language inference).\n",
"where $1 \\leqslant t \\leqslant n$. As indicated in the above diagram, the sequence of hidden states is padded with an initial state $h_{0}$ In our implementations, this is always an all $0$ vector, but it can be initialized in more sophisticated ways (some of which we will explore in our unit on natural language inference).\n",
"\n",
"This is a potential gain over our sum-the-word-vectors baseline, in that it processes each word independently, and in the context of those that came before it. Thus, not only is this sensitive to word order, but the hidden representation give us the potential to encode how the preceding context for a word affects its interpretation.\n",
"\n",
@@ -363,7 +367,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, each member of `X_rnn_train` train is a list of lists of words. Here's a look at the start of the first:"
"Each member of `X_rnn_train` is a list of lists of words. Here's a look at the start of the first:"
]
},
{
@@ -433,7 +437,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Vocabulary for embedding\n",
"### Vocabulary for the embedding\n",
"\n",
"The first delicate issue we need to address is the vocabulary for our model:\n",
"\n",
@@ -445,14 +449,7 @@
"\n",
"* At the same time, we might want to collapse infrequent tokens into `$UNK` to make optimization easier.\n",
"\n",
"In `sst`, the function `get_vocab` implements these strategies:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can extract the training vocab and use it for the model embedding, secure in the knowledge that we will be able to process tokens outside of this set (by mapping them to `$UNK`)."
"In `sst`, the function `get_vocab` implements these strategies. Now we can extract the training vocab and use it for the model embedding, secure in the knowledge that we will be able to process tokens outside of this set (by mapping them to `$UNK`)."
]
},
{
@@ -485,7 +482,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This frankly seems too big. Let's restrict to just 3000 words:"
"This frankly seems too big relative to our dataset size. Let's restrict to just 3000 words:"
]
},
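A hedged sketch of the vocabulary step described above. Treating `n_words` as a cap on vocabulary size is an assumption about `get_vocab`'s signature (check `sst.py`); `X_rnn_train` is the list of token lists built earlier in the notebook.

```python
import sst

sst_full_train_vocab = sst.get_vocab(X_rnn_train)            # unrestricted vocabulary
sst_train_vocab = sst.get_vocab(X_rnn_train, n_words=3000)   # restricted to 3000 words + $UNK
print(len(sst_full_train_vocab), len(sst_train_vocab))
```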
{
@@ -503,7 +500,7 @@
"source": [
"### Pure NumPy RNN implementation\n",
"\n",
"The first implementation we'll look at is a pure NumPy implementation of exactly the model depicted above. This implementation is a bit slow and might not be all that effective, but it is useful to have available in case one really wants to inspect the fine details of how these models process examples."
"The first implementation we'll look at is a pure NumPy implementation of exactly the model depicted above. This implementation is a bit slow and might not be all that effective, but it is useful to have available in case one really wants to inspect the details of how these models process examples."
]
},
{
@@ -578,7 +575,7 @@
"\n",
"The included TensorFlow implementation is much faster and more configurable. Its only downside is that it requires the user to specify a maximum length for all incoming sequences: \n",
"\n",
"* Examples that are shorter than this maximum are padded (and the implementation ignores those dimensions)\n",
"* Examples that are shorter than this maximum are padded (and the implementation ignores those dimensions).\n",
"* Examples that are longer than this maximum are clipped from the start (on the assumption that later words in the sentences will tend to be more informative).\n",
"\n",
"The function `utils.sequence_length_report` will help you make informed decisions:"
@@ -616,7 +613,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The class `TfRNNClassifier` takes a parameter for specifying this maximum length. It has many others as well:\n",
"The class `TfRNNClassifier` takes a parameter for specifying this maximum length. It has many other parameters as well:\n",
" \n",
"* `hidden_activation`: the activation function for the hidden layers (default: `tf.nn.tanh`).\n",
"* `cell_class`: which TensorFlow cell-type to use: \n",
@@ -1011,7 +1008,7 @@
"source": [
"### Pure NumPy TreeNN implementation\n",
"\n",
"`TreeNN` is a pure NumPy implementation of this model. It should be regarded as a baseline for models of this form. The original SST paper includes evaluations of a wide range of these models."
"`TreeNN` is a pure NumPy implementation of this model. It should be regarded as a baseline for models of this form. The original SST paper includes evaluations of a wide range of models in this family."
]
},
{
2 changes: 1 addition & 1 deletion vsm_01_distributional.ipynb
@@ -1383,7 +1383,7 @@
"\n",
"1. Run \n",
"\n",
" ```amod = pd.read_csv(os.path.join(data_home, 'gigawordnyt-advmod-matrix.csv.gzip'), index_col=0)``` \n",
" ```amod = pd.read_csv(os.path.join(data_home, 'gigawordnyt-advmod-matrix.csv.gz'), index_col=0)``` \n",
" \n",
" to read in an adjective $\\times$ adverb matrix derived from the Gigaword corpus. Each cell contains the number of times that the modifier phrase __ADV ADJ__ appeared in Gigaword as given by dependency parses of the data. __ADJ__ is the row value and __ADV__ is the column value. Using the above techniques and measures, try to get a feel for what can be done with this matrix.\n",
"\n",
