forked from cgpotts/cs224u
Commit: Extensive model and notebook updates
Showing 67 changed files with 10,667 additions and 5,041 deletions.
# CS224u: Natural Language Understanding

Code for [the Stanford course](http://web.stanford.edu/class/cs224u/).

Fall 2020
# Instructors

* [Bill MacCartney](http://nlp.stanford.edu/~wcmac/)
* [Christopher Potts](http://web.stanford.edu/~cgpotts/)
# Core components
## `setup.ipynb`

Details on how to get set up to work with this code.
## `tutorial_*` notebooks

Introductions to Jupyter notebooks, scientific computing with NumPy and friends, and PyTorch.
## `torch_*.py` modules

A generic optimization class (`torch_model_base.py`) and subclasses for GloVe, Autoencoders, shallow neural classifiers, RNN classifiers, tree-structured networks, and grounded natural language generation.

`tutorial_pytorch_models.ipynb` shows how to use these modules as a general framework for creating original systems.
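As a rough, self-contained illustration of the pattern these modules follow (a fit/predict wrapper around a PyTorch module), here is a hedged sketch with hypothetical names; it is not the actual `TorchModelBase` interface, which the tutorial notebook documents.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the fit/predict wrapper pattern; the real
# TorchModelBase offers a richer interface (see tutorial_pytorch_models.ipynb).
class TinyClassifier:
    def __init__(self, input_dim, n_classes, hidden_dim=50, lr=0.01, epochs=100):
        self.model = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, n_classes))
        self.loss_fn = nn.CrossEntropyLoss()
        self.opt = torch.optim.Adam(self.model.parameters(), lr=lr)
        self.epochs = epochs

    def fit(self, X, y):
        X = torch.tensor(X, dtype=torch.float)
        y = torch.tensor(y, dtype=torch.long)
        for _ in range(self.epochs):
            self.opt.zero_grad()
            loss = self.loss_fn(self.model(X), y)
            loss.backward()
            self.opt.step()
        return self

    def predict(self, X):
        with torch.no_grad():
            logits = self.model(torch.tensor(X, dtype=torch.float))
        return logits.argmax(dim=1).numpy()
```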
## `np_*.py` modules

Reference implementations for the `torch_*.py` models, designed to reveal more about how the optimization process works.
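The kind of detail these NumPy versions make explicit is the raw update loop itself; the following is a minimal hypothetical sketch of hand-written gradient descent for least-squares regression, not code taken from those modules.

```python
import numpy as np

# Hypothetical illustration of an explicit update loop, the sort of detail the
# np_*.py reference implementations expose; not code from those modules.
def fit_linear_regression(X, y, lr=0.01, epochs=500):
    rng = np.random.RandomState(42)
    w = rng.normal(size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = X @ w + b - y                 # prediction error
        w -= lr * (X.T @ err) / len(y)      # update proportional to the MSE gradient w.r.t. w
        b -= lr * err.mean()                # update proportional to the MSE gradient w.r.t. b
    return w, b
```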
## `vsm_*` and `hw_wordsim.ipynb`

A unit on vector space models of meaning, covering traditional methods like PMI and LSA as well as newer methods like Autoencoders and GloVe. `vsm.py` provides a lot of the core functionality, and `torch_glove.py` and `torch_autoencoder.py` are the learned models that we cover. `vsm_03_retrofitting.ipynb` is an extension that uses `retrofitting.py`.
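To give a flavor of the reweighting step, here is a small self-contained sketch of positive PMI on a toy count matrix; it is illustrative only, and `vsm.py` provides the implementation used in the course.

```python
import numpy as np

# Toy illustration of positive PMI reweighting; vsm.py has the course's version.
def ppmi(counts):
    total = counts.sum()
    p_row = counts.sum(axis=1, keepdims=True) / total   # P(word)
    p_col = counts.sum(axis=0, keepdims=True) / total   # P(context)
    joint = counts / total                               # P(word, context)
    with np.errstate(divide="ignore"):
        pmi = np.log(joint / (p_row * p_col))
    pmi[np.isinf(pmi)] = 0.0    # zero-count cells
    return np.maximum(pmi, 0)   # keep only positive associations

X = np.array([[10., 0., 2.],
              [1.,  5., 4.]])
print(ppmi(X))
```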
## `sst_*` and `hw_sst.ipynb`

A unit on sentiment analysis with the [English Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/treebank.html). The core code is `sst.py`, which includes a flexible experimental framework. All the PyTorch classifiers are put to use as well: `torch_shallow_neural_network.py`, `torch_rnn_classifier.py`, and `torch_tree_nn.py`.
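A typical baseline in this unit pairs a simple feature function with a classifier. The following generic scikit-learn sketch, with made-up toy data, shows that shape; it is not the `sst.py` experiment API.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Generic bag-of-words baseline; sst.py wraps this kind of pipeline in its own
# experiment framework and treebank readers. The texts below are made up.
def unigram_phi(text):
    return Counter(text.lower().split())

train_texts = ["a great fun movie", "dull and lifeless", "really great acting"]
train_labels = ["positive", "negative", "positive"]

vec = DictVectorizer(sparse=True)
X_train = vec.fit_transform([unigram_phi(t) for t in train_texts])
clf = LogisticRegression().fit(X_train, train_labels)

X_test = vec.transform([unigram_phi("great movie")])
print(clf.predict(X_test))
```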
## `rel_ext*` and `hw_rel_ext.ipynb`

A unit on relation extraction with distant supervision.
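The key idea in distant supervision is that a knowledge base, rather than hand labeling, supplies noisy labels: a sentence mentioning an entity pair that the KB relates is treated as a positive example for that relation. A toy sketch with made-up data, not the course's `rel_ext` code:

```python
# Toy illustration of distant supervision: label entity-pair sentences using a
# knowledge base of known relation triples (not the rel_ext module itself).
kb = {("Tim Berners-Lee", "World Wide Web"): "inventor_of"}

corpus = [
    ("Tim Berners-Lee", "World Wide Web",
     "Tim Berners-Lee proposed the World Wide Web in 1989."),
    ("Tim Berners-Lee", "CERN",
     "Tim Berners-Lee was working at CERN at the time."),
]

training_examples = []
for subj, obj, sentence in corpus:
    # Noisy assumption: the sentence expresses whatever relation the KB records.
    label = kb.get((subj, obj), "no_relation")
    training_examples.append((sentence, subj, obj, label))

for ex in training_examples:
    print(ex)
```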
## `nli_*` and `hw_wordentail.ipynb`

A unit on Natural Language Inference. `nli.py` provides core interfaces to a variety of NLI datasets, and an experimental framework. All the PyTorch classifiers are again in heavy use: `torch_shallow_neural_network.py`, `torch_rnn_classifier.py`, and `torch_tree_nn.py`.
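At its simplest, the word-entailment homework can be thought of as classification over vector representations of word pairs. The sketch below uses made-up toy vectors purely to show that framing; the homework itself builds on real embeddings and the interfaces in `nli.py`.

```python
import numpy as np

# Hypothetical sketch of the word-entailment setup: represent a (premise, hypothesis)
# word pair by concatenating their vectors, then train any classifier on the result.
# The vectors here are made up for illustration only.
vectors = {
    "puppy": np.array([0.9, 0.1, 0.3]),
    "dog":   np.array([0.8, 0.2, 0.4]),
    "car":   np.array([0.1, 0.9, 0.7]),
}

def pair_features(w1, w2):
    return np.concatenate([vectors[w1], vectors[w2]])

X = np.array([pair_features("puppy", "dog"), pair_features("puppy", "car")])
y = np.array([1, 0])  # 1 = entails, 0 = does not entail
print(X.shape, y)
```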
## `colors*`, `torch_color_describer.py`, and `hw_colors.ipynb`

A unit on grounded natural language generation, focused on generating context-dependent color descriptions using the [English Stanford Colors in Context dataset](https://cocolab.stanford.edu/datasets/colors.html).
## `contextualreps.ipynb`

Using pretrained parameters from [Hugging Face](https://huggingface.co) and [AllenNLP](https://allennlp.org) for featurization and fine-tuning.
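As a flavor of the featurization use case, here is a minimal sketch with the `transformers` library; the notebook covers both featurization and fine-tuning in more depth, and `bert-base-uncased` is just an example checkpoint.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Minimal featurization sketch (assumes a recent transformers release): encode a
# sentence with a pretrained model and use its hidden states as fixed features.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("natural language understanding", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

features = outputs.last_hidden_state.mean(dim=1)  # one vector per sentence
print(features.shape)  # torch.Size([1, 768])
```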
## `evaluation_*.ipynb` and `projects.md`

Notebooks covering key experimental methods and practical considerations, and tips on writing up and presenting work in the field.
## `utils.py`

Miscellaneous core functions used throughout the code.
## `test/`

To run these tests, use

```py.test -vv test/*```

or, for just the tests in `test_shallow_neural_classifiers.py`,

```py.test -vv test/test_shallow_neural_classifiers.py```

If the above commands don't work, try

```python3 -m pytest -vv test/test_shallow_neural_classifiers.py```
## License

The materials in this repo are licensed under the [Apache 2.0 license](LICENSE) and a [Creative Commons Attribution-ShareAlike 4.0 International license](http://creativecommons.org/licenses/by-sa/4.0/).