diff --git a/hw_sentiment.ipynb b/hw_sentiment.ipynb
index a29fe0c..d6c852c 100644
--- a/hw_sentiment.ipynb
+++ b/hw_sentiment.ipynb
@@ -23,16 +23,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cgpotts/cs224u/blob/master/hw_openqa.ipynb)\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cgpotts/cs224u/blob/main/hw_sentiment.ipynb)\n",
+    "[![Open in SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/cgpotts/cs224u/blob/main/hw_sentiment.ipynb)\n",
     "\n",
-    "If colab is opened with this badge, please **save a copy to drive** (from the 'File' menu) before running the notebook."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "[![Open in SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/cgpotts/cs224u/blob/master/hw_openqa.ipynb)"
+    "If Colab is opened with this badge, please **save a copy to drive** (from the File menu) before running the notebook."
    ]
  },
  {
@@ -53,7 +47,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The homework questions ask you to implement some baseline system using DynaSent Round 1, DynaSent Round 2, and the Stanford Sentiment Treebank. The bakeoff challenge is to define a system that does well on the DynaSent test sets, the SST-3 test set, and a set of mystery examples that don't correspond to the DynaSent or SST-3 domains."
+    "The homework questions ask you to implement some baseline systems using DynaSent Round 1, DynaSent Round 2, and the Stanford Sentiment Treebank. The bakeoff challenge is to define a system that does well on the DynaSent test sets, the SST-3 test set, and a set of mystery examples that don't correspond to the DynaSent or SST-3 domains."
    ]
  },
  {
@@ -104,7 +98,7 @@
  },
  {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": null,
    "metadata": {
     "id": "pyAzJmyYSNMP"
    },
@@ -302,7 +296,7 @@
    "outputs": [],
    "source": [
     "def print_label_dist(dataset, labelname='gold_label', splitnames=('train', 'validation')):\n",
-    "    for splitname in splitnames: \n",
+    "    for splitname in splitnames:\n",
     "        print(splitname)\n",
     "        dist = sorted(Counter(dataset[splitname][labelname]).items())\n",
     "        for k, v in dist:\n",
@@ -339,7 +333,7 @@
    "id": "p4WFt0C6J8hP"
   },
   "source": [
-    "DynaSent Round 2 was created using different methods than Round 1. For Round 2, crowdworkers edited sentences from the Yelp Academic Dataset seeking to achieve a particular sentiment goal (e.g., expressing a positive sentiment) while fooling a top-performing model. This work was done on the [Dynabench](https://dynabench.org) platform. The hope is that this directly adversarial objective will lead to examples that are very hard for present-day models but intuitive for humans. All the examples were multiply-labeled by separate annotators."
+    "DynaSent Round 2 was created using different methods than Round 1. For Round 2, crowdworkers edited sentences from the Yelp Academic Dataset seeking to achieve a particular sentiment goal (e.g., expressing a positive sentiment) while fooling a top-performing model. This work was done on the [Dynabench](https://dynabench.org) platform. The hope is that this directly adversarial goal will lead to examples that are very hard for present-day models but intuitive for humans. All the examples were multiply-labeled by separate annotators."
   ]
  },
  {
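A quick usage sketch to accompany the hunks above. The Hugging Face dataset id/config and the loop's final `print` line are assumptions (the corresponding notebook cells fall outside this diff), so treat this as illustrative only:

```python
from collections import Counter
from datasets import load_dataset

# Assumed HF id/config for DynaSent Round 1; recent versions of
# `datasets` may also require trust_remote_code=True here.
dynasent_r1 = load_dataset("dynabench/dynasent", "dynabench.dynasent.r1.all")

def print_label_dist(dataset, labelname='gold_label', splitnames=('train', 'validation')):
    for splitname in splitnames:
        print(splitname)
        dist = sorted(Counter(dataset[splitname][labelname]).items())
        for k, v in dist:
            # Hypothetical final line; the hunk above cuts off
            # before the loop body ends.
            print(f"\t{k:>14s}: {v}")

print_label_dist(dynasent_r1)
```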
@@ -434,7 +428,7 @@
    "id": "qeONNIJQJ8hP"
   },
   "source": [
-    "The [Stanford Sentiment Treebank (SST)](http://nlp.stanford.edu/sentiment/) of [Socher et al. 2013](https://aclanthology.org/D13-1170/) is a widely-used resource for evaluating supervised NLU models. It consists of sentences from Rotten Tomatoes Movie Reviews. We will use the ternary version of the task (SST-3)."
+    "The [Stanford Sentiment Treebank (SST)](http://nlp.stanford.edu/sentiment/) of [Socher et al. 2013](https://aclanthology.org/D13-1170/) is a widely-used resource for evaluating supervised models. It consists of sentences from Rotten Tomatoes Movie Reviews (see [Pang and Lee's project page](https://www.cs.cornell.edu/home/llee/papers/pang-lee-stars.home.html)). We will use the ternary version of the task (SST-3)."
   ]
  },
  {
@@ -696,19 +690,19 @@
    "outputs": [],
    "source": [
     "def unigrams_phi(s):\n",
-    "    \"\"\"The basis for a bigrams feature function. \n",
-    "    \n",
+    "    \"\"\"The basis for a unigrams feature function.\n",
+    "\n",
     "    Downcases all tokens.\n",
     "\n",
     "    Parameters\n",
     "    ----------\n",
-    "    text : str\n",
+    "    s : str\n",
     "        The example to represent\n",
     "\n",
     "    Returns\n",
     "    -------\n",
     "    Counter\n",
-    "        A map from tuples to their counts in `text`\n",
+    "        A map from tokens (str) to their counts in `s`\n",
     "\n",
     "    \"\"\"\n",
     "    return Counter(s.lower().split())"
@@ -727,7 +721,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "unigrams_phi(\"Here's an example with an emoticon :)\")"
+    "unigrams_phi(\"Here's an example with an emoticon :)!\")"
   ]
  },
  {
@@ -1113,7 +1107,7 @@
  },
  {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1122,13 +1116,13 @@
     "from nltk.tokenize import TweetTokenizer\n",
     "\n",
     "def tweetgrams_phi(s, **kwargs):\n",
     "    \"\"\"The basis for a feature function using `TweetTokenizer`.\n",
-    "    \n",
+    "\n",
     "    Parameters\n",
     "    ----------\n",
     "    s : str\n",
     "    kwargs : dict\n",
     "        Passed to `TweetTokenizer`\n",
-    "    \n",
+    "\n",
     "    Returns\n",
     "    -------\n",
     "    Counter\n",
@@ -1150,7 +1144,7 @@
  },
  {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1185,17 +1179,9 @@
  },
  {
    "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "All tests passed for `tweetgrams_phi`\n"
-     ]
-    }
-   ],
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "test_tweetgrams_phi(tweetgrams_phi)"
   ]
  },
  {
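The hunks above touch only `tweetgrams_phi`'s docstring; its body lies outside the diff. A minimal sketch of a completion consistent with that docstring (an assumption, not necessarily the notebook's reference solution):

```python
from collections import Counter
from nltk.tokenize import TweetTokenizer

def tweetgrams_phi(s, **kwargs):
    # `TweetTokenizer` treats emoticons like ":)" as single tokens,
    # unlike the plain whitespace splitting in `unigrams_phi`.
    tokenizer = TweetTokenizer(**kwargs)
    return Counter(tokenizer.tokenize(s))

tweetgrams_phi("Here's an example with an emoticon :)!", preserve_case=False)
# Counter({'an': 2, "here's": 1, 'example': 1, 'with': 1,
#          'emoticon': 1, ':)': 1, '!': 1})
```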
@@ -1239,16 +1225,16 @@
    "source": [
     "def train_linear_model(model, featfunc, train_dataset):\n",
     "    \"\"\"Train an sklearn classifier.\n",
-    "    \n",
+    "\n",
     "    Parameters\n",
     "    ----------\n",
     "    model : sklearn classifier model\n",
     "    featfunc : func\n",
-    "        Maps strings to Counter instances.\n",
+    "        Maps strings to Counter instances\n",
     "    train_dataset: dict\n",
     "        Must have a key \"sentence\" containing strings that `featfunc` \n",
-    "        will process, and a key \"gold_label\" giving labels.\n",
-    "    \n",
+    "        will process, and a key \"gold_label\" giving labels\n",
+    "\n",
     "    Returns\n",
     "    -------\n",
     "    tuple\n",
@@ -1258,20 +1244,21 @@
     "    \"\"\"\n",
     "    pass\n",
     "    # Step 1: Featurize all the examples in `train_dataset['sentence']`\n",
-    "    ##### YOUR CODE HERE \n",
+    "    ##### YOUR CODE HERE\n",
+    "\n",
+    "\n",
     "\n",
-    "    \n",
     "    # Step 2: Instantiate and use a `DictVectorizer`:\n",
     "    ##### YOUR CODE HERE\n",
     "\n",
     "\n",
-    "    \n",
+    "\n",
     "    # Step 3: Train the model on the feature matrix and\n",
     "    #         train_dataset['gold_label']:\n",
     "    ##### YOUR CODE HERE\n",
     "\n",
     "\n",
-    "    \n",
+    "\n",
     "    # Step 4: Return (model, vectorizer):\n",
     "    ##### YOUR CODE HERE\n",
     "\n",
@@ -1300,7 +1287,7 @@
     "    model = LogisticRegression()\n",
     "    result = func(model, featfunc, train_dataset)\n",
     "    if not isinstance(result, tuple) or len(result) != 2:\n",
-    "        print(f\"Error for `{func.__name__}` incorrect return type\")\n",
+    "        print(f\"Error for `{func.__name__}`: Incorrect return type\")\n",
     "        return\n",
     "    model, vectorizer = result\n",
     "    if not hasattr(vectorizer, 'vocabulary_'):\n",
@@ -1310,7 +1297,7 @@
     "    if not hasattr(model, 'classes_'):\n",
     "        print(f\"Error for `{func.__name__}`: \"\n",
     "              f\"First return value is not a trained classifier\")\n",
-    "        return 1\n",
+    "        return\n",
     "    print(f\"No errors found for `{func.__name__}`\")"
   ]
  },
  {
@@ -1327,7 +1314,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can now very easily train models on our datasets. Quick example (this shouldn't take more than a couple of minutes to run):"
+    "You can now very easily train models on our datasets. Quick example (this shouldn't take more than a couple of minutes to run even on a CPU):"
   ]
  },
  {
@@ -1367,37 +1354,38 @@
    "source": [
     "def assess_linear_model(model, featfunc, vectorizer, assess_dataset):\n",
     "    \"\"\"Assess a trained sklearn model.\n",
-    "    \n",
+    "\n",
     "    Parameters\n",
     "    ----------\n",
     "    model: trained sklearn model\n",
     "    featfunc : func\n",
-    "        Maps strings to count dicts.\n",
+    "        Maps strings to count dicts\n",
     "    vectorizer : fitted DictVectorizer\n",
     "    assess_dataset: dict\n",
     "        Must have a key \"sentence\" containing strings that `featfunc` \n",
-    "        will process, and a key \"gold_label\" giving labels.\n",
-    "    \n",
+    "        will process, and a key \"gold_label\" giving labels\n",
+    "\n",
     "    Returns\n",
     "    -------\n",
     "    A classification report (multiline string)\n",
-    "    \n",
+    "\n",
     "    \"\"\"\n",
     "    pass\n",
     "    # Step 1: Featurize the assessment data:\n",
     "    ##### YOUR CODE HERE\n",
     "\n",
-    "    \n",
+    "\n",
+    "\n",
     "    # Step 2: Vectorize the assessment data features:\n",
     "    ##### YOUR CODE HERE\n",
     "\n",
     "\n",
-    "    \n",
+    "\n",
     "    # Step 3: Make predictions:\n",
     "    ##### YOUR CODE HERE\n",
     "\n",
     "\n",
-    "    \n",
+    "\n",
     "    # Step 4: Return a classification report (str):\n",
     "    ##### YOUR CODE HERE\n",
     "\n",
@@ -1428,16 +1416,16 @@
     "        return Counter(s.split())\n",
     "    model = LogisticRegression()\n",
     "    model, vectorizer = trainfunc(model, featfunc, train_dataset)\n",
-    "    result = assessfunc(model, featfunc, vectorizer, assess_dataset) \n",
+    "    result = assessfunc(model, featfunc, vectorizer, assess_dataset)\n",
     "    errcount = 0\n",
     "    if len(vectorizer.vocabulary_) != 2:\n",
-    "        print(\"Error for `{assessfunc.__name__}`: Unexpected feature count\")\n",
+    "        print(f\"Error for `{assessfunc.__name__}`: Unexpected feature count\")\n",
     "        errcount += 1\n",
     "    if 'weighted avg' not in result:\n",
-    "        print(\"Error for `{assessfunc.__name__}`: Unexpected return value\")\n",
+    "        print(f\"Error for `{assessfunc.__name__}`: Unexpected return value\")\n",
     "        errcount += 1\n",
     "    if errcount == 0:\n",
-    "        print(f\"No errors found for `{assessfunc.__name__}`\") "
+    "        print(f\"No errors found for `{assessfunc.__name__}`\")"
   ]
  },
  {
@@ -1469,9 +1457,9 @@
    "outputs": [],
    "source": [
     "report = assess_linear_model(\n",
-    "    lr_unigrams, \n",
-    "    unigrams_phi, \n",
-    "    vec_unigrams, \n",
+    "    lr_unigrams,\n",
+    "    unigrams_phi,\n",
+    "    vec_unigrams,\n",
     "    dynasent_r1['validation'])\n",
     "\n",
     "print(report)"
   ]
  },
  {
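Since `train_linear_model` and `assess_linear_model` above are scaffolded exercises, here is one reasonable completion matching their docstrings, offered as a sketch rather than the official solution:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

def train_linear_model(model, featfunc, train_dataset):
    # Step 1: featurize; Step 2: vectorize; Step 3: fit; Step 4: return.
    feats = [featfunc(s) for s in train_dataset['sentence']]
    vectorizer = DictVectorizer()
    X = vectorizer.fit_transform(feats)
    model.fit(X, train_dataset['gold_label'])
    return model, vectorizer

def assess_linear_model(model, featfunc, vectorizer, assess_dataset):
    feats = [featfunc(s) for s in assess_dataset['sentence']]
    # Reuse the fitted vectorizer so train/assess feature spaces align:
    X = vectorizer.transform(feats)
    preds = model.predict(X)
    return classification_report(assess_dataset['gold_label'], preds, digits=3)
```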
"metadata": {}, "source": [ - "We'll use BERT-mini for the homework so that we can rapdily develop prototypes. You can then consider scaling up to larger models." + "We'll use BERT-mini (originally from [the BERT repo](https://github.com/google-research/bert)) for the homework so that we can rapdily develop prototypes. You can then consider scaling up to larger models." ] }, { @@ -1803,7 +1791,7 @@ "def get_batch_token_ids(batch, tokenizer):\n", " \"\"\"Map `batch` to a tensor of ids. The return\n", " value should meet the following specification:\n", - " \n", + "\n", " 1. The max length should be 512.\n", " 2. Examples longer than the max length should be truncated\n", " 3. Examples should be padded to the max length for the batch.\n", @@ -1811,20 +1799,21 @@ " token [SEP] should be added to the end.\n", " 5. The attention mask should be returned\n", " 6. The return value of each component should be a tensor. \n", - " \n", + "\n", " Parameters\n", " ----------\n", " batch: list of str\n", " tokenizer: Hugging Face tokenizer\n", - " \n", + "\n", " Returns\n", " -------\n", " dict with at least \"input_ids\" and \"attention_mask\" as keys,\n", " each with Tensor values\n", - " \n", + "\n", " \"\"\"\n", " pass\n", - " ##### YOUR CODE HERE \n", + " ##### YOUR CODE HERE\n", + "\n", "\n" ] }, @@ -1843,7 +1832,7 @@ "source": [ "def test_get_batch_token_ids(func):\n", " examples = [\n", - " \"Bert knows Snuffleupagus\", \n", + " \"Bert knows Snuffleupagus\",\n", " \"ELMo knew Bert.\",\n", " \"Buffalo \" * 520\n", " ]\n", @@ -1868,7 +1857,7 @@ " print(f\"Error for `{func.__name__}`: \"\n", " f\"Special tokens were not added\")\n", " if errcount == 0:\n", - " print(f\"No errors found for `{func.__name__}`\") " + " print(f\"No errors found for `{func.__name__}`\")" ] }, { @@ -1908,40 +1897,41 @@ "outputs": [], "source": [ "def get_reps(dataset, model, tokenizer, batchsize=20):\n", - " \"\"\"Represent each example in `dataset` with the \n", - " final hidden state above the [CLS] token.\n", - " \n", + " \"\"\"Represent each example in `dataset` with the final hidden state \n", + " above the [CLS] token.\n", + "\n", " Parameters\n", " ----------\n", " dataset : list of str\n", " model : BertModel\n", " tokenizer : BertTokenizerFast\n", " batchsize : int\n", - " \n", + "\n", " Returns\n", " -------\n", - " torch.Tensor with shape `(n_examples, dim)` where `dim` is the \n", - " dimensionality of the representations for `model`. \n", - " \n", - " \"\"\" \n", + " torch.Tensor with shape `(n_examples, dim)` where `dim` is the\n", + " dimensionality of the representations for `model`\n", + "\n", + " \"\"\"\n", " data = []\n", " with torch.no_grad():\n", + " pass\n", " # Iterate over `dataset` in batches:\n", " ##### YOUR CODE HERE\n", - " pass\n", "\n", - " \n", + "\n", + "\n", " # Encode the batch with `get_batch_token_ids`:\n", " ##### YOUR CODE HERE\n", "\n", "\n", - " \n", + "\n", " # Get the representations from the model, making\n", " # sure to pay attention to masking:\n", " ##### YOUR CODE HERE\n", "\n", "\n", - " \n", + "\n", " # Return a single tensor:\n", " ##### YOUR CODE HERE\n", "\n", @@ -1974,7 +1964,7 @@ " if round(result[0][0].item(), 2) != -0.64:\n", " print(f\"Error for `{func.__name__}`: \"\n", " f\"Representations seem to be incorrect\")\n", - " print(f\"No errors found for `{func.__name__}`\") " + " print(f\"No errors found for `{func.__name__}`\")" ] }, { @@ -2026,7 +2016,7 @@ " layer on top of that as the final output. 
@@ -2026,7 +2016,7 @@
     "        layer on top of that as the final output. The output of\n",
     "        the dense layer should have the same dimensionality as the\n",
     "        model input.\n",
-    "        \n",
+    "\n",
     "        Parameters\n",
     "        ----------\n",
     "        n_classes : int\n",
@@ -2036,7 +2026,7 @@
     "        weights_name : str\n",
     "            Name of pretrained model to load from Hugging Face\n",
     "\n",
-    "        \"\"\" \n",
+    "        \"\"\"\n",
     "        super().__init__()\n",
     "        self.n_classes = n_classes\n",
     "        self.weights_name = weights_name\n",
@@ -2056,34 +2046,34 @@
     "        # and we rely on the PyTorch loss function to apply a\n",
     "        # softmax to y. \n",
     "        self.classifier_layer = None\n",
-    "        ##### YOUR CODE HERE \n",
+    "        ##### YOUR CODE HERE\n",
+    "\n",
     "\n",
     "\n",
-    "    \n",
     "    def forward(self, indices, mask):\n",
     "        \"\"\"Process `indices` with `mask` by feeding these arguments\n",
     "        to `self.bert` and then feeding the initial hidden state\n",
     "        in `last_hidden_state` to `self.classifier_layer`.\n",
-    "        \n",
+    "\n",
     "        Parameters\n",
     "        ----------\n",
     "        indices : tensor.LongTensor of shape (n_batch, k)\n",
-    "            Indices into the `self.bert` embedding layer. `n_batch` is \n",
-    "            the number of examples and `k` is the sequence length for \n",
+    "            Indices into the `self.bert` embedding layer. `n_batch` is\n",
+    "            the number of examples and `k` is the sequence length for\n",
     "            this batch\n",
     "        mask : tensor.LongTensor of shape (n_batch, k)\n",
-    "            Binary vector indicating which values should be masked. \n",
-    "            `n_batch` is the number of examples and `k` is the \n",
+    "            Binary vector indicating which values should be masked.\n",
+    "            `n_batch` is the number of examples and `k` is the\n",
     "            sequence length for this batch\n",
-    "        \n",
+    "\n",
     "        Returns\n",
     "        -------\n",
     "        tensor.FloatTensor\n",
     "            Predicted values, shape `(n_batch, self.n_classes)`\n",
-    "        \n",
+    "\n",
     "        \"\"\"\n",
     "        pass\n",
-    "        ##### YOUR CODE HERE \n",
+    "        ##### YOUR CODE HERE\n",
     "\n",
     "\n"
   ]
  },
  {
@@ -2104,7 +2094,7 @@
    "outputs": [],
    "source": [
     "ids = get_batch_token_ids(\n",
-    "    dynasent_r1['train']['sentence'][: 2], \n",
+    "    dynasent_r1['train']['sentence'][: 2],\n",
     "    bert_tokenizer)\n",
     "\n",
     "bert_module(ids['input_ids'], ids['attention_mask'])"
   ]
  },
  {
@@ -2122,12 +2112,12 @@
     "    expected_activation = nn.ReLU()\n",
     "    mod = moduleclass(expected_out, expected_activation)\n",
     "    errcount = 0\n",
-    "    \n",
+    "\n",
     "    # Basic layer structure:\n",
     "    if not hasattr(mod, \"classifier_layer\") or mod.classifier_layer is None:\n",
     "        errcount += 1\n",
     "        print(f\"Error for `{moduleclass.__name__}`: \"\n",
-    "              f\"Missing attribute `classifier_layer`\") \n",
+    "              f\"Missing attribute `classifier_layer`\")\n",
     "        return \n",
     "    for i in range(3):\n",
     "        try:\n",
@@ -2136,7 +2126,7 @@
     "            errcount += 1\n",
     "            print(f\"Error for `{moduleclass.__name__}`: \"\n",
     "                  f\"`classifier_layer` is not an `nn.Sequential` \"\n",
-    "                  f\"and/or does not have the right structure\") \n",
+    "                  f\"and/or does not have the right structure\")\n",
     "    # Correct first layer dimensionality:\n",
     "    result_hidden = mod.classifier_layer[0].out_features\n",
     "    if result_hidden != expected_hidden:\n",
@@ -2211,7 +2201,7 @@
     "    def build_graph(self):\n",
     "        return BertClassifierModule(\n",
     "            self.n_classes_, self.hidden_activation, self.weights_name)\n",
-    "    \n",
+    "\n",
     "    def build_dataset(self, X, y=None):\n",
     "        data = get_batch_token_ids(X, self.tokenizer)\n",
     "        if y is None:\n",
@@ -2270,7 +2260,7 @@
     "%%time\n",
     "\n",
     "_ = bert_finetune.fit(\n",
-    "    dynasent_r1['train']['sentence'], \n",
+    "    dynasent_r1['train']['sentence'],\n",
     "    dynasent_r1['train']['gold_label'])"
   ]
  },
  {
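For the classifier head and `forward` left blank above, here is a sketch consistent with the docstrings and the structure test; the `weights_name` default and the `config.hidden_size` lookup are assumptions, since the notebook's own `__init__` body is only partially visible in this diff:

```python
import torch.nn as nn
from transformers import BertModel

class BertClassifierModule(nn.Module):
    def __init__(self, n_classes, hidden_activation,
                 weights_name="prajjwal1/bert-mini"):
        super().__init__()
        self.n_classes = n_classes
        self.weights_name = weights_name
        self.bert = BertModel.from_pretrained(self.weights_name)
        self.hidden_dim = self.bert.config.hidden_size
        # Dense layer, activation, dense output layer; no softmax,
        # since the PyTorch loss function applies one internally:
        self.classifier_layer = nn.Sequential(
            nn.Linear(self.hidden_dim, self.hidden_dim),
            hidden_activation,
            nn.Linear(self.hidden_dim, self.n_classes))

    def forward(self, indices, mask):
        reps = self.bert(indices, attention_mask=mask)
        # Initial hidden state (the [CLS] position) feeds the head:
        return self.classifier_layer(reps.last_hidden_state[:, 0, :])
```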
@@ -2390,7 +2380,9 @@
    "source": [
     "The bakeoff dataset is available at \n",
     "\n",
-    "https://web.stanford.edu/class/cs224u/data/cs224u-sentiment-test-unlabeled.csv"
+    "https://web.stanford.edu/class/cs224u/data/cs224u-sentiment-test-unlabeled.csv\n",
+    "\n",
+    "This code should grab it for you and put it in `data/sentiment` if you are working in the cloud:"
   ]
  },
  {
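The download cell the new text refers to is not part of this diff; a hypothetical stand-in that fetches the CSV into `data/sentiment` might look like this:

```python
import os
import pandas as pd

SENTIMENT_HOME = os.path.join('data', 'sentiment')
os.makedirs(SENTIMENT_HOME, exist_ok=True)

url = "https://web.stanford.edu/class/cs224u/data/cs224u-sentiment-test-unlabeled.csv"

# pandas can read directly from a URL; save a local copy for reuse:
bakeoff_df = pd.read_csv(url)
bakeoff_df.to_csv(
    os.path.join(SENTIMENT_HOME, "cs224u-sentiment-test-unlabeled.csv"),
    index=False)
```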