# labs-exercises/neural-encoding-fever.md
## Exercises
For the exercises, we have provided a dataset reader (`athnlp/readers/fever_reader.py`), a configuration file (`athnlp/experiments/fever.json`), and a sample model (`athnlp/models/fever_text_classification.py`). You can work through the exercises by filling in the code in the sample model.
### 1. Average Word Embedding Model
1. Implement a model that
- represents the claim and the evidence by averaging their word embeddings;
- concatenates the two representations;
- uses a multilayer perceptron to decide the label (a minimal sketch of such a forward pass appears after this list).
2. Look at the distribution of training data. How does balancing the number of `SUPPORTED` and `REFUTED` training instances affect the model accuracy? (Hint: you may have to create a new dataset reader.)
3. Experiment with the number and the size of hidden layers to find the best settings using the train/dev set, and assess your accuracy on the test set. (Note: this model may not reach high accuracy.)
4. Explore: how does fine-tuning the word embeddings affect performance? You can make the word embedding layer trainable by changing the `text_field_embedder` entry in the `fever.json` config file.
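If you want a concrete starting point, below is a minimal sketch of the averaging/concatenation/MLP forward pass in plain PyTorch. The class name, constructor arguments, and input format are illustrative assumptions, not the interface of the provided `fever_text_classification.py` model; in the AllenNLP version you would obtain the embedded tokens and masks from the `text_field_embedder` and `get_text_field_mask` instead.

```python
import torch
import torch.nn as nn


class AverageEmbeddingClassifier(nn.Module):
    """Sketch: average word embeddings for claim and evidence, concatenate, MLP.

    Vocabulary size, embedding dimension, and hidden sizes are illustrative
    guesses, not values taken from the provided fever.json config.
    """

    def __init__(self, vocab_size: int, embedding_dim: int = 100,
                 hidden_dim: int = 100, num_labels: int = 2):
        super().__init__()
        # To fine-tune pretrained vectors instead, load them with
        # nn.Embedding.from_pretrained(weights, freeze=False); in an AllenNLP
        # config this corresponds to the embedding's "trainable" flag.
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.mlp = nn.Sequential(
            nn.Linear(2 * embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_labels),
        )

    @staticmethod
    def masked_mean(embedded: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # embedded: (batch, seq_len, dim); mask: (batch, seq_len), 1 = real token.
        mask = mask.unsqueeze(-1).float()
        summed = (embedded * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
        return summed / counts

    def forward(self, claim_ids, claim_mask, evidence_ids, evidence_mask):
        claim_vec = self.masked_mean(self.embedding(claim_ids), claim_mask)
        evidence_vec = self.masked_mean(self.embedding(evidence_ids), evidence_mask)
        combined = torch.cat([claim_vec, evidence_vec], dim=-1)
        return self.mlp(combined)  # logits over the labels, e.g. SUPPORTED/REFUTED
```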
### 2. Discrete Feature Baseline
1. How does the model compare against a discrete feature baseline, i.e., one that uses one-hot vectors or hand-crafted features instead of word embeddings to represent the words? (A small bag-of-words sketch follows below.)
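
As a rough way to prototype such a baseline outside AllenNLP (assuming scikit-learn is available; the toy strings, the `[SEP]` joining scheme, and the label names mirror the exercise but are not taken from the actual data files):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy (claim, evidence, label) stand-ins; the real instances would come from
# the reader in athnlp/readers/fever_reader.py.
train_texts = [
    "claim: Athens is in Greece [SEP] evidence: Athens is the capital of Greece",
    "claim: Athens is in Italy [SEP] evidence: Athens is the capital of Greece",
]
train_labels = ["SUPPORTED", "REFUTED"]

# Binary bag-of-words (one-hot per n-gram) features over claim + evidence text,
# fed to a simple linear classifier.
baseline = make_pipeline(
    CountVectorizer(binary=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)
print(baseline.predict(
    ["claim: Athens is in Greece [SEP] evidence: Athens is the capital of Greece"]
))
```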
### 3. Alternative Pooling Methods
Averaging word embeddings is an example of pooling (see slides 110/111 of Ryan McDonald's talk: [SLIDES](https://github.com/athnlp/athnlp-labs/blob/master/slides/McDonald_classification.pdf)).
Try alternative methods for pooling the word embeddings. Which ones make an improvement?
1. Replace the averaging of word embeddings with max pooling (taking the max value for each embedding dimension over the words in the sentence); see the sketch after this list.
2. Use a `CnnEncoder()` to generate sentence representations. (Hint: you may need to set `"token_min_padding_length": 5` or higher in the `tokens` object in `token_indexers` for large filter sizes.) Filter sizes between 2 and 5 should be sufficient. More filters will make training slower (perhaps train for only 1 or 2 epochs).
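A minimal sketch of masked max pooling in plain PyTorch, plus a commented hint at the CNN variant (the `CnnEncoder` argument names shown are an assumption and should be checked against the AllenNLP version used in the labs):

```python
import torch


def masked_max(embedded: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Max pooling over the sequence dimension, ignoring padded positions.

    embedded: (batch, seq_len, dim); mask: (batch, seq_len), 1 = real token.
    """
    very_negative = torch.finfo(embedded.dtype).min
    # Push padded positions to a very negative value so they never win the max.
    masked = embedded.masked_fill(~mask.bool().unsqueeze(-1), very_negative)
    return masked.max(dim=1).values


# Quick shape check on dummy data:
emb = torch.randn(2, 4, 5)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
print(masked_max(emb, mask).shape)  # torch.Size([2, 5])

# CNN variant (sketch; check the exact argument names in your AllenNLP version):
# from allennlp.modules.seq2vec_encoders import CnnEncoder
# encoder = CnnEncoder(embedding_dim=5, num_filters=8, ngram_filter_sizes=(2, 3, 4, 5))
# pooled = encoder(emb, mask)  # (batch, num_filters * len(ngram_filter_sizes))
```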
### 4. Hypothesis-Only NLI and Biases
1. Implement a _[hypothesis only](https://www.aclweb.org/anthology/S18-2023)_ version of the model that ignores the evidence and only uses the claim for predicting the label. What accuracy does this model get? Why do you think that is? Think back to slide 7 of Ryan's talk. (A minimal sketch of this variant appears at the end of this section.)
2. Take a look at the training/dev data. Can you design claims that would "fool" your models? You can see this report ([Thorne and Vlachos, 2019](https://arxiv.org/abs/1903.05543)) for inspiration.
What do you conclude about the ability of your model to understand language?
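
If it helps, here is a minimal sketch of the hypothesis-only change, reusing the (assumed) `AverageEmbeddingClassifier` names from the earlier sketch:

```python
def hypothesis_only_logits(model, claim_ids, claim_mask):
    """Hypothesis-only variant: pool the claim alone and ignore the evidence.

    Assumes `model` is the AverageEmbeddingClassifier sketched above, with its
    MLP input size reduced to a single embedding_dim (or, alternatively, keep
    the original MLP and concatenate the claim vector with zeros).
    """
    claim_vec = model.masked_mean(model.embedding(claim_ids), claim_mask)
    return model.mlp(claim_vec)
```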
# labs-exercises/pos-tagging-perceptron.md
#### 1. Perceptron Algorithm
Implement the standard perceptron algorithm. Use the first 10000/1000/1000 sentences for training/dev/test.
In order to speed up the process for you, we have implemented a simple dataset reader that automatically converts the Brown corpus using the Universal PoS Tagset: `athnlp/readers/brown_pos_corpus.py` (you may use your own implementation if you want; `athnlp/reader/en-brown.map` provides the mapping from Brown to Universal Tagset).
**Important**: Recall that the perceptron has to predict multiple classes (the PoS tags) instead of binary classes.
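
As a rough starting point, here is a minimal sketch of a multiclass (one weight vector per tag) perceptron; the feature function, the toy data, and the way sentences and tags are passed around are illustrative assumptions, not the interface of the Brown reader:

```python
from collections import defaultdict


def word_features(sentence, i):
    """Very small illustrative feature set for the word at position i."""
    word = sentence[i]
    return {
        f"word={word.lower()}",
        f"suffix3={word[-3:].lower()}",
        f"prev={sentence[i - 1].lower() if i > 0 else '<s>'}",
    }


def train_perceptron(tagged_sentences, tagset, epochs=5):
    """tagged_sentences: list of (words, tags) pairs; tagset: list of possible tags."""
    # One sparse weight vector per tag, implemented as dicts feature -> weight.
    weights = {tag: defaultdict(float) for tag in tagset}
    for _ in range(epochs):
        for words, tags in tagged_sentences:
            for i, gold in enumerate(tags):
                feats = word_features(words, i)
                # Predict: argmax over tags of the score w_tag . phi(x).
                scores = {t: sum(weights[t][f] for f in feats) for t in tagset}
                pred = max(scores, key=scores.get)
                if pred != gold:
                    # Standard multiclass perceptron update.
                    for f in feats:
                        weights[gold][f] += 1.0
                        weights[pred][f] -= 1.0
    return weights


# Toy usage with made-up data (the real sentences come from the Brown reader):
data = [(["The", "dog", "barks"], ["DET", "NOUN", "VERB"])]
w = train_perceptron(data, tagset=["DET", "NOUN", "VERB"], epochs=3)
```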