Commit cf1a9ad: Update README.md (parent 068fd0a)


README.md: 11 additions & 0 deletions
@@ -58,6 +58,17 @@ Our Transformer model makes some modifications to the original model, shown in F

Fig. 1: Original Transformer Network in "Attention is All You Need" by Vaswani et al. (2017)

_Author's Note: This assignment used an experimental feature that learns a weighted representation of the encoder embeddings, which is added to the decoder input before decoding begins.
```
# Non-negative scores from projecting the encoder embeddings;
# eps keeps the normalisation below well-defined.
enc_dec_scores = tf.nn.relu(tf.transpose(tf.tensordot(
    x_enc_token, self.p_enc_decode, [[2], [0]]), [0, 2, 1])) + eps
# Normalise over encoder positions so each set of weights sums to one.
enc_dec_alphas = tf.divide(
    enc_dec_scores, tf.reduce_sum(
        enc_dec_scores, axis=[-1], keepdims=True))
# Weighted sum of encoder embeddings, passed to the decoder.
x_enc_decoder = tf.matmul(enc_dec_alphas, x_enc_token)
```
So far, this feature has been observed to occasionally help decoding, while at other times it has no effect. In most cases the model's performance does not appear to degrade much with its inclusion, beyond making the model larger than necessary. In this assignment we had erroneously used a version that included this experimental feature, but were unfortunately unable to re-train the model due to time constraints._
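The math behind the note above can be sketched outside the model with numpy standing in for TensorFlow. Everything here is illustrative: the shapes, the random values for `x_enc_token`, and the stand-in for the learned projection `p_enc_decode` are assumptions, not the trained weights.

```
import numpy as np

# Hypothetical shapes: batch=2, enc_len=5, d_model=4, n_queries=3.
rng = np.random.default_rng(0)
x_enc_token = rng.normal(size=(2, 5, 4))   # encoder embeddings
p_enc_decode = rng.normal(size=(4, 3))     # stand-in for the learned projection
eps = 1e-6

# Project, keep scores non-negative, and move encoder positions to the last axis.
scores = np.maximum(np.einsum("bld,dq->blq", x_enc_token, p_enc_decode), 0.0)
scores = np.transpose(scores, (0, 2, 1)) + eps

# Normalise over encoder positions; each row of weights sums to 1.
alphas = scores / scores.sum(axis=-1, keepdims=True)

# Weighted sum of encoder embeddings: shape (batch, n_queries, d_model).
x_enc_decoder = alphas @ x_enc_token
```

The relu-plus-eps normalisation plays the role a softmax would in standard attention: eps guarantees a non-zero denominator even when all scores are clipped to zero.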

Before sending the data into the Transformer model, the dialogue sequences need to be converted into their corresponding integer labels. This is done via

```
tmp_i_tok = data_tuple[tmp_index][0].split(" ")
```
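The token-to-label step after the split can be sketched as follows. The vocabulary here is hypothetical (the real mapping is built from the training corpus), and `word2idx` and the `<unk>` fallback are names assumed for illustration.

```
# Hypothetical vocabulary; index 0 is reserved for out-of-vocabulary tokens.
word2idx = {"<unk>": 0, "hello": 1, "how": 2, "are": 3, "you": 4}

line = "hello how are you today"
tmp_i_tok = line.split(" ")
labels = [word2idx.get(tok, word2idx["<unk>"]) for tok in tmp_i_tok]
print(labels)  # [1, 2, 3, 4, 0] -- "today" falls back to <unk>
```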
