22 Nov. 2017. I have completed the first draft. So far I've tested the current hyperparameters only on the Nick dataset, which is 8 hours long, and not yet on LJ, which is 24 hours long. The results were neither good nor terrible. Since I had no success with the same hyperparameters as the original paper, I changed some of them. Among the changes are the application of dilation and the use of positional embedding instead of positional encoding. The attention plot of the last layer looks somewhat monotonic, but not clearly so. I think the attention plots are, of course, the key signal of whether the network works.
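Judging "somewhat monotonic, but not clearly" by eye is hard to compare across runs. One possible way to put a number on it is a hypothetical helper that is not part of the original project. For each decoder step, take the encoder position with the highest attention weight, then count how often that peak does not move backward. The function names, the `tolerance` parameter, and the toy matrices are all illustrative assumptions. This is a minimal sketch, not a standard metric.

```python
def attention_peaks(attention):
    """For each decoder step (row), return the index of the encoder
    position (column) with the highest attention weight.
    Ties resolve to the first (lowest) index."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]


def monotonicity_score(attention, tolerance=0):
    """Hypothetical score: fraction of consecutive decoder steps whose
    attention peak does not move backward by more than `tolerance`
    encoder positions. 1.0 means the peak never moves backward."""
    peaks = attention_peaks(attention)
    if len(peaks) < 2:
        return 1.0  # a single step is trivially monotonic
    forward = sum(
        1 for prev, cur in zip(peaks, peaks[1:]) if cur >= prev - tolerance
    )
    return forward / (len(peaks) - 1)


if __name__ == "__main__":
    # Toy attention matrices: rows are decoder steps, columns are encoder positions.
    diagonal = [
        [0.8, 0.1, 0.1],
        [0.6, 0.3, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7],
    ]
    scrambled = [
        [0.1, 0.1, 0.8],
        [0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.7, 0.2, 0.1],
    ]
    print(monotonicity_score(diagonal))             # 1.0
    print(round(monotonicity_score(scrambled), 3))  # 0.333
```

A small nonzero `tolerance` would forgive minor backward jitter of the peak. Real attention matrices from a trained model would be passed in as nested lists of weights with the same row/column layout.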