Conversation
There's some code that can be consolidated, but I didn't see any red flags. Accepting to unblock and I can do a cleanup/consolidation pass later.
dropout_frac = opt.get('dropout', 0.0)
self.dropout = nn.Dropout(p=dropout_frac)  # --dropout
...
self.n_positions = _default(n_positions, get_n_positions_from_options(opt))
Looks like many of these properties are the same as TransformerDecoder, and you already call super().__init__(). Perhaps you only need the layer-building code here. I can add a build_layers function to TransformerDecoder in a subsequent PR for overriding purposes.
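For reference, a minimal sketch of what such a hook could look like; build_layers does not exist on TransformerDecoder at the time of this review, and the class and attribute names below are illustrative rather than the PR's actual code:

```python
class HashLayerDecoder(TransformerDecoder):
    def build_layers(self) -> nn.ModuleList:
        # Dropout, embeddings, n_positions, etc. would all come from
        # super().__init__(); only the per-layer construction is customized.
        return nn.ModuleList(
            HashLayerDecoderLayer(self.opt, self.embedding_size, self.ffn_size)
            for _ in range(self.n_layers)
        )
```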
Ok, I'll look at these later after this initial merge, thanks.
self.ffn = HashLayerFFN(
    opt,
    embedding_size,
    ffn_size,
    relu_dropout=relu_dropout,
    activation=activation,
)  # type: ignore
Same here; it looks like we're only overloading HashLayerFFN? If so, you can just create your TransformerDecoderLayer like this without overriding:

TransformerDecoderLayer.with_components(feedforward=HashLayerFFN)
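A minimal sketch of that suggestion, assuming ParlAI's swappable-components API; the constructor arguments shown are illustrative, not the exact TransformerDecoderLayer signature:

```python
from parlai.agents.transformer.modules import TransformerDecoderLayer

# Swap only the feedforward component; no subclass of the layer is needed.
HashFFNDecoderLayer = TransformerDecoderLayer.with_components(
    feedforward=HashLayerFFN
)
layer = HashFFNDecoderLayer(opt, embedding_size, ffn_size)  # illustrative args
```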
Ok, I'll look at these later after this initial merge, thanks.
(But I think I can't do that, because I need to pass the original inputs through in the decoder layer.)
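A toy, self-contained illustration of that constraint, assuming the FFN routes each token to an expert by hashing its original token id (as in the hash-layers approach); this is not the PR's code:

```python
import torch
import torch.nn as nn

class ToyHashFFN(nn.Module):
    """Routes each token to an expert chosen by a fixed hash of its id."""

    def __init__(self, dim: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        routes = token_ids % len(self.experts)  # fixed hash: id mod n_experts
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = routes == i
            out[mask] = expert(x[mask])
        return out
```

Because forward() needs the original token ids in addition to the hidden state, the stock decoder layer, whose FFN call only receives the hidden state, can't host such an FFN via with_components alone; the layer's forward has to pass the original inputs through.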