Closed
Description
Hi,
I have two questions about the implementation of dropPath:
- Why is the drop done per sample? As far as I understand from https://arxiv.org/pdf/1603.09382.pdf, a block is either kept for the whole mini-batch (with survival probability p_l) or dropped for the whole mini-batch. Why is the decision made for each sample individually here?
- What is the _div(keep_prob) used for? I can't find it in the paper's equations either. Could you please clarify the reason behind it?
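For reference, here is a minimal pure-Python sketch of what per-sample drop path with a division by keep_prob typically looks like. It is not the repository's actual code. The function name drop_path, its parameters (drop_prob, training, rng) and the list-of-lists stand-in for a tensor are all hypothetical. Each sample makes its own Bernoulli draw instead of one draw per mini-batch, and kept samples are scaled by 1 / keep_prob. This "inverted" scaling keeps the expected output equal to the input during training. The paper gets the same expectation differently: it multiplies the residual branch by p_l at test time.

```python
import random

def drop_path(batch, drop_prob, training=True, rng=random):
    """Hypothetical per-sample stochastic depth on residual-branch outputs.

    batch: list of samples, each a list of floats (stand-in for a tensor).
    Each sample is kept with probability keep_prob = 1 - drop_prob and zeroed
    otherwise; kept samples are divided by keep_prob ("inverted" scaling).
    """
    if not training or drop_prob == 0.0:
        # Evaluation mode: identity, no rescaling needed at test time.
        return [list(sample) for sample in batch]
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # One Bernoulli draw per sample, not one per mini-batch.
        keep = rng.random() < keep_prob
        scale = 1.0 / keep_prob if keep else 0.0
        out.append([v * scale for v in sample])
    return out

# Averaging over many draws, the output's mean stays close to the input value 1.0.
rng = random.Random(0)
trials = 100_000
mean = sum(drop_path([[1.0]], 0.25, rng=rng)[0][0] for _ in range(trials)) / trials
print(round(mean, 2))
```

A per-batch variant, closer to the paper's description, would draw keep once outside the loop and apply the same scale to every sample.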