Hi,
I don't understand why the developers decided to include a swish non-linearity right after the fully connected classifier layer. I don't think it's a good choice for an image classification task, since swish will throw away some information that might be important, just as ReLU does with negative values. Most likely I'm wrong and there's a good reason for this; can anyone please explain it to me?
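For reference, a minimal sketch comparing swish and ReLU on negative inputs (plain PyTorch, no EfficientNet code assumed). Note that swish is x·sigmoid(x), so it attenuates negative values smoothly rather than zeroing them out the way ReLU does:

```python
import torch

def swish(x):
    # Swish: x * sigmoid(x). Negative inputs are attenuated
    # smoothly rather than clipped to zero as with ReLU.
    return x * torch.sigmoid(x)

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(torch.relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.0000, 3.0000])
print(swish(x))       # tensor([-0.1423, -0.2689, 0.0000, 0.7311, 2.8577])
```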
This is how the network forward propagation ends:
```
(_fc): Linear(in_features=1280, out_features=1000, bias=True)
(_swish): MemoryEfficientSwish()
```
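One caveat worth checking: `print(model)` lists submodules in the order they were registered in `__init__`, not in the order `forward()` applies them, so `_swish` appearing after `_fc` in this printout does not by itself prove that swish is applied to the final logits. A minimal sketch illustrating this (hypothetical `Toy` module, not the actual EfficientNet code):

```python
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # Registration order determines print order below.
        self.fc = nn.Linear(4, 2)
        self.act = nn.SiLU()  # PyTorch's built-in swish

    def forward(self, x):
        # act runs *before* fc here, even though print(self)
        # lists it after fc.
        return self.fc(self.act(x))

print(Toy())
# Toy(
#   (fc): Linear(in_features=4, out_features=2, bias=True)
#   (act): SiLU()
# )
```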
Thanks