Why the non-linearity in the last layer? #245

marcelodiaz558 · 2020-11-08T05:09:56Z

Hi,
I don't understand why the developers decided to include a swish non-linearity in the architecture right after the fully connected classifier layer. I don't think it's a good choice for an image classification task, since swish will throw out some information that might be important, just as ReLU does with negative values. Most likely I'm wrong and there's a good reason for this. Can anyone please explain it to me?
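For reference, here is a minimal sketch comparing the two activations on a few inputs. It assumes the standard definition swish(x) = x * sigmoid(x), and that `MemoryEfficientSwish` computes the same function in its forward pass, which I have not verified. The helper names `relu` and `swish` are only for illustration.

```python
import math

def relu(x: float) -> float:
    # ReLU maps every negative input to exactly zero.
    return max(0.0, x)

def swish(x: float) -> float:
    # Assumed definition: swish(x) = x * sigmoid(x) = x / (1 + exp(-x)).
    return x / (1.0 + math.exp(-x))

for x in (-4.0, -2.0, -1.0, -0.5, 0.0, 1.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  swish={swish(x):+.4f}")
```

Under that definition, negative inputs are attenuated rather than set to zero, so the comparison with ReLU only holds loosely.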

This is how the network forward propagation ends:

(_fc): Linear(in_features=1280, out_features=1000, bias=True)
(_swish): MemoryEfficientSwish()

Thanks
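One thing worth checking here: printing an `nn.Module` lists its submodules in the order they were assigned in `__init__`, which is not necessarily the order in which `forward` calls them. The sketch below uses a hypothetical toy model (`ToyNet` and `Swish` are made-up names, not the EfficientNet code) to show a module that prints last but runs before the classifier. Whether that is what happens in this repository is an assumption that the model's `forward` method would confirm or refute.

```python
import torch
from torch import nn

class Swish(nn.Module):
    # Plain swish, standing in for MemoryEfficientSwish (an assumption).
    def forward(self, x):
        return x * torch.sigmoid(x)

class ToyNet(nn.Module):
    """Hypothetical toy model, not the EfficientNet implementation."""
    def __init__(self):
        super().__init__()
        self._hidden = nn.Linear(8, 16)
        self._fc = nn.Linear(16, 4)
        self._swish = Swish()  # registered last, but used before _fc

    def forward(self, x):
        x = self._swish(self._hidden(x))
        return self._fc(x)  # raw logits, no activation afterwards

model = ToyNet()
print(model)  # lists _hidden, _fc, _swish (registration order)

# Record the order in which submodules actually run during forward.
call_order = []
for name, child in model.named_children():
    child.register_forward_hook(
        lambda module, inputs, output, name=name: call_order.append(name)
    )

registered = [name for name, _ in model.named_children()]
logits = model(torch.randn(2, 8))
print(registered)
print(call_order)
```

In this toy case the printed summary ends with `_swish`, yet the hooks show `_fc` running last, so the printout alone does not prove that an activation follows the classifier.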

marcelodiaz558 closed this as completed Nov 8, 2020
