
Problem about the dense layer #13

Open
mammadjv opened this issue Mar 27, 2019 · 1 comment


@mammadjv

I had a problem with the approach, which is shown in this line:

```python
model.add(Dense(2, activation = 'softmax', init='uniform'))
```

As mentioned in the paper, the trained weights in this layer are used for a weighted sum over the last produced activation maps.
To model a non-linear function and produce class scores, an MLP should have at least two layers (a hidden layer plus an output layer such as softmax).
But here, right after the GAP layer, only one FC layer with two units is added for classification.
Can anyone explain the reason?
And why is the number of units 2?

@AngusMaiden

AngusMaiden commented Jan 17, 2023

@mammadjv,

This is quite an old question now, but I'll answer it anyway in the hope that it helps you or someone else reading.

The Keras Dense layer takes an activation parameter, which is a shortcut for adding a separate activation layer. The implementation above is therefore essentially two layers: a dense layer and a softmax output layer. Keras provides this convenience because you rarely need to change any attributes of the activation layer other than which activation it applies.

In practice, we don't usually think of the activation as a separate layer; it's considered part of the layer before it, so this Keras Dense layer is the output layer, with a softmax activation. Keras accepts a number of different activations ('relu', 'tanh', 'sigmoid', 'softmax', etc.) in the activation argument of Dense, so when implementing the final layer you can simply choose softmax as the activation function and you have your output layer.
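Here is a minimal sketch of that equivalence, written with the tf.keras API (the repo itself uses an older Keras, where `init` was the old name for `kernel_initializer`); the 512-dim input is a hypothetical stand-in for the GAP feature vector:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# Form 1: the activation passed as a shortcut argument to Dense.
m1 = Sequential([Dense(2, activation='softmax', input_shape=(512,))])

# Form 2: the same computation written as two explicit layers.
m2 = Sequential([Dense(2, input_shape=(512,)), Activation('softmax')])
```

Both models compute exactly the same function; the shortcut form just keeps the activation bundled with the layer it follows.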

The reason the number of outputs is 2 is that the model classifies two classes, person and non-person, as described in the README. This architecture uses one-hot encoded labels, i.e. the label {1,0} = non-person and {0,1} = person; the label vector has size 2, matching the number of output nodes. Note that the developer could instead have used a single output node with a sigmoid activation on the final layer to achieve the same thing, where outputs in the range [0, 0.5) = non-person and outputs in the range [0.5, 1] = person. The corresponding labels would then have to be binary scalars, i.e. 0 = non-person and 1 = person. However, the math works out the same either way, so this is a more or less arbitrary choice.
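To make the two options concrete, here is a hedged sketch of both output heads (again assuming tf.keras; the 512-dim input and the optimizer are placeholders, and only the loss/label pairing matters here):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Two-unit softmax head: labels are one-hot vectors, e.g. [0, 1] = person.
two_unit = Sequential([Dense(2, activation='softmax', input_shape=(512,))])
two_unit.compile(optimizer='adam', loss='categorical_crossentropy')

# One-unit sigmoid head: labels are binary scalars, e.g. 1 = person.
one_unit = Sequential([Dense(1, activation='sigmoid', input_shape=(512,))])
one_unit.compile(optimizer='adam', loss='binary_crossentropy')
```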

I'm not sure from your question whether you were also wondering about this, but in addition, there is no need for any other hidden layers after the GAP: GAP can replace the FC layers that would otherwise follow the convolutional layers, connecting straight from the GAP to the output layer with softmax activation. See this page for more details: https://paperswithcode.com/method/global-average-pooling.
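For illustration, a hedged sketch of that GAP-to-softmax pattern (the conv stack and input size here are illustrative, not the repo's actual backbone):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     GlobalAveragePooling2D, Dense)

model = Sequential([
    Conv2D(32, 3, activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D(),
    Conv2D(64, 3, activation='relu'),
    # GAP averages each HxW activation map down to a single number,
    # so no Flatten or hidden FC layers are needed before the classifier.
    GlobalAveragePooling2D(),
    # These Dense weights are the per-map weights used in the weighted
    # sum over activation maps (the CAM idea mentioned in the paper).
    Dense(2, activation='softmax'),
])
```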
