Currently, classification is done by just using the sigmoid activation function, which is essentially regression. This can lead to predicted probabilities falling outside [0, 1]. Instead, for classification we should use a normal ELM with ReLU (or another activation) to get raw predictions and then apply the sigmoid to those outputs, similar to the way a softmax layer is used for multiclass classification.
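The proposal above can be sketched as follows. This is a hypothetical minimal ELM, not the repository's implementation: a fixed random ReLU hidden layer, least-squares output weights fit to 0/1 labels, and a sigmoid applied to the raw outputs afterwards. All names and shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data: 100 samples, 5 features.
X = rng.normal(size=(100, 5))
y = (X[:, 0] + rng.normal(scale=0.1, size=100) > 0).astype(float)

# Hypothetical minimal ELM: random (fixed) input weights and biases.
W = rng.normal(size=(5, 50))
b = rng.normal(size=50)
H = np.maximum(X @ W + b, 0.0)      # ReLU hidden activations

# Least-squares output weights fit to the 0/1 labels.
beta = np.linalg.pinv(H) @ y

raw = H @ beta                      # raw regression-style predictions
probs = 1.0 / (1.0 + np.exp(-raw))  # sigmoid squashes raw outputs into (0, 1)
```

Note that while `probs` is always in (0, 1), the squashing is exactly what the discussion below worries about: it can move a raw prediction across the decision threshold.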
Possible options for getting predicted "probabilities" are:
1. Do nothing
Since the ELM minimizes the MSE, the predictions would stay in [0, 1] most of the time, but they can fall outside this range, as in a linear probability model.
2. Apply the sigmoid function to the predictions
This would always constrain the outputs to [0, 1]. However, it is often problematic: the ELM output already falls on one side of the decision threshold, and applying the sigmoid can change the predicted class. For example, if the threshold is the usual 0.5 and the ELM outputs an MSE-minimizing prediction of 0.4, applying the sigmoid gives about 0.5987, which predicts the other class.
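The class-flipping problem in option 2 is easy to demonstrate numerically: a raw prediction of 0.4 is below the 0.5 threshold, but its sigmoid is above it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

pred = 0.4               # MSE-optimal ELM output, below the 0.5 threshold
squashed = sigmoid(pred)  # sigmoid(0.4) ~= 0.5987, above the threshold

# The raw and squashed predictions fall on opposite sides of 0.5,
# so applying the sigmoid after the fact flips the predicted class.
flipped = (pred >= 0.5) != (squashed >= 0.5)
```

This happens for any raw prediction in (0, 0.5), since sigmoid(z) > 0.5 whenever z > 0.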
3. Use a clipping function.
This would ensure that all predictions are in [0, 1] and would not change any predicted classes. However, the result is not technically a probability in the way a sigmoid output is. We would also have to choose a range such as [1e-5, 1 - 1e-5], since no observation should be assigned a probability of exactly 0 or 1.
Overall, using a clipping function is probably the best option: it does not change the optimization problem the ELM is solving or the predicted classes, it keeps the predictions in [0, 1], and for the few predictions that fall outside [0, 1] the adjustment probably makes little practical difference.
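The recommended clipping option could look something like this sketch (the function name and epsilon value are assumptions, not existing code). It pulls out-of-range raw outputs into [eps, 1 - eps] without moving any prediction across the 0.5 threshold.

```python
import numpy as np

EPS = 1e-5  # floor/ceiling so no "probability" is exactly 0 or 1

def clip_probs(raw, eps=EPS):
    """Clip raw ELM outputs into [eps, 1 - eps]; predictions already
    inside the range are left untouched."""
    return np.clip(raw, eps, 1.0 - eps)

raw = np.array([-0.03, 0.4, 0.51, 1.02])  # some outputs outside [0, 1]
probs = clip_probs(raw)

# Out-of-range values are pulled inside [0, 1] ...
in_range = probs.min() >= 0.0 and probs.max() <= 1.0
# ... and every prediction keeps its class at the usual 0.5 threshold.
classes_preserved = np.array_equal(raw >= 0.5, probs >= 0.5)
```

Unlike the sigmoid, `np.clip` is the identity on [eps, 1 - eps], which is exactly why it cannot flip a predicted class.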