Currently, classification is done by just using the sigmoid activation function, which is essentially regression. This can lead to predicted probabilities falling outside [0, 1]. Instead, for classification we should use a normal ELM with ReLU (or another activation) to get raw predictions and then apply the sigmoid to those outputs, similar to the way a softmax layer is used for multiclass classification.
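The proposal above can be sketched as follows. This is a hypothetical minimal ELM, not the repository's implementation: a fixed random ReLU hidden layer, least-squares output weights fit to 0/1 labels, and a sigmoid applied to the raw outputs afterwards. All names and shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data: 100 samples, 5 features.
X = rng.normal(size=(100, 5))
y = (X[:, 0] + rng.normal(scale=0.1, size=100) > 0).astype(float)

# Hypothetical minimal ELM: random (fixed) input weights and biases.
W = rng.normal(size=(5, 50))
b = rng.normal(size=50)
H = np.maximum(X @ W + b, 0.0)      # ReLU hidden activations

# Least-squares output weights fit to the 0/1 labels.
beta = np.linalg.pinv(H) @ y

raw = H @ beta                      # raw regression-style predictions
probs = 1.0 / (1.0 + np.exp(-raw))  # sigmoid squashes raw outputs into (0, 1)
```

Note that while `probs` is always in (0, 1), the squashing is exactly what the discussion below worries about: it can move a raw prediction across the decision threshold.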
Possible options for getting predicted "probabilities" are:
1. Do nothing
Since the ELM minimizes the MSE, the predictions would stay in [0, 1] most of the time, but they can fall outside this range, as in a linear probability model.
2. Apply the sigmoid function to the predictions
This would always constrain the outputs to [0, 1]. However, it is often problematic: the ELM output already falls on one side of the decision threshold, and applying the sigmoid can change the predicted class. For example, if the threshold is the usual 0.5 and the ELM outputs an MSE-minimizing prediction of 0.4, applying the sigmoid gives about 0.5987, which predicts the other class.
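The class-flipping problem in option 2 is easy to demonstrate numerically: a raw prediction of 0.4 is below the 0.5 threshold, but its sigmoid is above it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

pred = 0.4               # MSE-optimal ELM output, below the 0.5 threshold
squashed = sigmoid(pred)  # sigmoid(0.4) ~= 0.5987, above the threshold

# The raw and squashed predictions fall on opposite sides of 0.5,
# so applying the sigmoid after the fact flips the predicted class.
flipped = (pred >= 0.5) != (squashed >= 0.5)
```

This happens for any raw prediction in (0, 0.5), since sigmoid(z) > 0.5 whenever z > 0.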
3. Use a clipping function.
This would ensure that all predictions are in [0, 1] and would not change any predicted classes. However, the result is not technically a probability in the way a sigmoid output is. We would also have to choose a range such as [1e-5, 1 - 1e-5], since no observation should be assigned a probability of exactly 0 or 1.
Overall, using a clipping function is probably the best option: it does not change the optimization problem the ELM is solving or the predicted classes, it keeps the predictions in [0, 1], and for the few predictions that fall outside [0, 1] the adjustment probably makes little practical difference.
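The recommended clipping option could look something like this sketch (the function name and epsilon value are assumptions, not existing code). It pulls out-of-range raw outputs into [eps, 1 - eps] without moving any prediction across the 0.5 threshold.

```python
import numpy as np

EPS = 1e-5  # floor/ceiling so no "probability" is exactly 0 or 1

def clip_probs(raw, eps=EPS):
    """Clip raw ELM outputs into [eps, 1 - eps]; predictions already
    inside the range are left untouched."""
    return np.clip(raw, eps, 1.0 - eps)

raw = np.array([-0.03, 0.4, 0.51, 1.02])  # some outputs outside [0, 1]
probs = clip_probs(raw)

# Out-of-range values are pulled inside [0, 1] ...
in_range = probs.min() >= 0.0 and probs.max() <= 1.0
# ... and every prediction keeps its class at the usual 0.5 threshold.
classes_preserved = np.array_equal(raw >= 0.5, probs >= 0.5)
```

Unlike the sigmoid, `np.clip` is the identity on [eps, 1 - eps], which is exactly why it cannot flip a predicted class.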