Make sigmoid layer for binary classifiers #36

Closed

dscolby opened this issue Mar 25, 2024 · 2 comments
Assignees: dscolby
Labels: reference (So we don't try to address the same problem again.)

Comments

dscolby commented Mar 25, 2024

Currently, classification is done by just using the sigmoid activation function, which is basically just regression. This could potentially lead to predicted probabilities being outside of [0, 1]. Instead, for classification we should use a normal ELM with ReLU or another activation to get raw predictions and apply the sigmoid to those outputs, similar to the way we use a softmax layer for multiclass classification.
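For concreteness, here is a minimal Python/NumPy sketch of what that could look like: a single-hidden-layer ELM with ReLU activations fit by least squares, with the sigmoid applied to the raw outputs as a final layer. The function names (`fit_elm`, `predict_proba`) and hyperparameters are illustrative assumptions, not this package's API.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_elm(X, y, n_hidden=100, seed=0):
    """Illustrative single-hidden-layer ELM: random hidden weights, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input-to-hidden weights (not trained)
    b = rng.normal(size=n_hidden)                  # random hidden biases
    H = relu(X @ W + b)                            # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights minimize the MSE
    return W, b, beta

def predict_proba(X, W, b, beta):
    raw = relu(X @ W + b) @ beta                   # raw predictions; may fall outside [0, 1]
    return sigmoid(raw)                            # sigmoid "layer" squashes them into (0, 1)

# Usage on toy data (illustrative only)
X = np.random.default_rng(1).normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W, b, beta = fit_elm(X, y)
p = predict_proba(X, W, b, beta)                   # predicted probabilities in (0, 1)
```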

dscolby self-assigned this Mar 28, 2024
dscolby added the reference label (So we don't try to address the same problem again.) Mar 28, 2024
dscolby commented Mar 28, 2024

Possible options for getting predicted "probabilities" are:

1. Do nothing
        Since the ELM is minimizing the MSE, most of the time this would keep the predictions in [0, 1], but there could be times when the predictions fall outside this range, as in a linear probability model.

2. Apply the sigmoid function to the predictions
        This would always constrain the outputs to [0, 1]. However, it would often be problematic because the raw ELM output already implies a predicted class, and applying the sigmoid function would change that class. For example, if the threshold is the usual 0.5 and the ELM outputs a prediction of 0.4, which minimizes the MSE, then applying the sigmoid function gives 0.598687660112452, which predicts a different class.

3. Use a clipping function.
        This would ensure that all the predictions are in [0, 1] and would not change predicted classes. However, it would not technically output a probability the way the sigmoid function would. Also, we would have to choose some range like [1e-5, 1 - 1e-5], since no observation will have a probability of exactly 0 or 1.

Overall, using a clipping function is probably the best option because it does not change the optimization problem the ELM is solving or the predicted classes, it keeps the predictions in [0, 1], and it probably doesn't make much of a difference for predictions outside [0, 1]. (See the sketch below.)
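To make the trade-off concrete, here is a small standalone sketch (not code from this repository) comparing options 2 and 3 on a few hypothetical raw ELM outputs; `clip_proba` and the example values are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def clip_proba(raw, eps=1e-5):
    """Option 3: clip raw predictions into [eps, 1 - eps] without moving them across the 0.5 threshold."""
    return np.clip(raw, eps, 1.0 - eps)

raw = np.array([-0.2, 0.4, 0.7, 1.3])   # hypothetical MSE-minimizing raw ELM outputs

print(sigmoid(raw))     # option 2: 0.4 maps to ~0.5987, flipping its predicted class at the 0.5 threshold
print(clip_proba(raw))  # option 3: [1e-05, 0.4, 0.7, 0.99999]; in-range values and predicted classes unchanged
```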

dscolby commented Apr 28, 2024

It actually doesn't make sense to have categorical treatments or outcomes, so we can get rid of them.

dscolby closed this as completed Apr 28, 2024