Linear separability diagnostics? #9
Comments
Kjell Konis's 2007 thesis surveys various practical methods for detecting separation and explains the approach taken in R's safeBinaryRegression.
Thanks for the reference. I think that, ideally, the check could be a post-processing function. Potentially run as part of the
I thought the main consideration would be to have the detection work during the fitting process and deal with it there (e.g., drop covariates, drop observations, issue a warning, stop iterating early, etc.). This is the approach Stata uses: it sequentially drops covariates and observations until the separation disappears. If that isn't possible, it issues an error reporting no valid observations.
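To make the per-covariate screening idea concrete, here is a minimal sketch in plain Python. It is not Stata's actual algorithm and not anything in GLM: the function name `separating_covariates` and the list-of-rows data layout are assumptions made for illustration. It only flags columns that completely separate the two response classes on their own, so it would miss separation that needs a combination of covariates.

```python
def separating_covariates(xs, y):
    """Return indices of columns that, taken alone, completely separate y == 0 from y == 1.

    xs is a list of rows (lists of numbers); y is a list of 0/1 responses.
    Hypothetical helper for illustration only.
    """
    found = []
    for j in range(len(xs[0])):
        zeros = [row[j] for row, target in zip(xs, y) if target == 0]
        ones = [row[j] for row, target in zip(xs, y) if target == 1]
        if not zeros or not ones:
            # Only one response class is present; nothing to compare.
            continue
        # Complete separation on column j: every value in one class lies
        # strictly below every value in the other class.
        if max(zeros) < min(ones) or max(ones) < min(zeros):
            found.append(j)
    return found


rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
response = [0, 0, 1, 1]
print(separating_covariates(rows, response))  # prints [0]
```

A fitting routine could use a screen like this to decide which covariates to report or drop, at a cost of one pass over each column.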
I wouldn't be in favor of too much magic happening automatically. I'd rather provide the tools to diagnose this and let the user adjust the model. I also wouldn't be in favor of slowing down the fitting procedure. You might only be interested in prediction or parameters not affected by the separation.
The methods outlined take into consideration the additional computational expense incurred. I recently implemented O’Leary (1990) IRLS QR Newton (which might be one of the DenseQR methods here?) for developing a few routines missing in GLM, which I could use to verify the computational cost of adding those checks. The check would not apply to all models, only to those that are "unsafe", but I agree that warnings in this case might be preferred to an unspecified handling method. Linear separability seems trickier than a rank-deficient model matrix, which I am totally fine with automatically reducing to full rank while letting the user know. As for development, I think the safe-binary algorithms could be developed in a separate package and used in GLM. It might be nice to have the IRLS methods moved to a solver package too and called from GLM. Those can be optimized for Dense, Sparse, Mixed, and Distributed cases (see Kane and Lewis working notes). I mention this since StatsModels is moving to allow other tabular data packages with different capabilities from DataFrames (Slack#Data). If this is something to consider, I can move that discussion to a different issue to keep this one limited to linear separability.
One thing I'd really like is for Julia to tell the user when the data is linearly separable under a logistic model. This could be done by making a call to `glm` for logistic models terminate with a call to `predict` to see if there are no mispredicted responses. In that case, it would be nice to output a message noting this.
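A rough sketch of that fit-then-predict check, written in plain Python rather than against GLM's API so that it is self-contained. Everything here is an assumption for illustration: `fit_logistic` is a stand-in gradient-ascent fitter (not IRLS and not what `glm` does), and `no_mispredictions` is a hypothetical helper name. If the check passes, the fitted linear predictor itself separates the data. If it fails, the data might still be separable, since a fit that stopped early may not have reached a separating solution, and quasi-complete separation is not detected.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: never exponentiates a positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w, row):
    # w[0] is the intercept; w[1:] pair up with the covariates in row.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))


def fit_logistic(xs, y, steps=500, lr=0.1):
    # Stand-in for a real fitting routine: fixed-step gradient ascent on the
    # mean log-likelihood, stopped after a fixed number of steps.
    w = [0.0] * (len(xs[0]) + 1)
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, target in zip(xs, y):
            resid = target - sigmoid(linear_predictor(w, row))
            grad[0] += resid
            for j, xj in enumerate(row, start=1):
                grad[j] += resid * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def no_mispredictions(w, xs, y):
    # Predict class 1 when the fitted probability exceeds 0.5,
    # i.e. when the linear predictor is positive.
    return all((linear_predictor(w, row) > 0) == (target == 1)
               for row, target in zip(xs, y))


separable_x = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
separable_y = [0, 0, 0, 1, 1, 1]
overlapping_y = [0, 0, 1, 0, 1, 1]  # same covariate, classes overlap

weights = fit_logistic(separable_x, separable_y)
if no_mispredictions(weights, separable_x, separable_y):
    print("Warning: the fitted model predicts every response correctly; "
          "the data may be linearly separable.")
```

The check costs one extra pass over the data after fitting, so it could be offered as an optional diagnostic rather than being built into every fit.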