Linear separability diagnostics? #9

johnmyleswhite · 2013-01-11T23:55:32Z

One thing I'd really like is for Julia to tell the user when the data is linearly separable under a logistic model. This could be done by making a call to glm for logistic models terminate with a call to predict to see if there are no mispredicted responses. In that case, it would be nice to output a message noting this.
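A rough, language-agnostic sketch of the check proposed above, written in plain Python rather than against GLM's actual API. The names `linear_predictor` and `perfectly_separates` are hypothetical, and the coefficients are assumed to come from an already fitted logistic model. A fit with zero mispredictions is a sufficient signal of separation, not a complete test for it.

```python
def linear_predictor(row, coefs):
    """Dot product of one design-matrix row with the coefficient vector."""
    return sum(x * b for x, b in zip(row, coefs))


def perfectly_separates(X, y, coefs):
    """Return True if the fitted hyperplane mispredicts no response.

    X is a list of rows (including an intercept column if the model has one),
    y holds 0/1 responses, and coefs are the fitted coefficients.
    A linear predictor of exactly 0 is classified as 0 here, which is an
    arbitrary tie-breaking assumption.
    """
    for row, yi in zip(X, y):
        predicted = 1 if linear_predictor(row, coefs) > 0 else 0
        if predicted != yi:
            return False
    return True


# Toy data: intercept column plus one covariate.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y_separable = [0, 0, 1, 1]
y_overlapping = [0, 1, 0, 1]
coefs = [0.0, 5.0]  # illustrative values, not the output of a real fit

if perfectly_separates(X, y_separable, coefs):
    print("Warning: no mispredicted responses; the data may be linearly separable.")
```

With `y_overlapping` the same coefficients mispredict the second observation, so no message would be printed.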

Nosferican · 2018-04-06T03:41:13Z

Kjell Konis's 2007 thesis surveys various practical methods and explains the approach taken in R's safeBinaryRegression.

andreasnoack · 2018-04-06T07:27:30Z

Thanks for the reference. I think that, ideally, the check could be a post-processing function, potentially run as part of the coeftable function.

Nosferican · 2018-04-06T07:31:42Z

I thought the main consideration would be to have the detection work during the fitting process and deal with it there (e.g., drop covariates, drop observations, issue a warning, stop iterating early, etc.). This is the approach Stata uses: it sequentially drops covariates / observations until the separability disappears. If that isn't possible, it issues an error of no valid observations.
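To make the "drop covariates" idea concrete, here is a minimal sketch of a per-covariate screen that such a procedure could start from. It is not Stata's actual algorithm. The function name `covariate_separates` is hypothetical, and the check only detects complete separation by a single covariate (strict inequality), not separation by a combination of covariates.

```python
def covariate_separates(x, y):
    """Return True if a single covariate completely separates the 0/1 responses.

    Separation here means every value of x observed with y == 0 lies strictly
    on one side of every value observed with y == 1.
    """
    xs0 = [xi for xi, yi in zip(x, y) if yi == 0]
    xs1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not xs0 or not xs1:
        # Only one response class is present, so an intercept alone separates.
        return True
    return max(xs0) < min(xs1) or max(xs1) < min(xs0)


# Flag each candidate covariate (column names are illustrative only).
columns = {
    "age": [1.0, 2.0, 3.0, 4.0],
    "dose": [1.0, 3.0, 2.0, 4.0],
}
y = [0, 0, 1, 1]
flagged = [name for name, x in columns.items() if covariate_separates(x, y)]
print(flagged)
```

Whether a flagged covariate is then dropped automatically or only reported is exactly the design question under discussion here.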

andreasnoack · 2018-04-06T07:40:01Z

I wouldn't be in favor of too much magic happening automatically. I'd rather provide the tools to diagnose this and let the user adjust the model. I also wouldn't be in favor of slowing down the fitting procedure. You might only be interested in prediction or parameters not affected by the separation.

Nosferican · 2018-04-06T08:01:11Z

The methods outlined take into consideration the additional computational expense incurred. I recently implemented O’Leary (1990) IRLS QR Newton (which might be one of the DenseQR methods here?) for developing a few routines missing in GLM, which I could use to verify the computational cost of adding those. It would not apply to all models, only to those that are "unsafe", but I agree that warnings in this case might be preferred to a non-specified handling method. Linear separability seems trickier than just a non-full-rank matrix, which I am totally fine with automatically making full rank and letting the user know. As for development, I think the safe-binary algorithms could be developed in a separate package and used in GLM. It might be nice to have the IRLS methods moved to a solver package too and called from GLM. Those can be optimized for Dense, Sparse, Mixed, and Distributed cases (see Kane and Lewis working notes). I mention this since StatsModels is moving to allow other tabular data packages with different capabilities from DataFrames (Slack#Data). If this is something to consider, I can move that discussion to a different issue to limit this one to the linear separability.

sergiocorreia mentioned this issue Jul 13, 2019

Iterative rectifier (IR) Nosferican/Econometrics.jl#5

Open
