A Stability Index for statistical models. An iterative procedure of selection and validation through the Delta BIC.
[MS Thesis in Statistical Sciences for decision - Federico II University of Naples project files.]
This folder contains:
- Bike sharing dataset
- R Markdown project code 'Bestfit_bikesharing.rmd'
- Thesis Brochure 'MS_Thesis.pdf'
The aim of the thesis is to develop a general, automatic procedure for selecting and validating a statistical model, using the BIC to derive a Stability Index.
The Stability Index proposed in the thesis is tested on linear regression. The UCI bike sharing dataset records the number of random accesses to the bike sharing service in Washington, D.C., during the weekends of 2011 and 2012, along with the varying weather conditions.
The best model is selected by best subset selection from a group of similarly well-performing models, chosen according to their Delta BIC ranks.
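As an illustration of ranking by Delta BIC, here is a minimal Python sketch on synthetic data. It is not the code in 'Bestfit_bikesharing.rmd'. It assumes the usual definition, Delta BIC = BIC of a model minus the smallest BIC among all candidates, and the Gaussian linear-model BIC up to an additive constant, n·ln(RSS/n) + k·ln(n). The function names, the synthetic data and the cutoff of 2 for "similarly well-performing" models are assumptions made for the example only.

```python
import itertools
import numpy as np

def bic_linear(X, y):
    """BIC of an OLS fit with intercept, up to an additive constant."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]  # number of regression coefficients, intercept included
    return n * np.log(rss / n) + k * np.log(n)

def delta_bic_ranking(X, y):
    """Fit every non-empty subset of predictors and rank them by Delta BIC."""
    p = X.shape[1]
    fits = []
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            fits.append((subset, bic_linear(X[:, list(subset)], y)))
    best_bic = min(bic for _, bic in fits)
    # Delta BIC is zero for the best model and positive for all others.
    return sorted(((s, bic - best_bic) for s, bic in fits), key=lambda r: r[1])

# Synthetic data (hypothetical): only predictors 0 and 2 affect the response.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

ranking = delta_bic_ranking(X, y)
# Assumed cutoff: models within 2 BIC units of the best are treated as similar.
candidates = [(s, d) for s, d in ranking if d <= 2.0]
for subset, delta in candidates:
    print(subset, round(delta, 3))
```

The exhaustive search fits 2^p - 1 models, so it is only practical for a small number of predictors.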
The Stability Index is computed separately on different partitions of the original data, considering both days with anomalous weather conditions and usual ones, as follows:
For each of these samples, the estimation, selection and validation steps for the best model are iterated B times, resampling the data at each iteration with two different percentages, that is:
Finally, for the sake of clarity, a comparison across all the combinations summarizes the insights on the Stability Index.
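The iterated resampling described above can be sketched in Python as follows. This is a hedged illustration, not the thesis procedure: it does not compute the thesis's Stability Index, whose definition is given in 'MS_Thesis.pdf'. It only records how often each subset of predictors is selected across B resamples, as a simple stand-in. The value B = 50, the resampling fractions 0.5 and 0.8, sampling without replacement, the function names and the synthetic data are all assumptions made for the example.

```python
import itertools
from collections import Counter
import numpy as np

def best_subset_by_bic(X, y):
    """Return the predictor subset with the smallest BIC (OLS with intercept)."""
    n, p = X.shape
    best_subset, best_bic = None, np.inf
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            design = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ beta) ** 2))
            bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
            if bic < best_bic:
                best_subset, best_bic = subset, bic
    return best_subset

def selection_frequencies(X, y, B, fraction, rng):
    """Repeat selection on B subsamples and return each subset's selection share."""
    n = len(y)
    m = int(round(fraction * n))
    counts = Counter()
    for _ in range(B):
        # Assumption: subsamples are drawn without replacement.
        idx = rng.choice(n, size=m, replace=False)
        counts[best_subset_by_bic(X[idx], y[idx])] += 1
    return {subset: c / B for subset, c in counts.items()}

# Synthetic data (hypothetical): only predictors 1 and 3 affect the response.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 4))
y = 1.5 * X[:, 1] + 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n)

# Hypothetical settings: B = 50 iterations at two resampling fractions.
results = {f: selection_frequencies(X, y, B=50, fraction=f, rng=rng)
           for f in (0.5, 0.8)}
for f, freqs in results.items():
    top = max(freqs, key=freqs.get)
    print(f, top, freqs[top])
```

A selection share close to 1 for a single subset means the same model is chosen in almost every resample. Comparing the shares across the two fractions shows how sensitive the selection is to the amount of data used.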