Predicting whether a bank client subscribed to a term deposit (binary: "yes", "no")
The project was shared with me by Professor William Yu of the UCLA Anderson School as a challenge data science project. The main issues were collinearity between variables and missing values, both of which prevented easy modeling. To deal with them, I used PCA to address the collinearity and tried several imputation approaches for the missing values. Imputation did not improve the results much, so I ultimately made it (kNN imputation) optional.
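The overall structure (optional kNN imputation, then PCA, then a classifier) can be sketched roughly as follows. This is not the project's actual code: it assumes Python with scikit-learn, uses logistic regression as a placeholder model, falls back to median imputation when kNN imputation is switched off, and runs on synthetic data. The name build_pipeline and its parameters are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(use_knn_imputation=False, n_neighbors=5, variance_kept=0.95):
    """Hypothetical helper: impute -> scale -> PCA -> classifier."""
    if use_knn_imputation:
        # Slower option: fill each gap from the nearest rows.
        imputer = KNNImputer(n_neighbors=n_neighbors)
    else:
        # Assumed cheap fallback: fill each gap with the column median.
        imputer = SimpleImputer(strategy="median")
    return Pipeline([
        ("impute", imputer),
        ("scale", StandardScaler()),  # PCA is sensitive to feature scale
        # A float in (0, 1) keeps enough components to explain that
        # share of the variance; the components are uncorrelated.
        ("pca", PCA(n_components=variance_kept, svd_solver="full")),
        ("clf", LogisticRegression(max_iter=1000)),  # placeholder model
    ])


# Synthetic stand-in for the bank data: 6 columns forming 3 collinear
# pairs, a binary target, and roughly 10% of the values missing.
rng = np.random.default_rng(0)
base = rng.normal(size=(300, 3))
X = np.hstack([base, base + 0.05 * rng.normal(size=(300, 3))])
y = (base[:, 0] + base[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

for use_knn in (False, True):
    model = build_pipeline(use_knn_imputation=use_knn).fit(X, y)
    kept = model.named_steps["pca"].n_components_
    print(f"kNN imputation={use_knn}: kept {kept} components, "
          f"training accuracy {model.score(X, y):.3f}")
```

Keeping imputation behind a flag makes it easy to compare both variants on the same data and to skip the slower kNN step when it does not help.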
In the end, the project was most challenging because of the dataset's size, which had a large effect on run-time. I had to limit my selection of models and imputation techniques to keep the run-time bearable.
I want to learn more about making my code more efficient, especially with regard to modeling and imputation techniques. Most datasets are large, so I can see this being a common problem.