abdullahalzubaer/Regression-and-Missing-Value-Imputation

Imputing missing values with KNN imputation, then fitting the data with RandomForestRegressor

The real world is full of missing values! Either we ignore the samples that have missing values or we impute them.

Objective: To work with a dataset that has missing values, impute them, and apply a regression algorithm to compare performance with and without imputation.
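
As a rough illustration of this objective, the sketch below imputes missing values with scikit-learn's KNNImputer, fits a RandomForestRegressor, and compares that against simply dropping incomplete rows. The data is a synthetic stand-in (an assumption, not the survey dataset), and the feature count, missing rate, and hyperparameters are arbitrary choices for the example rather than the settings used in this repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the survey data (assumption: not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Knock out roughly 20% of the feature values at random.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.25, random_state=0
)

# Option 1: impute with KNN, then fit the regressor.
imputed_model = make_pipeline(
    KNNImputer(n_neighbors=5),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
imputed_model.fit(X_train, y_train)
r2_imputed = imputed_model.score(X_test, y_test)


# Option 2: ignore every sample that has at least one missing value.
def drop_incomplete(features, target):
    keep = ~np.isnan(features).any(axis=1)
    return features[keep], target[keep]


X_train_c, y_train_c = drop_incomplete(X_train, y_train)
X_test_c, y_test_c = drop_incomplete(X_test, y_test)
dropped_model = RandomForestRegressor(n_estimators=100, random_state=0)
dropped_model.fit(X_train_c, y_train_c)
r2_dropped = dropped_model.score(X_test_c, y_test_c)

print(f"rows kept after dropping: {len(X_train_c)} of {len(X_train)}")
print(f"R^2 with KNN imputation:   {r2_imputed:.3f}")
print(f"R^2 on complete rows only: {r2_dropped:.3f}")
```

Note that the two scores are computed on different test rows (all rows versus complete rows only), so they are not a strictly like-for-like comparison.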


Dataset:

The Public 2020 Stack Overflow Developer Survey Results

https://insights.stackoverflow.com/survey


TODO

  • Provide proper documentation
  • Dataset characteristics
  • Try different imputation methods (for example, mean, median)
  • Try other regression algorithms
  • Provide the results of using different imputation methods and no imputation (comparison)
  • Select the features based on feature importance (currently they are selected intuitively)
  • Dive deep into whether it makes sense to encode the values of some features
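
One possible way to approach the imputation comparison from the list above is sketched here. It scores mean, median, and KNN imputation with the same regressor under cross-validation. The data is synthetic (an assumption, not the survey dataset), and the fold count and model settings are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data with about 20% of values missing (assumption: illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)
X[rng.random(X.shape) < 0.2] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "knn": KNNImputer(n_neighbors=5),
}

results = {}
for name, imputer in imputers.items():
    # Putting the imputer inside the pipeline means it is refit on each
    # training fold, so no information leaks from the validation fold.
    model = make_pipeline(
        imputer, RandomForestRegressor(n_estimators=50, random_state=0)
    )
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    results[name] = scores.mean()
    print(f"{name:>6}: mean R^2 = {results[name]:.3f}")
```

Keeping the regressor fixed and varying only the imputer makes the differences in score attributable to the imputation step.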

Reference:

https://www.youtube.com/watch?v=xl0N7tHiwlw

https://stackoverflow.com/questions/54444260/labelencoder-that-keeps-missing-values-as-nan (for encoding the non-numerical feature values while keeping the missing values as missing)
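
A minimal sketch of that encoding idea, assuming pandas and scikit-learn are available: only the non-missing entries are passed to LabelEncoder, and the missing ones stay as NaN so that an imputer can fill them later. The helper name `encode_keep_nan` and the sample column are hypothetical, chosen for illustration, and this is not necessarily the approach taken in the linked answer or in this repository.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_keep_nan(column: pd.Series) -> pd.Series:
    """Label-encode the non-missing values; leave missing values as NaN."""
    # Start with an all-NaN float column so missing entries stay missing.
    encoded = pd.Series(np.nan, index=column.index, dtype="float64")
    present = column.notna()
    # Fit the encoder on observed values only and write the codes back.
    encoded.loc[present] = LabelEncoder().fit_transform(
        column[present].astype(str)
    )
    return encoded


# Hypothetical categorical column with two missing entries.
education = pd.Series(["Bachelor", np.nan, "Master", "Bachelor", np.nan])
print(encode_keep_nan(education).tolist())
```

Because LabelEncoder sorts its classes, "Bachelor" maps to 0 and "Master" to 1 here. The integer codes impose an arbitrary order on the categories, which is worth keeping in mind before feeding them to a distance-based imputer such as KNN.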