This project built a supervised Extra Trees model to address the classification problem.
- Built a feature-selection function based on the correlation coefficient matrix and data visualization, cutting the feature set by 78% (28 features down to 6) while maintaining overall model performance.
- Improved recall while maintaining precision by applying a customized algorithm together with the precision-recall curve.
- Demonstrated dataset manipulation use cases with NumPy and Pandas.
- Demonstrated data visualization use cases with Matplotlib, Seaborn, and Plotly.
- Demonstrated model evaluation use cases with classification reports, confusion matrices, and precision-recall curves.
- Demonstrated resampling use cases with SMOTE and Random Sampler.
- Demonstrated model building, evaluation, hyperparameter tuning, and pipeline workflows with Sklearn.
- Demonstrated dimension reduction use cases with Autoencoder and UMAP.
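As a rough sketch of the correlation-based feature selection described above (the 0.9 threshold and the toy column names are illustrative assumptions, not the project's actual values):

```python
import numpy as np
import pandas as pd

def drop_correlated_features(df, threshold=0.9):
    """Drop one feature from each pair whose absolute Pearson
    correlation exceeds `threshold` (threshold is illustrative)."""
    corr = df.corr().abs()
    # Inspect only the upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Toy frame: "b" is a scaled copy of "a", so it should be dropped.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]})
reduced = drop_correlated_features(df, threshold=0.9)
```

The same filter, applied to the project's 28 original features, would be one way to arrive at a much smaller feature set without retraining-based selection.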
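The recall-vs-precision trade-off above can be sketched with Sklearn's `precision_recall_curve`: pick the lowest decision threshold that still meets a precision floor, which maximizes recall under that constraint. The synthetic data and the 0.80 floor are illustrative assumptions, not the project's setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data standing in for the real dataset.
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = ExtraTreesClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_te, proba)
# precision[:-1] and recall[:-1] align with `thresholds`; keep the
# thresholds whose precision is at least 0.80 and take the one with
# the highest recall (0.80 is an illustrative floor).
ok = precision[:-1] >= 0.80
best_threshold = thresholds[ok][np.argmax(recall[:-1][ok])]
y_pred = (proba >= best_threshold).astype(int)
```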
- Data processing
  - NumPy
  - Pandas
- Data visualization
  - Matplotlib
  - Seaborn
  - Plotly
- Sampling
  - Pandas
    - Random sampling
  - Sklearn
    - Train-test split
  - Imblearn
    - SMOTE
    - Random Sampler
- Dimension reduction
  - UMAP
  - Autoencoder
- Model building & evaluation
  - Sklearn
    - Cross-validation
    - Grid search
    - Pipeline
    - Extra Trees model
    - Classification report
    - Confusion matrix
    - Precision-recall curve
- Model selection
  - Sklearn
  - Pycaret
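One way the Sklearn items above (Pipeline, grid search, cross-validation, Extra Trees) can fit together is sketched below; the scaler, the tiny parameter grid, and the recall scoring are illustrative assumptions, not the project's actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)

# Chaining preprocessing and the model in one Pipeline keeps
# cross-validation folds free of preprocessing leakage.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", ExtraTreesClassifier(random_state=0)),
])

# Small illustrative grid; a real search would sweep wider ranges.
grid = GridSearchCV(
    pipe,
    {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    cv=3,
    scoring="recall",
)
grid.fit(X, y)
```

`grid.best_params_` then reports the winning hyperparameters, and `grid.best_estimator_` is the refit pipeline ready for prediction.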
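A minimal sketch of the SMOTE resampling listed under Sampling, assuming the `imblearn` package is installed; the toy 9:1 imbalance is illustrative:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Imbalanced toy set: roughly a 9:1 majority/minority split.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# SMOTE synthesizes new minority-class samples by interpolating
# between existing minority samples and their nearest neighbors,
# balancing the classes before model training.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
```

Resampling should be applied only to the training split (e.g. inside an `imblearn` pipeline), so evaluation still reflects the true class balance.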