Data source:
https://www.kaggle.com/jsphyg/weather-dataset-rattle-package
The dataset contains 10 years of daily weather observations from many locations across Australia.
This project was built with:
- Data Understanding
  - Descriptions of variables
- Exploratory Data Analysis
  - Exploring the categorical and numerical variables
  - Feature engineering of the Date variable
  - Outlier detection using boxplots
  - Checking the distribution of numerical variables using histograms
  - Checking the distribution of the target variable (class distribution)
  - Correlation analysis
  - Checking for duplicates
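A few of the exploratory steps above can be sketched with pandas as follows. This is only an illustration on a tiny synthetic frame: the column names (`Date`, `Rainfall`, `RainTomorrow`) are assumed to mirror the Kaggle CSV, and `iqr_outlier_mask` is a hypothetical helper that applies the same 1.5 × IQR rule a boxplot uses to draw its whiskers.

```python
import pandas as pd

# Tiny synthetic frame; the column names are assumptions, not the project's code.
df = pd.DataFrame({
    "Date": ["2010-01-01", "2010-01-02", "2010-02-15", "2010-02-15"],
    "Rainfall": [0.0, 12.4, 0.2, 0.2],
    "RainTomorrow": ["No", "Yes", "No", "No"],
})

# Feature engineering: split the date string into numeric parts.
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day

# Class distribution of the target, expressed as proportions.
class_share = df["RainTomorrow"].value_counts(normalize=True)

# Number of fully duplicated rows (the last two rows are identical).
n_duplicates = int(df.duplicated().sum())


def iqr_outlier_mask(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


rain_outliers = iqr_outlier_mask(df["Rainfall"])
```

On real data, `df.duplicated().sum()` and `value_counts(normalize=True)` give a quick read on duplicate rows and class imbalance before any modelling.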
- Data Pre-processing
  - Handling missing values
  - Removing outliers
  - Categorical data encoding
  - Feature scaling
  - Feature selection
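One way the pre-processing steps above might be wired together with scikit-learn is sketched below. The imputation strategies, the one-hot encoder, the IQR rule, and the helper `drop_iqr_outliers` are all assumptions made for illustration; the project may have used different choices. Feature selection is left out of the sketch.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data with a missing value in each column and one extreme reading.
df = pd.DataFrame({
    "MinTemp": [10.0, 12.0, np.nan, 11.0, 13.0, 95.0],
    "WindGustDir": ["N", "S", "N", np.nan, "S", "N"],
})


def drop_iqr_outliers(frame, column, k=1.5):
    """Drop rows whose value lies outside the IQR fences; keep missing values."""
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = frame[column].between(q1 - k * iqr, q3 + k * iqr) | frame[column].isna()
    return frame[keep]


clean = drop_iqr_outliers(df, "MinTemp")

# Numeric columns: median imputation, then standardisation.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: most-frequent imputation, then one-hot encoding.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["MinTemp"]),
    ("cat", categorical, ["WindGustDir"]),
])

X = preprocess.fit_transform(clean)
```

Fitting the transformer inside a pipeline keeps the imputation and scaling statistics tied to the training data, which avoids leaking information from held-out folds.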
- Training Baseline Models
  - Created a function that evaluates multiple models on multiple metrics using cross-validation
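An evaluation function of this kind could look roughly like the sketch below. The name `evaluate_models`, the two example models, the metric list, and the synthetic data are assumptions; only `cross_validate` itself is a real scikit-learn function.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(models, X, y, scoring=("accuracy", "f1", "roc_auc"), cv=5):
    """Return the mean cross-validated score per metric for each model."""
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, scoring=list(scoring), cv=cv)
        results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    return results


# Synthetic binary classification data standing in for the weather dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
}
results = evaluate_models(models, X, y)

# Rank the models by one metric, for example to shortlist the strongest ones.
ranked = sorted(results, key=lambda name: results[name]["f1"], reverse=True)
```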
- Shortlisting the Best Models
  - Selected the top 3 models
- Hyperparameter Tuning
  - Determined the best parameters of the models using randomized search with cross-validation
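A randomized search might be set up along the following lines with scikit-learn's `RandomizedSearchCV`. The random forest, the parameter ranges, the scoring metric, and the synthetic data are illustrative assumptions, not the project's actual search space.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Distributions to sample from; randint(low, high) excludes the upper bound.
param_distributions = {
    "n_estimators": randint(10, 50),
    "max_depth": randint(2, 8),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,        # number of sampled parameter combinations
    cv=3,            # 3-fold cross-validation per combination
    scoring="f1",
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
```

Unlike a full grid search, only `n_iter` combinations are tried, so the cost stays fixed however wide the ranges are.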
- Building Ensemble Models
  - Built 3 ensemble models using a stacking classifier
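As a rough sketch of the stacking idea, scikit-learn's `StackingClassifier` trains a final estimator on the cross-validated predictions of its base models. The base models, the logistic regression meta-learner, and the synthetic data below are assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    # The meta-learner is fitted on out-of-fold predictions of the base models.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
```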
- Model Evaluation
  - Evaluated the performance of the 3 initial shortlisted models and the 3 ensemble models
  - Plotted learning curves to compare the performance of the models on training and testing data
  - Determined the best model
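The numbers behind a learning curve can be computed with scikit-learn's `learning_curve`, as in the sketch below. Note the assumptions: the model and data are synthetic stand-ins, and the second set of scores comes from cross-validation folds rather than a separate test set. Plotting is omitted; the mean arrays are what would be drawn against `train_sizes`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),  # fractions of the training folds
    cv=5,
    scoring="accuracy",
)

# Average over the folds to get one point per training-set size.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)

# A large, persistent gap suggests overfitting; two low curves suggest underfitting.
gap = train_mean - valid_mean
```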
- Andy Chow Sai Kit
- Wong Yew Lee
- Li Chen Zhen