I used R to build supervised machine learning models that predict the Spotify ratings of popular songs from their auditory features.
- This was a Kaggle competition I participated in during November 2022.
- It provides a dataset that describes popular songs based on auditory features, such as loudness, tempo, performer, and genre.
- The goal is to construct a predictive model using analysisData.csv, and then predict the songs' ratings from the features in scoringData.csv.
- Model performance is evaluated by RMSE (root mean squared error); a lower RMSE means better predictions.
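
To make the evaluation metric concrete, here is a minimal sketch of how RMSE can be computed in base R. The helper name `rmse` and the toy vectors are assumptions for illustration only; they are not taken from the competition data or from the project code.

```r
# Root mean squared error between observed and predicted ratings.
# `rmse` is a hypothetical helper name used only for this illustration.
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Toy values for illustration only (not from analysisData.csv).
actual    <- c(30, 45, 60)
predicted <- c(33, 41, 60)

# Errors are -3, 4 and 0, so this is sqrt((9 + 16 + 0) / 3).
rmse(actual, predicted)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.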
The detailed R code and its output cover the entire data analysis process, organized as follows:
- Data Exploration
- Data Tidying
  - Encode Missing Data
  - Data Parsing
  - Data Transformation
- Feature Selection
  - Variable Inter-set
  - Remove Near Zero Variance
  - Principal Components Analysis
- Split Data
- Data Analysis - Modeling
  - Multiple Regression
  - Regression Tree
  - Random Forest
  - Ranger
  - XGBoost
  - gbm
  - Radial SVM
- Model Tuning
- Comparison
- Prediction