This project applies supervised machine learning to predict whether a Titanic passenger survived, using features such as age, sex, passenger class, number of family members on board, fare paid, and embarkation port. The training and test data are provided as Excel files (`train.xlsx` and `test.xlsx`).
- Clean and preprocess the data (handle missing values, encode categorical features, scale numerical features).
- Explore the dataset using visualizations to identify key patterns and feature distributions.
- Train and evaluate several classification models (Logistic Regression, Random Forest, XGBoost, etc.).
- Perform feature selection and hyperparameter tuning using GridSearchCV with stratified 5-fold cross-validation.
- Choose the best-performing model based on macro F1-score and apply it to the test set.
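
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`Age`, `Fare`, `Pclass`, `Sex`, `Embarked`, `Survived`) are assumed, the parameter grid is arbitrary, and feature selection is left out. Shuffling the folds with a fixed seed is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the training data (assumed column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 500, n),
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Embarked": rng.choice(["S", "C", "Q"], n),
})
# Knock out some ages so the imputer has something to do.
df.loc[rng.choice(n, 20, replace=False), "Age"] = np.nan
# Label loosely tied to Sex (with 20% noise) so the model has some signal.
df["Survived"] = ((df["Sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)

numeric = ["Age", "Fare", "Pclass"]
categorical = ["Sex", "Embarked"]

# Impute and scale numeric columns; impute and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified 5-fold grid search scored by macro F1.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    model,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},  # illustrative grid only
    scoring="f1_macro",
    cv=cv,
)
search.fit(df[numeric + categorical], df["Survived"])
print(search.best_params_, round(search.best_score_, 3))
```

Keeping the preprocessing inside the `Pipeline` means the imputer and scaler are refit on each training fold, so the cross-validated score is not inflated by statistics leaked from the validation fold.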
- `Assignment2_supervised_learning_flow.ipynb` – Main notebook containing code, experiments, and results.
- `train.xlsx` – Labeled training dataset.
- `test.xlsx` – Unlabeled test dataset.
- Python (pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost)
- Jupyter Notebook
- Open the notebook `Assignment2_supervised_learning_flow.ipynb`.
- Run the cells in order to load the data, preprocess it, train models, and generate predictions.
- Modify parameters or models as needed to test different approaches.
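
One way to try different models is to score each candidate with the same cross-validation splits and compare the mean macro F1. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` instead of the Titanic files, the `candidates` dictionary is hypothetical, and XGBoost is omitted so the example depends on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification data standing in for the preprocessed features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Hypothetical set of candidate models; add or swap estimators here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same stratified 5-fold splits for every candidate, scored by macro F1.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(est, X, y, scoring="f1_macro", cv=cv).mean()
    for name, est in candidates.items()
}

best_name = max(scores, key=scores.get)
print(scores)
print("best:", best_name)
```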
- Orel Cohen
- Oshri Halevi