Spaceship Titanic Competition: Predictive ML Modeling of a binary classification problem

Project Overview

This project focuses on predicting the fate of passengers aboard the Spaceship Titanic, a fictional interstellar vessel that encountered a spacetime anomaly. The objective is to determine whether each passenger was transported to an alternate dimension (Transported: True) or remained in their original state (Transported: False). The project involves data preprocessing, feature engineering, exploratory data analysis (EDA), and the development of machine learning models to perform binary classification. The ultimate goal is to produce accurate predictions that can assist rescue crews in identifying and retrieving transported passengers. Automated hyperparameter tuning and AutoML were also practiced during the project.

Objectives

The main objective is to build an accurate machine learning model that predicts whether a passenger was transported during the Spaceship Titanic incident. The evaluation metric is accuracy, and models must achieve a minimum accuracy score of 0.79 to be considered successful. Additional goals include:

Exploring and visualizing key features such as age, cabin, passenger groupings, spending behavior, and cryo-sleep status to uncover meaningful patterns.
Engineering new features and handling missing data effectively to improve model performance.
Comparing and evaluating various classification algorithms to identify the most effective approach.
Generating a submission file with the predictions.
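
As a rough illustration of the evaluation metric, the sketch below computes accuracy as the share of matching labels and compares it with the 0.79 target. The function and constant names (`accuracy`, `meets_target`, `TARGET_ACCURACY`) are hypothetical and are not taken from the notebooks.

```python
TARGET_ACCURACY = 0.79  # success threshold stated in the objectives


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_target(score, target=TARGET_ACCURACY):
    """Check whether an accuracy score reaches the project threshold."""
    return score >= target
```

For example, `meets_target(accuracy(y_valid, y_hat))` could be used on a held-out validation split before submitting, where `y_valid` and `y_hat` are placeholders for the true and predicted labels.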

Key Insights

CryoSleep is the strongest predictor of being transported. Passengers in CryoSleep are far more likely to be transported than those awake.
HomePlanet, especially Europa, shows important interactions with CryoSleep, increasing predictive power. Other influential features include spending on Spa, RoomService, and VRDeck.
High onboard spending on Spa, VRDeck, and RoomService is linked to a lower chance of transport, while spending in FoodCourt and ShoppingMall correlates with higher transport rates.
The best result (accuracy: 0.80640) was achieved using a non-tuned XGBoost model with transformed features. More complex models (tuned, ensembles, AutoML) did not improve performance.
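
A per-category transport rate is one simple way to check insights like the CryoSleep effect. The sketch below assumes a pandas DataFrame with `CryoSleep` and `Transported` columns; the helper name `transport_rate_by` is hypothetical, and the toy rows are made up for illustration and are not results from the real data.

```python
import pandas as pd


def transport_rate_by(df, column, target="Transported"):
    """Share of transported passengers for each category of `column`."""
    rates = df[target].astype(float).groupby(df[column], dropna=False).mean()
    return rates.sort_values(ascending=False)


# Toy rows for illustration only (NOT the competition data).
toy = pd.DataFrame({
    "CryoSleep": [True, True, True, False, False, False],
    "Transported": [True, True, False, False, False, True],
})
rates = transport_rate_by(toy, "CryoSleep")
```

The same helper could be applied to other categorical columns such as `HomePlanet` to compare groups.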

Model Evaluation

Models are evaluated through the Kaggle competition: the reported accuracy is the score Kaggle assigns to the submitted predictions for the test set.
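
A submission file of this kind can be assembled roughly as follows. This is a minimal sketch that assumes the competition expects a `PassengerId` column next to a boolean `Transported` column; the helper `build_submission` and the example IDs are hypothetical.

```python
import pandas as pd


def build_submission(passenger_ids, predictions):
    """Pair each test PassengerId with a boolean Transported prediction."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    return pd.DataFrame({
        "PassengerId": list(passenger_ids),
        "Transported": [bool(p) for p in predictions],
    })


# Made-up IDs and predictions, for illustration only.
submission = build_submission(["0013_01", "0018_01"], [1, 0])
csv_text = submission.to_csv(index=False)  # pass a file path to write to disk
```

Calling `submission.to_csv("submission.csv", index=False)` would write the file instead of returning it as text.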

Installation

To set up this project locally:

Clone the repository:

git clone https://github.com/razzf/survival-prediction-machine-learning.git

Navigate to the project directory:
```
cd survival-prediction-machine-learning
```
Install required packages: Ensure Python is installed and use the following command:
```
pip install -r requirements.txt
```

Usage

Open the notebook in Jupyter or JupyterLab to explore the analysis. Execute the cells sequentially to understand the workflow, from data exploration to model building and evaluation. For an in-depth exploration, refer to the notebook overview below.

Data

The dataset is located in the /data directory and was originally obtained from Kaggle. It reflects the passenger list of the fictional Spaceship Titanic at the time of the incident. It contains records for about 13,000 passengers with 12 features (e.g. age, name, HomePlanet, Destination, expenditures) and one target variable indicating whether the person was transported to an alternate dimension during the Spaceship Titanic's collision with the spacetime anomaly.
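
The sketch below shows how a few features might be derived from these columns. It is not the notebooks' actual pipeline. It assumes `PassengerId` has the form `gggg_pp` (group, then number within the group), `Cabin` has the form `deck/num/side`, and that a missing spending value can be treated as zero; the helper name `add_basic_features` is hypothetical.

```python
import pandas as pd

SPEND_COLS = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]


def add_basic_features(df):
    """Return a copy of df with group, cabin, and spending features added."""
    out = df.copy()
    # Assumed ID format "gggg_pp": the part before "_" identifies the group.
    out["Group"] = out["PassengerId"].str.split("_").str[0]
    out["GroupSize"] = out.groupby("Group")["PassengerId"].transform("count")
    # Assumed cabin format "deck/num/side"; missing cabins stay missing.
    cabin = out["Cabin"].str.extract(r"^([^/]+)/([^/]+)/([^/]+)$")
    out["Deck"] = cabin[0]
    out["Side"] = cabin[2]
    # Assumption: a missing amount counts as zero spending.
    out["TotalSpend"] = out[SPEND_COLS].fillna(0).sum(axis=1)
    return out


# Toy rows for illustration only (NOT the competition data).
toy = pd.DataFrame({
    "PassengerId": ["0001_01", "0002_01", "0002_02"],
    "Cabin": ["B/0/P", None, "F/1/S"],
    "RoomService": [0.0, 10.0, None],
    "FoodCourt": [0.0, 5.0, 1.0],
    "ShoppingMall": [0.0, 0.0, 2.0],
    "Spa": [0.0, 0.0, 3.0],
    "VRDeck": [0.0, 0.0, 4.0],
})
features = add_basic_features(toy)
```

With the real files, the input would typically come from something like `pd.read_csv("data/train.csv")`.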

Directory Structure

project-root/
├── custom_modules/
│   ├── custom_transformers.py         # Module for custom pipeline transformers
│   ├── plotting.py                    # Module for plotting visualizations
│   └── stat_calculations.py           # Module for statistical calculations
├── data/
│   ├── test.csv                       # test dataset
│   └── train.csv                      # training dataset including target
├── notebooks/
│   ├── AutoML_1/                      # Results from the AutoML process 1 
│   ├── AutoML_2/                      # Results from the AutoML process 2  
│   ├── data preparation, EDA, statistical inference.ipynb   # Jupyter notebook_1 for data prep, EDA, and statistical inference
│   ├── machine learning modeling.ipynb                      # Jupyter notebook_2 for machine learning modeling and evaluation
│   └── submission.csv                 # Latest submitted test prediction 
├── requirements.txt                   # Python dependencies
└── README.md                          # Project documentation

Requirements

The requirements.txt file lists all Python dependencies. Install them using the command provided above.

Notebook Overview

The notebooks include the following sections:

Notebook 1: Data Preparation, EDA, and Statistical Inference

Introduction
Problem Discovery
Data Acquisition
Exploratory Data Analysis
Statistical Inference and Evaluation

Notebook 2: Machine Learning Modeling

Introduction
Load data
Split train data
Feature Engineering
Model Training, Evaluation, and Tuning
AutoML
Submission
Suggestions for Improvement
