
🚢 Spaceship Titanic - Kaggle Competition Solution

Welcome to my repository for the Spaceship Titanic Kaggle competition! This project provides a solution to the binary classification problem: predicting whether a passenger aboard the Spaceship Titanic was transported to another dimension (Transported = 1) or not (Transported = 0).

📋 Project Description

The goal is to build an accurate machine learning model to predict passenger transportation outcomes.

The model processes the provided data, handles missing values, encodes categorical features, and scales numerical features to achieve high performance metrics.

🛠️ Methodology

Preprocessing is handled by the preprocess_data() function (a sketch follows the list below):

  1. Handling Missing Values: For numerical features, missing values are filled with the median. For categorical features, missing values are filled with the most frequent value.
  2. Splitting the "Cabin" Feature: The Cabin feature is split into three new features: Deck, Cabin_num, and Side using .str.split().
  3. Dropping Unnecessary Features: Features such as Name and Cabin are dropped.
  4. Converting Data Types: The Age and Cabin_num features are converted to integer types.
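
The repository's exact function body isn't reproduced above, so here is a minimal sketch of these four steps (the column names come from the Kaggle dataset; the implementation itself is an assumption):

import pandas as pd

def preprocess_data(df: pd.DataFrame) -> pd.DataFrame:
    # Assumed sketch of the steps described above, not the repository's exact code
    df = df.copy()

    # 1. Missing values: median for numerical, most frequent value for categorical
    num_cols = df.select_dtypes(include='number').columns
    cat_cols = df.select_dtypes(exclude='number').columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    for col in cat_cols:
        df[col] = df[col].fillna(df[col].mode()[0])

    # 2. Split "Cabin" (formatted "Deck/Num/Side") into three new features
    df[['Deck', 'Cabin_num', 'Side']] = df['Cabin'].str.split('/', expand=True)

    # 3. Drop features that are not used by the models
    df = df.drop(columns=['Name', 'Cabin'])

    # 4. Cast Age and Cabin_num to integer types
    df['Age'] = df['Age'].astype(int)
    df['Cabin_num'] = df['Cabin_num'].astype(int)
    return df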

🔋 Encoding and Scaling Features

The function order_and_scale_features() is used to process categorical and numerical features:

  1. Numerical Features: Scaled using RobustScaler to reduce the effect of outliers.
  2. Categorical Features: Encoded using OrdinalEncoder.

The function supports both training mode (is_train=True) and testing mode (is_train=False):

  1. In training mode, the function fits the encoder and scaler, transforms the features, and saves the fitted parameters.
  2. In testing mode, it only transforms the data using the saved encoder and scaler, as sketched below.
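
A possible implementation, assuming the fitted encoder and scaler live at module level so they can be reused between the training and testing calls (the signature and column arguments are assumptions):

from sklearn.preprocessing import OrdinalEncoder, RobustScaler

# Fitted once on the training data, then reused for the test data
encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
scaler = RobustScaler()

def order_and_scale_features(df, cat_cols, num_cols, is_train=True):
    df = df.copy()
    if is_train:
        # Fit on the training data and save the parameters inside the objects
        df[cat_cols] = encoder.fit_transform(df[cat_cols])
        df[num_cols] = scaler.fit_transform(df[num_cols])
    else:
        # Only transform, using the previously fitted encoder and scaler
        df[cat_cols] = encoder.transform(df[cat_cols])
        df[num_cols] = scaler.transform(df[num_cols])
    return df

Fitting only on the training data keeps test-set statistics out of the scaler and encoder, which avoids data leakage into the validation score.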

🚴🏻‍♂️ Model Training

The project uses two main machine learning models:

RandomForestClassifier: A straightforward and interpretable ensemble method.

from sklearn.ensemble import RandomForestClassifier

# Bagging over decision trees; sqrt feature sampling and the split/leaf
# minimums limit overfitting
random_forest = RandomForestClassifier(
    n_estimators=200,
    max_features='sqrt',
    min_samples_split=4,
    min_samples_leaf=2,
    random_state=42)
random_forest.fit(X_train_final, y_train)

CatBoostClassifier: A gradient boosting algorithm that is particularly effective for categorical data.

from catboost import CatBoostClassifier

# Gradient-boosted trees; l2_leaf_reg adds L2 regularization and
# verbose=200 logs training progress every 200 iterations
cgbdt = CatBoostClassifier(
    iterations=200,
    learning_rate=0.2,
    random_seed=42,
    verbose=200,
    l2_leaf_reg=7
)
cgbdt.fit(X_train_final, y_train)
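
To connect training to the reported results, a hedged example of validating the model and producing a Kaggle submission might look like this (X_val_final, y_val, X_test_final, and test_ids are assumed to come from earlier preprocessing and a train/validation split):

import pandas as pd
from sklearn.metrics import accuracy_score

# Accuracy on a held-out validation split
val_preds = cgbdt.predict(X_val_final)
print(f"Validation accuracy: {accuracy_score(y_val, val_preds):.3f}")

# Kaggle expects a CSV with PassengerId and a boolean Transported column
submission = pd.DataFrame({
    'PassengerId': test_ids,
    'Transported': cgbdt.predict(X_test_final).astype(bool),
})
submission.to_csv('submission.csv', index=False)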

📈 Results

The best results were achieved using the CatBoostClassifier, with:

  1. Local validation accuracy: ~82%
  2. Kaggle public leaderboard score: ~0.79