An AI model that uses machine learning to predict which passengers survived the Titanic disaster.
This project implements an end-to-end machine learning pipeline that preprocesses the Titanic passenger dataset and predicts survival outcomes. A Random Forest classifier, trained after data cleaning, feature removal, and categorical encoding, reaches 84% accuracy on a 20% hold-out test split.
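The project uses the well-known Kaggle Titanic dataset, which contains the following columns (plus Name, Ticket, and Cabin, which are dropped during preprocessing):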
- PassengerId: Unique identifier for each passenger
- Survived: Survival outcome (0 = No, 1 = Yes); this is the target variable
- Pclass: Passenger class (1st, 2nd, 3rd)
- Sex: Sex of the passenger (male or female)
- Age: Age in years
- SibSp: Number of siblings/spouses aboard
- Parch: Number of parents/children aboard
- Fare: Passenger fare
- Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
Data preprocessing:
- Missing Value Imputation:
  - Age: filled with the median value (28.0 years)
  - Embarked: filled with the most frequent port (Southampton)
- Feature Removal:
  - Cabin: dropped because 77% of its values are missing
  - Name & Ticket: dropped because they are free-text identifiers that the model cannot use directly
- Categorical Encoding:
  - Sex: binary encoding (male = 0, female = 1)
  - Embarked: one-hot encoding with drop_first=True, so Cherbourg (C) becomes the implicit baseline category
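The preprocessing steps above can be sketched in pandas as follows. This is a minimal illustration, not the code in main.py: the function name `preprocess` and the five-row `sample` frame are hypothetical, and the encoding is assumed to use `pd.get_dummies`, whose `drop_first` parameter matches the setting listed above.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the listed preprocessing steps to a Titanic-style frame."""
    df = df.copy()

    # Missing value imputation: median Age, most frequent Embarked port
    df["Age"] = df["Age"].fillna(df["Age"].median())
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

    # Feature removal: Cabin (mostly missing), Name and Ticket (free text)
    df = df.drop(columns=["Cabin", "Name", "Ticket"])

    # Categorical encoding: Sex as 0/1, Embarked one-hot with the first level (C) dropped
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df = pd.get_dummies(df, columns=["Embarked"], drop_first=True, dtype=int)
    return df


# Hypothetical five-row sample with the dataset's column names
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 3],
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D", "Passenger E"],
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, None, 26.0, 35.0, 30.0],
    "SibSp": [1, 1, 0, 0, 0],
    "Parch": [0, 0, 0, 0, 0],
    "Ticket": ["t1", "t2", "t3", "t4", "t5"],
    "Fare": [7.25, 71.28, 7.93, 13.0, 8.46],
    "Cabin": [None, "C85", None, "C123", None],
    "Embarked": ["S", "C", None, "S", "Q"],
})

out = preprocess(sample)
print(out)
```

On the sample, the missing age becomes 28.0 (the median of the four known ages) and the missing port becomes "S", leaving a fully numeric frame with `Embarked_Q` and `Embarked_S` columns.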
Model:
- Algorithm: Random Forest Classifier
- Parameters: 100 estimators, random_state=42
- Train/Test Split: 80/20 ratio
- Validation: a single hold-out split; the bootstrap sampling built into Random Forest is part of training, not cross-validation
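A minimal sketch of this training setup with scikit-learn, assuming the features are already numeric. The helper name `train_and_evaluate` is hypothetical, and seeding `train_test_split` with `random_state=42` is an assumption (the settings above only state it for the forest). The demo data is synthetic, so its accuracy says nothing about the Titanic result.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X: pd.DataFrame, y: pd.Series) -> tuple[RandomForestClassifier, float]:
    """Hypothetical helper: fit on 80% of rows, return (model, accuracy on the other 20%)."""
    # 80/20 split; seeding the split is an assumption made for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    # 100 trees with a fixed seed, as listed in the model settings
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Synthetic demo data: the label depends only on Sex and Pclass
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 100, n),
})
y_demo = ((X_demo["Sex"] == 1) | (X_demo["Pclass"] == 1)).astype(int)

model, acc = train_and_evaluate(X_demo, y_demo)
print(f"Hold-out accuracy: {acc:.2f}")
```

A single hold-out split gives one accuracy number whose value depends on which 20% of rows land in the test set; k-fold cross-validation averages over several splits instead.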
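Performance:
- Accuracy: 84% on the 20% test split
- Robustness: consistent performance across runs (random_state=42 fixes the forest's randomness)
- Interpretability: feature importances can be read from the fitted model's feature_importances_ attribute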
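One way to inspect those importances is to wrap scikit-learn's impurity-based `feature_importances_` in a sorted pandas Series. The helper `importance_table` and the synthetic data (where only `Sex` determines the label) are hypothetical and serve only to show the shape of the output.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def importance_table(model: RandomForestClassifier, feature_names) -> pd.Series:
    """Hypothetical helper: impurity-based importances, largest first (they sum to 1)."""
    return pd.Series(model.feature_importances_, index=feature_names).sort_values(
        ascending=False
    )


# Synthetic demo data: the label is exactly Sex, so Sex should rank first
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "Sex": rng.integers(0, 2, 300),
    "Age": rng.uniform(1, 80, 300),
    "Fare": rng.uniform(5, 100, 300),
})
y = X["Sex"]

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
importances = importance_table(model, X.columns)
print(importances)
```

Impurity-based importances can favour continuous, high-cardinality features such as Age or Fare, so scikit-learn's permutation importance is a useful cross-check.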
Installation:
pip install pandas scikit-learn
Dependencies:
- Python 3.13+
- pandas 2.3.0+
- scikit-learn 1.7.0+
# Clone the repository
git clone https://github.com/SpaceBuddy231/Titanic-AI.git
cd Titanic-AI
# Run the model
python main.py
Project structure:
Titanic-AI/
├── main.py                    # Complete ML pipeline
├── data/
│   ├── train.csv              # Training dataset (891 passengers)
│   ├── test.csv               # Test dataset (418 passengers, no Survived labels)
│   └── gender_submission.csv  # Sample submission format
└── README.md                  # Project documentation
Features:
- Complete Data Preprocessing Pipeline
- Missing Value Handling
- Feature Engineering & Encoding
- Machine Learning Model Training
- Model Evaluation & Metrics
- Clean, Documented Code
The pipeline processes the raw Titanic dataset and produces:
- Clean numerical features suitable for ML algorithms
- Zero missing values in the final dataset
- 84% survival-prediction accuracy on the 20% test split
- Robust performance with Random Forest classification
Potential improvements for higher accuracy:
- Advanced feature engineering (family size, titles extracted from names)
- Hyperparameter tuning with GridSearchCV
- Ensemble methods (voting, stacking)
- k-fold cross-validation for more reliable accuracy estimates than a single 80/20 split
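The first, second, and fourth ideas can be sketched together. This is an illustration under assumptions, not part of main.py: the column names `FamilySize` and `Title`, the helpers `add_engineered_features` and `tune_random_forest`, and the small parameter grid are all hypothetical. `GridSearchCV` scores each parameter combination with 5-fold cross-validation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper adding FamilySize and Title columns."""
    df = df.copy()
    # Family size = siblings/spouses + parents/children + the passenger
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    # Title = text between the comma and the first period, e.g. "Braund, Mr. Owen Harris" -> "Mr"
    df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    return df


def tune_random_forest(X, y) -> GridSearchCV:
    """Hypothetical helper: grid search over a small, illustrative parameter grid."""
    param_grid = {
        "n_estimators": [100, 200],
        "max_depth": [None, 5],
    }
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid,
        cv=5,  # 5-fold (stratified, for classifiers) cross-validation per combination
        scoring="accuracy",
    )
    search.fit(X, y)
    return search


# Feature engineering on three name formats as they appear in train.csv
demo = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})
fe = add_engineered_features(demo)
print(fe[["FamilySize", "Title"]])

# Grid search on synthetic numeric data (the label is exactly Sex)
rng = np.random.default_rng(0)
X_syn = pd.DataFrame({"Sex": rng.integers(0, 2, 100), "Age": rng.uniform(1, 80, 100)})
y_syn = X_syn["Sex"]
search = tune_random_forest(X_syn, y_syn)
print(search.best_params_, round(search.best_score_, 3))
```

Before the forest can use Title, it would still need encoding (for example one-hot, like Embarked), and rare titles are often grouped into a single category first.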