A machine learning project that predicts passenger survival on the Titanic using various classification algorithms with hyperparameter optimization.
This repository contains a machine learning solution for the Kaggle Titanic survival prediction competition. It trains several classification algorithms, tunes each one with hyperparameter optimization, and predicts whether a passenger survived the disaster from features such as age, gender, ticket class, fare, and cabin.
The solution follows these steps:
- Data exploration and visualization
- Feature engineering and preprocessing
- Model training with hyperparameter optimization
- Model evaluation and selection
- Prediction generation for test data
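
As a rough illustration of the feature engineering step, the sketch below derives a family-size feature and a title extracted from the passenger name. It assumes the standard Kaggle column names `Name`, `SibSp`, and `Parch`. The function name `add_engineered_features` and the output columns `FamilySize` and `Title` are hypothetical and may not match what `titanicPredictSurvival.py` actually uses.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with FamilySize and Title columns added (hypothetical names)."""
    out = df.copy()
    # Family size = siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Names look like "Braund, Mr. Owen Harris": the title sits between
    # the comma and the first period.
    out["Title"] = (
        out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    )
    return out


# Two hand-written rows in the Kaggle name format, used only for demonstration.
sample = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
        "SibSp": [1, 0],
        "Parch": [0, 0],
    }
)
features = add_engineered_features(sample)
print(features[["FamilySize", "Title"]])
```

Returning a copy keeps the raw data intact, so the same function can be applied to both the training and the test frames.
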
- Exploratory Data Analysis: Comprehensive analysis of the Titanic dataset with visualizations to understand feature relationships and survival patterns
- Feature Engineering: Creation of new features like family size, title extraction from names, and family survival correlation
- Hyperparameter Optimization: Uses the Hyperopt library to search for well-performing parameters for each model
- Multiple Classification Algorithms:
  - Decision Tree Classifier
  - Random Forest Classifier
  - Gradient Boosting Classifier
  - XGBoost Classifier
  - K-Nearest Neighbors
  - Support Vector Machine (implementation available but not used in main script)
  - Neural Network with Keras (implementation available but not used in main script)
- Model Comparison: Automatic selection of the best performing model for final predictions
- Python 3.x
- Required libraries:
  - pandas
  - numpy
  - scikit-learn
  - matplotlib
  - seaborn
  - hyperopt
  - xgboost
  - keras (optional, for neural network implementation)
- Clone this repository:

      git clone https://github.com/yourusername/KaggleTitanticSurvivalClassify.git
      cd KaggleTitanticSurvivalClassify

- Install required packages:

      pip install pandas numpy scikit-learn matplotlib seaborn hyperopt xgboost keras

- Run the main script to train models and generate predictions:

      python titanicPredictSurvival.py

- The script will:
  - Load and preprocess the training and test data
  - Perform feature engineering
  - Train multiple models with hyperparameter optimization
  - Select the best performing model
  - Generate predictions for the test set
  - Save predictions to test_set_prediction.csv
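
The last three steps (selecting a model, predicting, and saving) might look roughly like the sketch below. It is an illustration, not the script's actual logic: the data is synthetic, the passenger IDs are placeholders, mean cross-validated accuracy is an assumed selection criterion, and the `PassengerId`/`Survived` column layout follows the Kaggle submission format rather than anything confirmed about the file this project writes.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real script loads the Kaggle CSV files.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, y_train, X_test = X[:200], y[:200], X[200:]
test_ids = list(range(1, len(X_test) + 1))  # placeholder passenger IDs

# A few of the listed algorithms with untuned default settings (assumption).
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Score every candidate with 5-fold cross-validated accuracy.
scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Keep the model with the highest mean accuracy and refit it on all training data.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X_train, y_train)

# Write the predictions in the two-column Kaggle submission layout.
submission = pd.DataFrame(
    {"PassengerId": test_ids, "Survived": best_model.predict(X_test)}
)
submission.to_csv("test_set_prediction.csv", index=False)
print(best_name, round(scores[best_name], 3))
```
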
This project is licensed under the MIT License - see the LICENSE file for details.