InsureSight AI is an ML-powered app that predicts health insurance premiums from personal and medical profiles.
Frontend: Streamlit | ML Framework: XGBoost & scikit-learn | Language: Python
This project uses XGBoost regression models to predict health insurance premiums based on age, medical history, lifestyle factors, and demographics.
The web application takes user-provided personal and medical information and returns a premium estimate. It features an intuitive Streamlit interface, age-specific model selection, risk score normalization, and real-time predictions.
- Age-Based Model Selection: Uses separate trained models for users aged ≤25 and >25 for improved accuracy.
- Comprehensive Risk Assessment: Calculates normalized risk scores from medical history (Diabetes, Heart Disease, High Blood Pressure, Thyroid).
- Feature Engineering: Automatically encodes categorical variables (Gender, Region, BMI Category, Smoking Status, etc.).
- Real-Time Predictions: Instant premium estimates displayed in INR format.
- Clean UI: Organized 4-row grid layout with intuitive icons for all input fields.
- Input Validation: Enforces valid ranges for all numeric inputs (Age, Dependants, Income, Genetic Risk).
- Pre-Trained Models: Pre-trained XGBoost models and scalers loaded via joblib for fast inference.
insurance_premium_prediction/
│
├── artifacts/
│ ├── model_u25.joblib # XGBoost model for age ≤ 25
│ ├── model_others.joblib # XGBoost model for age > 25
│ ├── scaler_u25.joblib # Feature scaler for age ≤ 25
│ └── scaler_others.joblib # Feature scaler for age > 25
│
├── main.py # Streamlit frontend application
├── prediction_helper.py # Prediction logic & preprocessing
├── requirements.txt # Project dependencies
├── .gitignore # Git ignore file
├── LICENSE # License file
└── README.md # Project documentation
- Python 3.10 or higher
- Git
git clone https://github.com/inv-fourier-transform/insurance-premium-prediction.git
cd insurance-premium-prediction
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
pip install -r requirements.txt
streamlit run main.py

The application collects the following information to generate premium predictions:
- Age (18–100): Primary insured's age
- Gender: Male or Female
- Marital Status: Married or Unmarried
- Number of Dependants (0–10): Family members covered
- Income in Lakhs (0–200): Annual income in Indian Rupees (Lakhs)
- Employment Status: Salaried, Self-Employed, or Freelancer
- BMI Category: Normal, Overweight, Obesity, or Underweight
- Smoking Status: No Smoking, Occasional, or Regular
- Genetical Risk (0–5): Genetic predisposition risk score
- Medical History:
  - No Disease
  - Diabetes
  - High Blood Pressure
  - Heart Disease
  - Thyroid
  - Or combinations of the above
- Insurance Plan: Bronze, Silver, or Gold coverage tier
- Region: Northwest, Southeast, Northeast, or Southwest
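The numeric ranges listed above can be checked with a small helper. This is a minimal sketch, not the project's code: the field names and the `validate_numeric_inputs` function are hypothetical, and only the limits themselves come from the list above.

```python
# Hypothetical field names; the (low, high) limits mirror the input list above.
NUMERIC_RANGES = {
    "age": (18, 100),
    "number_of_dependants": (0, 10),
    "income_lakhs": (0, 200),
    "genetical_risk": (0, 5),
}


def validate_numeric_inputs(inputs: dict) -> list[str]:
    """Return human-readable errors; an empty list means every value is in range."""
    errors = []
    for field, (low, high) in NUMERIC_RANGES.items():
        value = inputs.get(field)
        if value is None:
            errors.append(f"{field} is missing")
        elif not low <= value <= high:
            errors.append(f"{field} must be between {low} and {high}, got {value}")
    return errors


if __name__ == "__main__":
    sample = {"age": 17, "number_of_dependants": 2, "income_lakhs": 12, "genetical_risk": 1}
    print(validate_numeric_inputs(sample))
```

In a Streamlit form the same limits can also be passed as `min_value` and `max_value` to `st.number_input`, which keeps the widget itself from accepting out-of-range values.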
The calculate_normalized_risk() function in prediction_helper.py assigns risk scores to medical conditions:
- Heart Disease: 8
- Diabetes / High Blood Pressure: 6
- Thyroid: 5
- No Disease: 0
Combined conditions (e.g., Diabetes & Heart Disease) are summed and normalized to a 0–1 scale.
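The scoring described above might look roughly like the sketch below. The scores come from the list; everything else is an assumption for illustration: the function name `normalized_risk`, the `&` separator, and a maximum raw total of 14 (Heart Disease plus one 6-point condition), which presumes at most two conditions are combined. The actual calculate_normalized_risk() may differ.

```python
# Scores taken from the list above; keys are lower-cased for matching.
RISK_SCORES = {
    "heart disease": 8,
    "diabetes": 6,
    "high blood pressure": 6,
    "thyroid": 5,
    "no disease": 0,
}

# Assumption: at most two conditions are combined, so the largest raw total is 8 + 6.
MIN_SCORE = 0
MAX_SCORE = 14


def normalized_risk(medical_history: str) -> float:
    """Sum the scores of '&'-separated conditions and min-max scale to 0-1."""
    conditions = [part.strip().lower() for part in medical_history.split("&")]
    # Unknown condition names contribute 0 in this sketch.
    total = sum(RISK_SCORES.get(condition, 0) for condition in conditions)
    return (total - MIN_SCORE) / (MAX_SCORE - MIN_SCORE)


if __name__ == "__main__":
    for history in ("No Disease", "Thyroid", "Diabetes & Heart Disease"):
        print(history, round(normalized_risk(history), 3))
```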
- Categorical Encoding: One-hot encoding for Gender, Region, Marital Status, BMI Category, Smoking Status, and Employment Status
- Ordinal Encoding: Insurance Plan → Bronze = 1, Silver = 2, Gold = 3
- Income Level Mapping: Income converted to categorical levels: <10L, 10L–25L, 25L–40L, >40L
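As a rough illustration of the encodings above, the sketch below uses plain dictionaries. The plan mapping and income labels come from the text; the column naming (`gender_Male`), the helper names, and which bucket a boundary value such as exactly 25 falls into are assumptions, and the project may drop one category per feature where this sketch keeps all of them.

```python
# Ordinal mapping as stated above.
PLAN_ORDINAL = {"Bronze": 1, "Silver": 2, "Gold": 3}


def income_level(income_lakhs: float) -> str:
    """Map income in lakhs to a level label (boundary handling is an assumption)."""
    if income_lakhs < 10:
        return "<10L"
    if income_lakhs <= 25:
        return "10L–25L"
    if income_lakhs <= 40:
        return "25L–40L"
    return ">40L"


def one_hot(prefix: str, value: str, categories: list[str]) -> dict[str, int]:
    """Return one 0/1 column per category, named '<prefix>_<category>'."""
    if value not in categories:
        raise ValueError(f"unknown {prefix}: {value!r}")
    return {f"{prefix}_{category}": int(category == value) for category in categories}


if __name__ == "__main__":
    row = {"insurance_plan": PLAN_ORDINAL["Silver"], "income_level": income_level(18)}
    row.update(one_hot("gender", "Female", ["Male", "Female"]))
    print(row)
```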
- Users aged ≤ 25 use scaler_u25 and model_u25
- Users aged > 25 use scaler_others and model_others
Features are scaled using the appropriate scaler before prediction.
The XGBoost regression model returns the premium estimate.
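The selection and prediction steps can be sketched as follows. The file names match the artifacts/ listing; `artifact_paths` and `predict_premium` are hypothetical names, and the sketch assumes the scaler was fitted on the full feature matrix (the real helper may scale only some columns). joblib is imported inside the function so that the path logic can be used without it installed.

```python
from pathlib import Path

ARTIFACTS_DIR = Path("artifacts")


def artifact_paths(age: int) -> tuple[Path, Path]:
    """Return (model_path, scaler_path) for the age group the user falls into."""
    suffix = "u25" if age <= 25 else "others"
    return (
        ARTIFACTS_DIR / f"model_{suffix}.joblib",
        ARTIFACTS_DIR / f"scaler_{suffix}.joblib",
    )


def predict_premium(age: int, features) -> float:
    """Scale one row of already-encoded features and return the model's estimate.

    `features` is a 2D array-like with a single row, in the column order
    the scaler and model were trained on.
    """
    import joblib  # third-party; imported here so artifact_paths has no dependencies

    model_path, scaler_path = artifact_paths(age)
    model = joblib.load(model_path)
    scaler = joblib.load(scaler_path)
    scaled = scaler.transform(features)  # assumption: scaler covers every column
    return float(model.predict(scaled)[0])


if __name__ == "__main__":
    print(artifact_paths(22))
    print(artifact_paths(40))
```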
To retrain the models with your own data:
- Prepare your dataset with the features listed above and target variable premium_amount
- Split data by age (≤25 and >25)
- Train separate XGBoost regressors for each age group
- Save models and scalers to the artifacts/ directory using joblib.dump()
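The retraining steps above might be sketched like this. It assumes each row is a dict of numeric, already-encoded features plus `age` and `premium_amount`. The choice of `MinMaxScaler` and the hyperparameters are placeholders, not the project's settings, and `split_by_age` and `train_and_save` are hypothetical names. The ML imports live inside `train_and_save`, so the splitting helper runs without them.

```python
from pathlib import Path


def split_by_age(rows: list[dict], threshold: int = 25) -> tuple[list[dict], list[dict]]:
    """Split rows into (age <= threshold, age > threshold)."""
    young = [row for row in rows if row["age"] <= threshold]
    others = [row for row in rows if row["age"] > threshold]
    return young, others


def train_and_save(rows: list[dict], feature_names: list[str], suffix: str,
                   out_dir: str = "artifacts") -> None:
    """Fit a scaler and an XGBoost regressor on one age group and save both."""
    # Third-party imports are local so split_by_age works without them installed.
    import joblib
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler  # assumption: scaler type is not stated
    from xgboost import XGBRegressor

    X = np.asarray([[row[name] for name in feature_names] for row in rows], dtype=float)
    y = np.asarray([row["premium_amount"] for row in rows], dtype=float)

    scaler = MinMaxScaler()
    X_scaled = scaler.fit_transform(X)

    model = XGBRegressor(n_estimators=50, max_depth=5)  # placeholder hyperparameters
    model.fit(X_scaled, y)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out / f"model_{suffix}.joblib")
    joblib.dump(scaler, out / f"scaler_{suffix}.joblib")


if __name__ == "__main__":
    demo = [{"age": 22}, {"age": 25}, {"age": 41}]
    young, others = split_by_age(demo)
    print(len(young), len(others))
```

Calling `train_and_save(young, feature_names, "u25")` and `train_and_save(others, feature_names, "others")` would then produce the four files listed under artifacts/.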
Contributions are welcome!
Please open an issue or submit a pull request for improvements, bug fixes, or feature enhancements.
This project is licensed under the MIT License.
🎯 Predict your health insurance premiums with confidence using machine learning!


