This repository contains code and documentation for predicting disease outbreaks using machine learning techniques. By leveraging historical data, environmental factors, and socio-economic indicators, the project aims to develop predictive models to identify the likelihood and intensity of disease outbreaks in specific regions.
- Data Preprocessing: Handle missing values, normalize data, and engineer features relevant to disease outbreaks.
- Exploratory Data Analysis (EDA): Visualize trends, correlations, and spatial distributions.
- Machine Learning Models: Implement Decision Trees, Random Forest, Gradient Boosting, Neural Networks, and Support Vector Machines.
- Evaluation Metrics: Assess model performance using accuracy, precision, recall, F1-score, and AUC-ROC.
- Prediction Visualization: Display predictions on maps and charts for intuitive understanding.
- ✅ Multilingual Support: Switch between English and Tamil within the web app.
- 🧾 PDF Report Generation: Download medical prediction reports in PDF format.
- 🔤 Tamil Font Integration: Ensures proper rendering of Tamil characters in generated PDFs. 👉 Download Tamil Font - Latha.ttf
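The preprocessing feature listed above (handling missing values and normalizing data) can be illustrated with a minimal sketch. It uses only the standard library and is not the repository's `preprocess.py`; the `glucose` column and both helper names are hypothetical, and mean imputation with min-max scaling is just one common choice.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical feature column with one missing reading.
glucose = [85, None, 140, 110]
filled = impute_mean(glucose)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

In practice the fill value and the min/max should be computed on the training split only and then reused for new data, so that no information leaks from the evaluation set.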
- Getting Started
- Prerequisites
- Installation
- Usage
- Dataset
- Models
- Results
- Contributing
- Contact Information
Follow the instructions below to set up the project and run the models on your system.
- Python 3.8+
- pip package manager
Clone the repository:
git clone https://github.com/Janviswa/Disease-outbreak-prediction-using-Machine-Learning.git
cd Disease-outbreak-prediction-using-Machine-Learning
Create a virtual environment:
python -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
Install the required dependencies:
pip install -r requirements.txt
- Prepare your dataset by placing it in the data/ directory. Ensure it matches the expected format.
- Run the preprocessing script:
  python preprocess.py
- Train the machine learning models:
  python train.py
- Evaluate the models and visualize results:
  python evaluate.py
- Generate predictions for new data:
  python predict.py --input new_data.csv
- Run the Streamlit web application:
  streamlit run app.py
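The internals of `predict.py` are not shown in this README. As a rough illustration only, a script that accepts an `--input` CSV path could be structured as below; `build_parser`, `read_rows`, and `main` are hypothetical names, and the actual script may work differently.

```python
import argparse
import csv


def build_parser():
    """Build a parser with a single required --input option (assumed interface)."""
    parser = argparse.ArgumentParser(description="Generate predictions for new data.")
    parser.add_argument("--input", required=True, help="Path to a CSV file of new records")
    return parser


def read_rows(path):
    """Read the CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def main(argv=None):
    """Parse arguments, load the rows, and return them for downstream prediction."""
    args = build_parser().parse_args(argv)
    rows = read_rows(args.input)
    print(f"Loaded {len(rows)} rows from {args.input}")
    return rows
```

Calling `main(["--input", "new_data.csv"])` mirrors the command-line invocation above; a real script would pass the loaded rows to a trained model rather than just returning them.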
Supported datasets:
- Heart Disease: Heart Disease Dataset on Kaggle
- Diabetes: Diabetes Dataset on Kaggle
- Parkinson's Disease: Parkinson's Dataset on Kaggle
This project supports various machine learning models, including but not limited to:
- Decision Trees
- Random Forest
- Gradient Boosting (e.g., XGBoost, LightGBM)
- Neural Networks
- Support Vector Machines (SVM)
The project also includes hyperparameter tuning and model optimization.
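The README does not say which tuning method is used. One common approach is an exhaustive grid search, sketched here with the standard library only. The parameter names `max_depth` and `n_estimators` and the `toy_score` function are assumptions standing in for "train a model with these settings and return its validation score".

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical validation score that peaks at max_depth=5, n_estimators=200."""
    return -(params["max_depth"] - 5) ** 2 - (params["n_estimators"] - 200) ** 2 / 1e4


grid = {"max_depth": [3, 5, 7], "n_estimators": [100, 200, 400]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

With a real model, `score_fn` would typically return a cross-validated metric, and the number of combinations grows multiplicatively with each added parameter, so grids are usually kept small.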
Evaluation metrics used to assess model performance:
- Accuracy
- Precision
- Recall
- F1-score
- AUC-ROC
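For reference, the metrics listed above can be computed for a binary classifier as in the following sketch. It uses plain Python so each definition is visible; the labels and scores are made-up illustration data, not project results, and the 0.5 decision threshold is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities for illustration.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC-ROC is computed from the raw scores and summarizes ranking quality across all thresholds.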
Visualizations display predictions and insights in spatial and temporal formats.
| Feature | Language |
|---|---|
| Interface Texts | English, Tamil |
| PDF Reports | English, Tamil |
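One simple way to serve interface texts in both languages is a per-language lookup table with an English fallback. The sketch below is an assumption about how such switching could work, not the app's actual code; the keys, the `translate` helper, and the Tamil strings are illustrative and should be reviewed by a native speaker.

```python
# Hypothetical translation table keyed by language code, then by text key.
TRANSLATIONS = {
    "en": {"title": "Disease Prediction", "language": "Language"},
    "ta": {"title": "நோய் கணிப்பு", "language": "மொழி"},
}


def translate(key, lang="en"):
    """Return the text for key in lang, falling back to English, then to the key itself."""
    table = TRANSLATIONS.get(lang, TRANSLATIONS["en"])
    return table.get(key, TRANSLATIONS["en"].get(key, key))


print(translate("title", "ta"))
print(translate("title", "fr"))  # unsupported language falls back to English
```

Falling back to English and then to the raw key means a missing translation degrades to readable text instead of raising an error in the interface.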
Tamil fonts are embedded in the PDF reports. If Tamil text still renders incorrectly, download the Latha Tamil font (Latha.ttf) and install it locally.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch:
  git checkout -b feature-name
- Make your changes and commit:
  git commit -m "Description of changes"
- Push to the branch:
  git push origin feature-name
- Create a pull request.
For questions, feedback, or collaborations, feel free to reach out:
📧 Email: jananiviswa05@gmail.com
🔗 LinkedIn: linkedin.com/in/janani-v