This is a showcase project demonstrating the development and deployment of an end-to-end machine learning pipeline, focused on cybersecurity and malicious URL detection. It covers the full ML lifecycle — from data ingestion to production-ready deployment.
- **Modular Pipeline** using custom components for:
  - ✅ Data ingestion from MongoDB, validation, and transformation
  - ✅ Model training, tuning (via `GridSearchCV`), and evaluation
  - ✅ Overfitting/underfitting checks and drift detection
  - ✅ MLflow logging for experiment tracking
  - ✅ Batch prediction support for incoming CSVs
  - ✅ Streamlit app for interactive use
  - ✅ Docker support
  - ✅ CI/CD with GitHub Actions
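The MongoDB ingestion step can be sketched as a small helper that flattens raw documents into a DataFrame; the `records_to_frame` helper and the field names are illustrative, not the project's actual code:

```python
import pandas as pd

def records_to_frame(records):
    """Convert raw MongoDB documents to a clean DataFrame (drop Mongo's _id)."""
    df = pd.DataFrame(records)
    return df.drop(columns=["_id"], errors="ignore")

# In the pipeline this would be fed by pymongo, e.g.:
#   from pymongo import MongoClient
#   records = list(MongoClient(uri)[db][coll].find())
docs = [{"_id": 1, "url_length": 54, "label": 1},
        {"_id": 2, "url_length": 23, "label": 0}]
frame = records_to_frame(docs)
```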
- **Preprocessing**
  - Missing value imputation with `KNNImputer`
  - Feature scaling & label normalization
  - YAML schema-driven pipeline logic
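A minimal sketch of the preprocessing above, assuming a scikit-learn `Pipeline` that chains `KNNImputer` with `StandardScaler` (the exact scaler and schema handling in the project may differ):

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

preprocessor = Pipeline([
    ("imputer", KNNImputer(n_neighbors=3)),   # fill NaNs from nearest rows
    ("scaler", StandardScaler()),             # zero mean, unit variance
])

X = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, 6.0]])
X_clean = preprocessor.fit_transform(X)       # no NaNs remain after this
```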
- **Modeling**
  - Ensemble methods (`RandomForest`, `GradientBoosting`, `AdaBoost`) and `LogisticRegression`
  - Custom evaluation metrics with `f1_score`, precision, recall, and accuracy
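Tuning one of the ensembles with `GridSearchCV` and scoring it with `f1_score` might look like the sketch below; the parameter grid and the synthetic dataset are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Cross-validated search over a small illustrative grid, optimizing F1
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    scoring="f1",
    cv=3,
)
search.fit(X_tr, y_tr)
test_f1 = f1_score(y_te, search.best_estimator_.predict(X_te))
```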
- **Monitoring**
  - Data drift detection using the Kolmogorov–Smirnov test
  - Drift reports saved in timestamped YAML files
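The drift check can be sketched with SciPy's two-sample KS test; the `detect_drift` helper and the 0.05 threshold are assumptions, and in the pipeline the resulting dicts would be dumped into the timestamped YAML reports:

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(ref: np.ndarray, cur: np.ndarray, alpha: float = 0.05) -> dict:
    """Return KS statistic, p-value, and a drift flag for one feature column."""
    stat, p_value = ks_2samp(ref, cur)
    return {"ks_statistic": float(stat), "p_value": float(p_value),
            "drift": bool(p_value < alpha)}

rng = np.random.default_rng(0)
same = detect_drift(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
shifted = detect_drift(rng.normal(0, 1, 500), rng.normal(3, 1, 500))
```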
- **Deployment**
  - ✅ Final model serialization (including preprocessor)
  - ✅ `batch_prediction.py` for real-world inference
  - ✅ Streamlit app for CSV-based prediction and visualization
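Bundling the fitted preprocessor with the trained model into one serialized artifact — so inference code can reload a single file — might look like this sketch; the `NetworkModel` wrapper and the use of `joblib` are illustrative choices, not necessarily what the project uses:

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

class NetworkModel:
    """Bundle preprocessor + estimator behind a single predict()."""
    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model

    def predict(self, X):
        return self.model.predict(self.preprocessor.transform(X))

# Tiny illustrative fit, then round-trip through serialization
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]]
y = [0, 0, 1, 1]
prep = StandardScaler().fit(X)
clf = LogisticRegression().fit(prep.transform(X), y)

path = Path(tempfile.mkdtemp()) / "model.pkl"
joblib.dump(NetworkModel(prep, clf), path)
preds = joblib.load(path).predict(X)
```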
- Python 3.12
- Scikit-learn, Pandas, NumPy
- MLflow for experiment tracking
- Streamlit for UI
- Docker for containerization
- GitHub Actions for CI
- YAML-based configuration
```
.
├── src/
│   ├── components/       # Data & model pipeline steps
│   ├── utils/            # Utility functions
│   ├── entity/           # Config & artifact classes
│   ├── constants/        # Static paths and values
│   ├── pipeline/         # Training & prediction pipelines
│   └── monitoring/       # Drift checking logic
├── artifacts/            # Timestamped pipeline outputs
├── app.py                # Streamlit app
├── batch_prediction.py   # Inference logic
├── main.py               # Training pipeline trigger
└── requirements.txt
```
Install dependencies:

```bash
pip install -r requirements.txt
```

Trigger the training pipeline:

```bash
python main.py
```

Run the Streamlit app for prediction:

```bash
streamlit run app.py
```

Track all training metrics, parameters, and models via MLflow by setting:

```bash
MLFLOW_TRACKING_URI=<your_tracking_uri>
```

Build the Docker image:

```bash
docker build -t network_security_app .
```

Run batch prediction using mounted volumes:

```bash
docker run -v /local/input:/data/in -v /local/output:/data/out network_security_app
```

This project uses GitHub Actions to:
- Install dependencies
- Run unit tests
- Ensure reproducibility of builds
See `.github/workflows/main.yml`.
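A minimal workflow of that shape (illustrative only, not the project's actual `main.yml`) could be:

```yaml
name: CI
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest
```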
Built to simulate a production-grade ML system in the security domain. This project reflects real-world challenges like data quality, model drift, and deployment readiness — all handled with modular, testable, and extensible code.
Feel free to explore, fork, or ask questions!
Author: Shahriyar A. | 2025