anuj-data-lab/IRONCLAD_ETL

🛡️ IRONCLAD_ETL: Secure Data Pipeline

Enterprise-Grade Extraction, Transformation, and Loading Engine

Scraped web data is a liability when it is corrupt. IRONCLAD_ETL is a Python-based pipeline that ingests messy, unstructured datasets, applies rigorous transformation rules (regex cleaning, pandas normalization), and validates data integrity before loading the result into a secure SQLite relational database.

Core Architecture

  • Extract: Ingests raw .csv payloads.
  • Transform: Strips whitespace with .str.strip(), cleans currency strings with regex, and handles NULL values.
  • Validate: A dedicated validator.py module blocks corrupted data (e.g., negative prices, missing strings) from database entry.
  • Load: Securely writes the validated pandas DataFrame to the SQLite database enterprise_warehouse.db.
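
The Transform and Validate stages can be illustrated with a minimal sketch. This is not the repository's actual code: the column names (product, price), the sample payload, and the functions transform() and validate() are assumptions made for illustration, and the real validator.py may apply different rules.

```python
import io

import pandas as pd

# Hypothetical raw payload; the column names are assumed for illustration.
RAW_CSV = """product,price
  Widget ,"$1,200.50"
Gadget,-5
,19.99
Gizmo,N/A
"""


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Strip whitespace, clean currency strings, and coerce bad values to NaN.

    Assumes both columns were read as strings (dtype=str).
    """
    out = df.copy()
    out["product"] = out["product"].str.strip()
    # Treat empty strings as missing values.
    out["product"] = out["product"].mask(out["product"] == "")
    # Drop everything except digits, the decimal point, and the minus sign.
    cleaned = out["price"].str.replace(r"[^0-9.\-]", "", regex=True)
    out["price"] = pd.to_numeric(cleaned, errors="coerce")
    return out


def validate(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows into (accepted, rejected).

    A row is rejected if the product is missing, the price is missing,
    or the price is negative.
    """
    ok = df["product"].notna() & df["price"].notna() & (df["price"] >= 0)
    return df[ok].reset_index(drop=True), df[~ok].reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV), dtype=str)
clean, rejected = validate(transform(raw))
print(clean)
print(f"rejected rows: {len(rejected)}")
```

Returning the rejected rows instead of silently dropping them is a design choice of this sketch; it makes it possible to log or inspect what was blocked from the database.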

This design keeps dirty upstream inputs from breaking downstream analytics.
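
As a rough sketch of the Load stage, a validated DataFrame can be written to SQLite with pandas' to_sql() and then queried by downstream code. The table name products, the sample rows, and the load() helper are assumptions for illustration. An in-memory database is used here so the example leaves no file behind; the project itself targets enterprise_warehouse.db.

```python
import sqlite3

import pandas as pd


def load(df: pd.DataFrame, conn: sqlite3.Connection) -> int:
    """Append rows to the hypothetical 'products' table and return its row count."""
    # if_exists="append" creates the table on first use and appends afterwards.
    df.to_sql("products", conn, if_exists="append", index=False)
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]


# Stand-in for the output of the validation stage.
clean = pd.DataFrame({"product": ["Widget", "Gizmo"], "price": [1200.5, 19.99]})

conn = sqlite3.connect(":memory:")  # swap for a file path to persist the data
total = load(clean, conn)
rows = conn.execute(
    "SELECT product, price FROM products ORDER BY price"
).fetchall()
conn.close()

print(total, rows)
```

Because only validated rows reach the table, a downstream query such as the SELECT above can assume every price is numeric and non-negative.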

About

IRONCLAD_ETL: A robust Python-based data pipeline designed to extract messy web data, validate integrity with custom logic, and load structured payloads into secure SQL databases.
