This project demonstrates a complete data pipeline and analysis workflow, from raw JSON data to interactive dashboards. The pipeline integrates cloud-based data storage, processing, and visualization tools.
It reflects a real-world scenario: taking semi-structured data, transforming it, storing it securely in the cloud, and deriving actionable insights through business intelligence tools.
| Tool | Purpose |
|---|---|
| JSON | Raw data source format |
| Python | Data preprocessing, cleaning & transformation |
| Amazon S3 | Cloud storage for structured & raw data |
| Snowflake | Cloud data warehousing & querying |
| PowerBI | Data visualization & dashboarding |
- Extract JSON Data (Simulated Yelp Dataset)
- Clean & Transform Data using Python
- Upload Clean Data to Amazon S3
- Load Data from S3 into Snowflake
- Query & Analyze Data within Snowflake
- Visualize Insights using PowerBI Dashboards
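
As an illustration of the clean-and-transform step above, here is a minimal sketch using only the Python standard library. It assumes the raw file is newline-delimited JSON with Yelp-style review fields (`review_id`, `business_id`, `stars`, `date`, `text`). Those field names, the function names, and the cleaning rules are assumptions for illustration, not necessarily what `transform_data.py` does.

```python
import csv
import json
from pathlib import Path

# Hypothetical column set, modelled on Yelp-style review records.
FIELDS = ["review_id", "business_id", "stars", "date", "text"]


def clean_record(raw):
    """Return a cleaned row dict, or None if the record is unusable."""
    if not isinstance(raw, dict):
        return None
    if not raw.get("review_id") or raw.get("stars") is None:
        return None
    try:
        stars = float(raw["stars"])
    except (TypeError, ValueError):
        return None
    if not 1 <= stars <= 5:  # assumed valid rating range
        return None
    return {
        "review_id": str(raw["review_id"]).strip(),
        "business_id": str(raw.get("business_id", "")).strip(),
        "stars": stars,
        "date": str(raw.get("date", "")).strip()[:10],  # keep YYYY-MM-DD only
        "text": " ".join(str(raw.get("text", "")).split()),  # collapse whitespace
    }


def transform(src: Path, dst: Path) -> int:
    """Read newline-delimited JSON from src and write a cleaned CSV to dst.

    Malformed lines, invalid records, and duplicate review IDs are skipped.
    Returns the number of rows written.
    """
    seen = set()
    written = 0
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open(encoding="utf-8") as fin, \
            dst.open("w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=FIELDS)
        writer.writeheader()
        for line in fin:
            line = line.strip()
            if not line:
                continue
            try:
                row = clean_record(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip malformed lines
            if row is None or row["review_id"] in seen:
                continue
            seen.add(row["review_id"])
            writer.writerow(row)
            written += 1
    return written
```

A call such as `transform(Path("data/raw/reviews.json"), Path("data/processed/reviews.csv"))` would produce a CSV that can then be uploaded to S3. The file names here are placeholders.
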
├── data/
│   ├── raw/                  # Original JSON data
│   └── processed/            # Cleaned and transformed data
│
├── scripts/
│   ├── extract_data.py       # Load & Explore JSON data
│   ├── transform_data.py     # Clean & structure data
│   ├── upload_s3.py          # Upload to Amazon S3
│   └── snowflake_loader.py   # Load into Snowflake
│
├── powerbi/                  # PowerBI dashboard files (.pbix)
│
├── requirements.txt          # Python dependencies
├── README.md                 # Project documentation
└── .gitignore
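
To give a sense of what a script like `upload_s3.py` might contain, the sketch below uploads every file under `data/processed/` to an S3 bucket. The function name `upload_directory`, the `processed/` key prefix, and the bucket name in the usage comment are hypothetical. The client is passed in as an argument so the function can be exercised without AWS credentials; in practice it would be a `boto3` S3 client, whose `upload_file(Filename, Bucket, Key)` method is the only call used.

```python
from pathlib import Path


def upload_directory(s3_client, local_dir, bucket, prefix="processed/"):
    """Upload every file under local_dir to the bucket, returning the keys used.

    s3_client is expected to expose upload_file(filename, bucket, key),
    as a boto3 S3 client does.
    """
    root = Path(local_dir)
    keys = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Mirror the local folder layout under the chosen prefix.
        key = prefix + path.relative_to(root).as_posix()
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys


# Example usage (requires boto3 and configured AWS credentials;
# the bucket name is a placeholder):
#   import boto3
#   upload_directory(boto3.client("s3"), "data/processed", "my-example-bucket")
```

Passing the client in, rather than creating it inside the function, keeps credentials and region configuration out of the upload logic.
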
The PowerBI dashboard showcases:
- Rating distribution
- Customer sentiment trends
- Top categories by review count
- Geographic distribution of reviews
- Time-series trend analysis
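
Several of the dashboard measures listed above can be cross-checked in Python before the data reaches the BI layer. The sketch below computes a rating distribution, the top categories by review count, and monthly review counts. It assumes each review is a dict with `stars`, `categories` (a list of strings), and `date` (an ISO `YYYY-MM-DD` string); these field names and the function names are assumptions for illustration.

```python
from collections import Counter


def rating_distribution(reviews):
    """Count how many reviews received each star rating."""
    return Counter(r["stars"] for r in reviews if r.get("stars") is not None)


def top_categories(reviews, n=10):
    """Return the n most frequent categories as (category, count) pairs."""
    counts = Counter()
    for r in reviews:
        counts.update(r.get("categories") or [])
    return counts.most_common(n)


def reviews_per_month(reviews):
    """Count reviews per calendar month, keyed by 'YYYY-MM' in ascending order."""
    counts = Counter(r["date"][:7] for r in reviews if r.get("date"))
    return dict(sorted(counts.items()))
```

Comparing these totals with the dashboard visuals is a quick way to catch filtering or join mistakes introduced during loading.
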
Name: Kindoli Edward
Role: Data Analyst | Data Engineer | BI Developer
GitHub: https://github.com/Kindoli
LinkedIn: https://www.linkedin.com/in/kindoli-edward-5058544a/
