An end-to-end data engineering and analysis project that builds an automated data pipeline from Google Sheets to Snowflake, orchestrated with Apache Airflow and visualized in Tableau. This project captures real-world personal beauty product purchases, transforming raw entries into structured insights.
This project analyzes real-world beauty product purchase data through a robust data pipeline and visualization framework. It was designed to:
- Track personal purchase behaviors and product preferences.
- Uncover trends in spending, product types, and brand loyalty.
- Serve as a template for others to understand purchase patterns using real-life data and modern analytics tools.
- Source: Google Sheets with manually logged purchase data.
- Tool: Python, with the gspread and pandas libraries, to pull the data.
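A minimal extraction sketch, assuming a service-account JSON key and a sheet titled "Beauty Purchases" (both placeholders for your own values, not names from this repo):

```python
import gspread
import pandas as pd

# Authenticate with a service-account key (placeholder path).
gc = gspread.service_account(filename="service_account.json")

# Open the sheet by title (placeholder) and read its first tab.
worksheet = gc.open("Beauty Purchases").sheet1

# get_all_records() treats the first row as column headers.
df = pd.DataFrame(worksheet.get_all_records())
```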
- Initial transformations: removing duplicates, handling missing values, converting data types, validating and standardizing values, joining, and creating new columns.
- Tool: Python's pandas library.
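An illustrative cleaning pass, using the column names listed under "Tracked Fields" below; the bin edges for Price_Category are placeholders, not the project's actual cut-offs:

```python
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates()

    # Coerce types; invalid entries become NaT/NaN and are dropped.
    df["Date_Bought"] = pd.to_datetime(df["Date_Bought"], errors="coerce")
    df["Unit_Price"] = pd.to_numeric(df["Unit_Price"], errors="coerce")
    df = df.dropna(subset=["Date_Bought", "Unit_Price"])

    # Standardized product name: Brand + '-' + Short_Name (per the data notes).
    df["Product_Name"] = df["Brand"].str.strip() + "-" + df["Short_Name"].str.strip()

    # Price_Category derived from Unit_Price; thresholds are illustrative.
    df["Price_Category"] = pd.cut(
        df["Unit_Price"],
        bins=[0, 10, 30, float("inf")],
        labels=["low", "medium", "high"],
    )
    return df
```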
- The transformed data is loaded into a raw schema/table in Snowflake.
- Tool: Snowflake Python Connector.
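A load sketch with the Snowflake Python Connector, continuing from the cleaned DataFrame in the transform sketch above; all connection values and the RAW_PURCHASES table name are placeholders (the real tables come from data_model.sql):

```python
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD", account="YOUR_ACCOUNT",
    warehouse="YOUR_WAREHOUSE", database="YOUR_DB", schema="RAW",
)

# Bulk-load the DataFrame into an existing raw table.
# df is the cleaned DataFrame from the transform step above.
write_pandas(conn, df, "RAW_PURCHASES")
conn.close()
```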
- Data modeling: the fact and dimension tables are designed in Snowflake using a snowflake schema (normalized dimensions).
- Records are inserted into the appropriate fact and dimension tables using a stored procedure.
- Tool: Snowflake SQL.
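Invoking the procedure from Python looks roughly like this; "insert_purchase_data" is a placeholder for whatever name stored_proc.sql actually defines:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD", account="YOUR_ACCOUNT",
    warehouse="YOUR_WAREHOUSE", database="YOUR_DB", schema="ANALYTICS",
)

# Move rows from the raw table into the fact and dimension tables.
conn.cursor().execute("CALL insert_purchase_data()")
conn.close()
```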
- Apache Airflow DAG runs the full pipeline weekly.
- Handles error logging and ensures consistent data refresh.
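A skeleton of such a DAG, assuming the etl/ scripts each expose a callable; the entry-point names, schedule string, and retry settings are illustrative rather than copied from beauty_pipeline.py:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Assumed entry points exposed by the etl/ scripts.
from etl.extract import extract
from etl.transform import transform
from etl.load import load

with DAG(
    dag_id="beauty_purchase_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",  # weekly refresh, per the pipeline design (Airflow 2.4+)
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```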
- Tableau is connected to Snowflake (initially live, now extracted).
- The final dashboard is built to explore trends, KPIs, and insights.
- Tool: Tableau Desktop for dashboard building and Tableau Public for publishing.
- Origin: Manually logged Google Sheet tracking beauty product purchases.
- Tracked Fields: Date_Bought, Product_Name (as written by the store), Short_Name (actual product name), Product_Category, Product_Type, Product_Purpose, Brand, Store, Quantity, Unit_Price, and Total_Price.
- Modified fields: Date_Bought, Product_Name (Brand + '-' + Short_Name), Product_Category, Product_Type, Product_Purpose, Brand, Store, Quantity, Unit_Price, Total_Price, and Price_Category (low, medium, or high, derived from Unit_Price).
- ETL Flow: Google Sheets → Python (extraction, clean/transform) → Snowflake (raw, fact/dim tables) → Tableau (dashboard).
- Current State: Due to platform limitations, Tableau Public is used for final publishing with an extracted version of the Snowflake data.
A Tableau interactive dashboard was created to explore and communicate the insights derived from the data. It includes:
- Overview Dashboard: Annual trends, spending summaries, top products, stores & brands, and top 10 recent purchases.
- Product Dashboard: Product analysis and detailed table of all purchased items with filters/sort.
- Brand Dashboard: Brand analysis and detailed table of all brands with filters/sort.
- Dashboard Documentation: Reference material on navigating and using the dashboard.
View the Dashboard on Tableau Public
```
.
├── airflow/
│   ├── dags/
│   │   ├── beauty_pipeline.py    # Airflow DAG definition
│   │   └── etl/                  # Python scripts for each ETL stage
│   │       ├── extract.py        # Extracts data from Google Sheets
│   │       ├── transform.py      # Cleans and prepares data
│   │       └── load.py           # Loads data into Snowflake raw table
│   ├── data/
│   │   └── purchase_data.csv     # Backup of cleaned & transformed data
│   ├── docker-compose.yaml       # Docker setup for Airflow
│   ├── Dockerfile                # Dockerfile for Airflow environment
│   └── requirements.txt          # Python dependencies
│
├── resources/
│   ├── dashboard-images/         # Images used for dashboard background styling
│   ├── icons/                    # Icons used in the dashboard
│   └── images/                   # Snapshots used in README
│
├── snowflake/
│   ├── data_model.sql            # SQL to create Snowflake schema, tables, roles, etc.
│   └── stored_proc.sql           # SQL stored procedure for transforming & inserting data
│
├── .gitignore
└── README.md
```
To successfully recreate/reuse this project for your own data pipeline and dashboard, follow the steps below:
```bash
git clone https://github.com/gloryodeyemi/Beauty_Purchase_Analysis.git
cd Beauty_Purchase_Analysis
```
You'll need the following accounts:
- Snowflake (free trial)
- Tableau Desktop (free trial or full version)
- Google Cloud Platform (GCP) for Google Sheets API access
- Log in to your Snowflake account.
- Execute the SQL scripts in the snowflake/ folder:
- data_model.sql: Sets up warehouse, roles, database, schemas, and tables.
- stored_proc.sql: Creates stored procedures for data transformation and insertion.
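If you prefer to run the setup script outside the Snowflake UI, the connector's execute_string method can run a multi-statement file, as sketched below (connection values are placeholders). stored_proc.sql may be easier to run in a Snowflake worksheet, since procedure bodies contain their own semicolons:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD", account="YOUR_ACCOUNT",
)

# Execute every statement in the setup script, in order.
with open("snowflake/data_model.sql") as f:
    conn.execute_string(f.read())
conn.close()
```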
If you're pulling data from Google Sheets:
- Go to Google Cloud Console.
- Create a new project or select an existing one.
- Enable the Google Sheets API and Google Drive API.
- Create a service account and download the JSON key.
- Share your Google Sheet with the service account email.
- Add the JSON key file path to your project environment variable.
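The extraction script can then resolve the key from the environment; the variable name GOOGLE_SA_KEY_PATH below is an example, not a project requirement:

```python
import os

import gspread

# Read the service-account key path from an environment variable.
gc = gspread.service_account(filename=os.environ["GOOGLE_SA_KEY_PATH"])
```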
- Navigate to the airflow/ directory.
- Start Airflow using Docker Compose:
```bash
docker compose up --build -d
```
- Access the Airflow web UI at http://localhost:8080.
- Navigate to Admin -> Connections and create a new Snowflake and Google connection using your account information.
- Trigger the beauty_purchase_pipeline DAG to extract, transform, and load the data into Snowflake.
- Use Snowflake's UI to query your fact and related dimension tables.
- Confirm that your data is correctly inserted and transformed.
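The same check can be scripted; the fact table name below is a placeholder for whichever table data_model.sql creates:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD", account="YOUR_ACCOUNT",
    warehouse="YOUR_WAREHOUSE", database="YOUR_DB", schema="ANALYTICS",
)

# A non-zero count suggests the load and stored procedure ran end to end.
row_count = conn.cursor().execute("SELECT COUNT(*) FROM fact_purchases").fetchone()[0]
print(f"fact_purchases rows: {row_count}")
conn.close()
```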
- Sign in to Tableau Desktop.
- Connect Tableau to your Snowflake database.
- Use the Snowflake credentials and warehouse info configured in data_model.sql.
- Recreate or import the dashboard using images & icons in resources/.
- Deploy the Pipeline to the Cloud: Host the existing Airflow pipeline on a cloud platform (e.g., AWS EC2) and connect it to a managed Snowflake instance for seamless scheduling and scalability.
- Add Predictive Analytics: Train and integrate a simple regression or time-series model within Snowflake (using Snowpark) to forecast future product demand.
- Implement Notifications: Set up email notifications for pipeline failures and completions.
Hi there! I'm Glory Odeyemi, a Data Engineer & Analyst!
Let's stay in touch! Feel free to connect with me on the following platforms: