Successful implementation of Streaming Pipeline and Batch Pipeline on Google Cloud
Our customer, a company in the e-commerce space, decided to move its data processing, storage and analytics workloads to the Google Cloud Platform as part of its goal of giving its own customers (end users) a better experience.
I successfully engineered streaming & batch data processing pipelines on the Google Cloud Platform.
I created the data pipeline infrastructure on Google Cloud for analyzing customer purchasing behavior in real time and performed the analysis.
I plan to write a blog post about how to deploy these 2 pipelines on Google Cloud soon. Stay tuned!
I chose the "eCommerce behavior data from multi category store" dataset available on Kaggle so that I could focus on implementing the streaming and batch pipelines.
I pre-process (transform) the data, but real business data requires significantly more pre-processing, as its quality may not be ideal for the business problem(s) at hand.
The data file contains customer behavior data from a large multi-category online store's website for 1 month (November 2019).
Each row in the file represents an event.
- All events are related to products and users
- There are 3 different types of events → view, cart and purchase
The 2 purchase funnels are:
- view → cart → purchase
- view → purchase
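As a rough illustration of how the two funnels above could be told apart, here is a minimal Python sketch. It is not the pipeline code itself: the function name `classify_funnels`, the tuple layout `(user_id, product_id, event_type)` and the funnel labels are assumptions made for this example, and events are assumed to arrive in chronological order.

```python
from collections import defaultdict


def classify_funnels(events):
    """Label each purchased (user, product) pair with the funnel it followed.

    `events` is an iterable of (user_id, product_id, event_type) tuples,
    assumed to be sorted by event time. This layout is hypothetical.
    """
    seen = defaultdict(set)   # event types observed so far per (user, product)
    funnels = {}              # (user, product) -> funnel label

    for user_id, product_id, event_type in events:
        key = (user_id, product_id)
        # Only the first purchase that was preceded by a view is classified.
        if event_type == "purchase" and "view" in seen[key] and key not in funnels:
            if "cart" in seen[key]:
                funnels[key] = "view_cart_purchase"
            else:
                funnels[key] = "view_purchase"
        seen[key].add(event_type)

    return funnels


sample_events = [
    ("u1", "p1", "view"),
    ("u1", "p1", "cart"),
    ("u1", "p1", "purchase"),
    ("u2", "p2", "view"),
    ("u2", "p2", "purchase"),
    ("u3", "p3", "view"),      # viewed but never purchased
]
funnel_by_pair = classify_funnels(sample_events)
```

Pairs that never reach a purchase, like `("u3", "p3")` in the sample, are simply left out of the result.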
BigQuery (Storing streaming data)
Cloud Spanner (Storing data in batches)
- Daily event count
- Most visited sub-categories
- Hour vs Event Type vs Price
- Purchase conversion volume
- Purchase conversion rate
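To make a few of the metrics listed above concrete, here is a small Python sketch of the daily event count, purchase conversion volume and purchase conversion rate. It makes several assumptions that are not taken from the pipelines themselves: events are `(event_time, event_type)` tuples, `event_time` is a string that starts with a `YYYY-MM-DD` date, and the conversion rate is defined as purchases divided by views. The function names are hypothetical.

```python
from collections import Counter


def daily_event_count(events):
    """Count events per calendar day.

    Assumes each event_time string starts with 'YYYY-MM-DD'.
    """
    return Counter(event_time[:10] for event_time, _ in events)


def conversion_metrics(events):
    """Return purchase volume and purchase conversion rate.

    The rate is defined here (as an assumption) as purchases / views.
    """
    counts = Counter(event_type for _, event_type in events)
    views = counts["view"]
    purchases = counts["purchase"]
    rate = purchases / views if views else 0.0
    return {"purchase_volume": purchases, "conversion_rate": rate}


sample = [
    ("2019-11-01 00:00:00 UTC", "view"),
    ("2019-11-01 00:05:00 UTC", "view"),
    ("2019-11-01 00:07:00 UTC", "cart"),
    ("2019-11-01 00:09:00 UTC", "purchase"),
    ("2019-11-02 10:00:00 UTC", "view"),
    ("2019-11-02 10:30:00 UTC", "view"),
]
per_day = daily_event_count(sample)
metrics = conversion_metrics(sample)
```

In a warehouse such as BigQuery the same aggregations would more naturally be written as SQL `GROUP BY` queries; the Python version is only meant to pin down the definitions.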