As a data engineer for an e-commerce website, given a CSV dataset containing raw transactional data. My task was to design and implement a robust data pipeline that efficiently processes, stores, and exposes insights from this data. To achieve this, I first modelled the data into four key tables: Orders, Customers, Products, and Order Items. The Customers table holds user details, while the Products table contains product information. The Orders table tracks each transaction, linking customers to their purchases, and the Order Items table breaks down each order into individual product purchases, capturing quantities and prices. Once the schema is defined, I store the structured data in a PostgreSQL database, optimizing it with indexing and foreign key relationships for efficient queries. Next, I performed various computations, such as calculating total sales, customer lifetime value, and product performance metrics, directly in the database. To expose this data, I built a FastAPI endpoint that enables users to query and visualize key analytics, including top-selling products, customer purchase behavior, and revenue trends.
Before starting this project, you should:
- Have basic knowledge of SQL (SELECT, INSERT, JOIN, INDEXING).
- Understand relational databases and how tables relate via foreign keys.
- Be familiar with Python, particularly working with Pandas, FASTAPI and SQLAlchemy.
- Have PostgreSQL installed and running on your system.
Goal:
- Design a relational database based on the dataset.
- Load the provided CSV data into PostgreSQL.
- Write SQL queries for business insights.
- Expose API endpoints with FastAPI.
- Implement indexing and optimization.
- Use SQLAlchemy ORM for queries and operations.
- Document the API with Swagger UI (/docs)
- Order details (Order_Date, Order_Priority, Payment_method)
- Customer information (Customer_id, Gender, Device_Type, Customer_Login_Type)
- Product information (Product_Category, Product)
- Financial information (Quantity, Discount, Sales, Profit, Shipping_cost)
ecommerce_api/
├── .gitignore
├── README.md
├── main.py
├── pyproject.toml
├── poetry.lock
├── database/
│ ├── __init__.py
│ ├── models.py (*Models[database tables], relationships and indexes creation.)
│ ├── crud.py (*Database queries for insights.*)
│ ├── db_setup.py (*Database setup and connection.*)
├── data/
│ ├── ecommerce_dataset.csv (*Provided Dataset*)
├── scripts/
│ ├── __init__.py
│ ├── load_data.py (*Script to load CSV into PostgreSQL*)
├── tests/
│ ├── __init__.py
│ ├── test_load_data.py (*Testing the load_data script.*)
├── images/
│ ├── Ecommerce Database Project Process Map.jpg
│ ├── ecommerce database schema [MConverter.eu].png
│ ├── FastAPI structure.png
- Q1) Total number of unique customers

- Q2) Number of customers per device type (Web/Mobile)

- Q3) Percentage of members vs. guest users.

- Q4) Top 5 best-selling products based on sales.

- Q5) Top 3 product categories with the highest revenue.

- Q6) Products with the highest profit margins.

- Q7) Average order size (quantity per order).

- Q8) Total revenue and profit.

- Q9) Percentage of orders placed with High or Critical priority.

The optimum discount rate that yielded most sales is 30%.
Higher than does not translate to more sales.
50% and 10% discounts yielded the least two sales in ranking.
Endpoint Method Description
*/customers/* GET Get all customers
*/customers/{customer_id}* GET Get details of a specific customer
*/orders/* GET Get all orders
*/orders/{order_id}* GET Get details of a specific order
*/products/* GET Get all products
*/analytics/top-products/* GET Get top-selling products
*/analytics/revenue/* GET Get total revenue & profit
CREATE INDEX idx_order_customer ON Orders(customer_id);
CREATE INDEX idx_product_sales ON OrderItems(product_id);
git clone https://github.com/Data-Epic/database-fundamentals.gitpipx install poetrypoetry installpoetry updateIn env file: (I used postgres)
DB_URL ='postgresql://user:password@host:port/db_name'
Check the link below for help.
[text](https://www.strongdm.com/blog/postgres-create-database#:~:text=Open%20a%20terminal%20on%20Unix,%E2%80%9Cpostgres%E2%80%9D%20with%20that%20name.)
### **5. Run the API**
```cmd
poetry run fastapi dev main.py
Once the API is running, visit:
http://127.0.0.1:8000/docs
This provides an interactive interface to test the API endpoints.
Also you can visit the link below for extensive info on the API
http://127.0.0.1:8000/redoc
pandas (>=2.2.3,<3.0.0)
sqlalchemy (>=2.0.38,<3.0.0)
psycopg2 (>=2.9.10,<3.0.0)
fastapi (>=0.115.11,<0.116.0)
dotenv (>=0.9.9,<0.10.0)
[tool.poetry.group.dev.dependencies]
black = "^25.1.0" - for code formatting



