Other languages: Spanish
Olist is a complete ecosystem of digital services for online sales. Its mission is to promote digital retail commerce and provide real growth opportunities to all types of companies, removing obstacles and helping entrepreneurs find their space on the internet.
In this context, we were entrusted with consulting and product development work based on a dataset provided by Olist. The data cover the period from 2016 to 2018 and contain information related to the Sales, Logistics, and Marketing departments.
Our task is to turn the company's raw data into valuable information that improves the decision-making process, increases profits, and raises the quality of the company's service.
We have implemented an automated pipeline for the ETL process. In addition, each component of the project runs as a Docker container, contributing to the scalability and portability of the system.
Using the Streamlit framework in Python, we built a dashboard that monitors KPIs and critical aspects of the business model to support decision-making.
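This README does not say which KPIs the dashboard tracks. Purely as an illustration, the sketch below computes two delivery KPIs (average delivery time in days and on-time delivery rate) from order records. It assumes the column names and timestamp format of the public Olist orders dataset, and the function name `delivery_kpis` is hypothetical, not taken from the project code.

```python
from datetime import datetime

# Timestamp format assumed for the Olist orders CSV.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def delivery_kpis(orders):
    """Return (average delivery days, on-time rate) over delivered orders.

    Each order is a dict using the (assumed) Olist column names
    order_purchase_timestamp, order_delivered_customer_date and
    order_estimated_delivery_date. Orders without a delivery date are skipped.
    """
    days, on_time = [], 0
    for order in orders:
        delivered_raw = order.get("order_delivered_customer_date")
        if not delivered_raw:
            continue  # not delivered (yet), so it cannot count toward delivery KPIs
        purchased = datetime.strptime(order["order_purchase_timestamp"], TS_FORMAT)
        delivered = datetime.strptime(delivered_raw, TS_FORMAT)
        estimated = datetime.strptime(order["order_estimated_delivery_date"], TS_FORMAT)
        # Delivery time in (fractional) days from purchase to customer delivery.
        days.append((delivered - purchased).total_seconds() / 86400)
        if delivered <= estimated:
            on_time += 1
    if not days:
        return 0.0, 0.0
    return sum(days) / len(days), on_time / len(days)
```

In a Streamlit page, values like these could be displayed with `st.metric`.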
In addition, to put the machine learning models into operation, we designed a web app that addresses needs in sales, marketing, and logistics. Specifically, we implemented time series models for sales forecasting, content-based and collaborative-filtering recommendation models, and a regression model that estimates product delivery time.
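To make the collaborative-filtering idea concrete, here is a minimal item-based sketch using cosine similarity between products. It is an illustration under assumptions, not the project's model: the data layout (customer to product scores), the function names, and the choice of item-based cosine similarity are all ours.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {key: value} dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, customer, k=3):
    """Item-based collaborative filtering.

    ratings maps customer -> {product: score}. Each product the customer has
    not rated is scored by its similarity to the products they did rate,
    weighted by those ratings. Returns the top-k product ids.
    """
    # Invert to product -> {customer: score} so that products become vectors.
    by_product = {}
    for cust, items in ratings.items():
        for product, score in items.items():
            by_product.setdefault(product, {})[cust] = score

    seen = ratings.get(customer, {})
    scores = {}
    for candidate, vec in by_product.items():
        if candidate in seen:
            continue  # never recommend something the customer already rated
        scores[candidate] = sum(
            score * cosine(vec, by_product[rated]) for rated, score in seen.items()
        )
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]
```

A content-based recommender would instead compare product attributes (for example category or description), so it can also rank products nobody has reviewed yet.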
This repository is organized to make the project easy to manage. The main folders and their contents are described below:
.
└── etapas_del_proyecto
    ├── etapa_1
    ├── etapa_2
    └── etapa_3
This folder contains the progress reports for each stage of the project. Each stage has its own folder for easy management.
.
└── data_warehouse
    ├── apache_airflow
    ├── datasets
    ├── datasets_incremental
    └── etl_module
This folder contains everything related to the Apache Airflow implementation, the original project datasets, the datasets generated for the incremental load tests, and the ETL module developed in Python.
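The incremental load tests imply merging new batches into data that is already loaded. A common way to do that is an upsert keyed on a primary key; the sketch below shows the idea on plain Python records. The function name `incremental_load` and the default key `order_id` are assumptions for illustration, not taken from `etl_module`.

```python
def incremental_load(current, batch, key="order_id"):
    """Upsert a new batch of records into the current table.

    current and batch are lists of dicts matched on `key`. New keys are
    inserted, changed rows are replaced, identical rows are ignored.
    Returns (merged rows, number inserted, number updated).
    """
    # Index existing rows by primary key; dicts keep insertion order.
    table = {row[key]: row for row in current}
    inserted = updated = 0
    for row in batch:
        old = table.get(row[key])
        if old is None:
            inserted += 1
        elif old != row:
            updated += 1
        else:
            continue  # identical row already loaded; nothing to do
        table[row[key]] = row
    return list(table.values()), inserted, updated
```

Making the load idempotent this way means re-running the same batch leaves the data unchanged.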
.
└── dashboard
    └── app
This folder contains the Streamlit application used to visualize the data.
.
└── machine_learning
    ├── app
    ├── models
    └── notebooks
This folder contains the Python notebooks used to test and build the Machine Learning (ML) models, the generated models, and the Streamlit application for testing the ML models.
The project has been developed using Docker, so the installation is simple. Follow the steps below to run the project:
- Clone the repository:
    git clone https://github.com/agusdm97/PF-DTS05-E-COMMERCE-OLIST.git
- Navigate to the root folder of the project:
    cd PF-DTS05-E-COMMERCE-OLIST
- Start the Docker containers:
    ⚠️ WARNING: The following command is resource-intensive for some computers. Make sure you have enough memory and processing power before running it.
    docker-compose up -d
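Before continuing, it can help to confirm that the services accept connections. The sketch below polls a TCP port until it opens or a timeout expires. The ports in `SERVICES` (9000 and 8080 are the usual MinIO API and Airflow webserver defaults) are assumptions; check the project's `docker-compose.yml` for the real mappings. `docker-compose ps` also shows container status.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the service is not ready yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical host ports; replace them with the ones in docker-compose.yml.
SERVICES = {"MinIO": 9000, "Airflow": 8080}
```

For example, `all(wait_for_port("localhost", p) for p in SERVICES.values())` waits until every listed service is reachable.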
Once the containers are running correctly, follow these steps:
- Log in to MinIO and upload the datasets:
Open MinIO through its link and log in with the user "root" and the password "password". After logging in, create a bucket called "airflow" and upload the datasets to it.
- Log in to Airflow and activate the DAG:
Open Airflow through its link and log in with the user "root" and the password "password". Then open the DAG called "initial_load" and activate it.
- Open the Dashboard and the Machine Learning app:
Once the previous step has finished, you can open the Dashboard to view the data and the App-ML to explore the ML models.
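The MinIO and Airflow steps can also be scripted. The sketch below assumes the `minio` Python package (installed separately), the MinIO API at `localhost:9000`, CSV datasets in `data_warehouse/datasets` uploaded to the bucket root, and an Airflow 2.x stable REST API with basic authentication enabled at `http://localhost:8080`. All of these, and the helper names, are assumptions rather than the project's documented setup.

```python
import base64
import json
import urllib.request
from pathlib import Path


def upload_datasets(client, folder, bucket="airflow"):
    """Create `bucket` if needed and upload every CSV in `folder` to it.

    `client` is expected to behave like minio.Minio (bucket_exists,
    make_bucket, fput_object). Returns the uploaded object names.
    """
    if not client.bucket_exists(bucket_name=bucket):
        client.make_bucket(bucket_name=bucket)
    uploaded = []
    for path in sorted(Path(folder).glob("*.csv")):
        client.fput_object(bucket_name=bucket, object_name=path.name, file_path=str(path))
        uploaded.append(path.name)
    return uploaded


def make_minio_client(endpoint="localhost:9000", user="root", password="password"):
    """Build a MinIO client; the endpoint is an assumption."""
    from minio import Minio  # lazy import so the rest of the sketch runs without it

    return Minio(endpoint, access_key=user, secret_key=password, secure=False)


def unpause_dag_request(base_url, dag_id, user="root", password="password"):
    """Build the Airflow 2.x REST request that activates (unpauses) a DAG."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/api/v1/dags/{dag_id}",
        data=json.dumps({"is_paused": False}).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="PATCH",
    )
```

Under those assumptions, `upload_datasets(make_minio_client(), "data_warehouse/datasets")` followed by `urllib.request.urlopen(unpause_dag_request("http://localhost:8080", "initial_load"))` would replace the two manual steps. Passing the client in as a parameter keeps `upload_datasets` testable without a running MinIO server.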