Optimizing a Bank Marketing model by building an event-streaming pipeline that communicates with a machine learning model microservice to display the likelihood and status of bank customers in real time.
For this project I chose a microservice architecture over a monolithic one: microservices can communicate asynchronously and are independent of one another, which makes them more fault tolerant, faster, and easier to maintain.
The project comprises two separate microservices that communicate with each other: the data polling microservice wrapped around Kafka, and the machine learning model microservice deployed as a REST API via a Docker registry on Heroku. See this repository for the model deployment reference. Unfortunately, that deployment is no longer live since Heroku discontinued its free hosting tier, so I used the local version of the API, which still works.
The data for this experiment is downloaded from Kaggle (originally from the UCI repository). The model is trained on about 45,000 rows, while the test data for this experiment is over 4,000 rows that are streamed to and from Kafka.
The Kafka producer reads each record from the test data and sends it to a Kafka topic, as sketched below.
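A minimal sketch of what the producer does, assuming kafka-python and a local broker on localhost:9092; the actual simulation.py may differ, and the test-file name bank_test.csv is hypothetical.

# Producer sketch: read test records and publish them as JSON to the customer_pred topic
import json
import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

test_data = pd.read_csv("bank_test.csv")  # hypothetical file name
for record in test_data.to_dict(orient="records"):
    producer.send("customer_pred", value=record)  # topic name from the compose file

producer.flush()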
The Kafka consumer receives this data via the Kafka topic and processes it asynchronously. It then outputs both the input data and the prediction result. The prediction result contains the likelihood of a bank customer opening a term deposit with the bank (a boolean) together with the probability for that prediction.
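For illustration, a single output record might look like the following; the field names here are assumptions, not the exact schema the consumer emits.

{
    "input": {"age": 41, "job": "technician", "balance": 1270, "...": "..."},
    "prediction": {"term_deposit": True, "probability": 0.87},
}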
My major challenge was consuming and manipulating the data after it had been sent to Kafka. The data did not arrive in the format it was sent in: it came through as a tuple object instead of a dict, which made it very difficult to serialize for model inference. I had to build my own custom JSON encoder before I could serialize the records and pass them to the model API as JSON.
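A hedged sketch of that conversion step, assuming the consumed value arrives as a tuple of field values that needs to be mapped back to feature names before being serialized; the actual encoder in this repo may differ, and the FEATURE_NAMES list is hypothetical and truncated.

import json

FEATURE_NAMES = ["age", "job", "marital", "education", "balance"]  # hypothetical, truncated

def record_to_json(record):
    """Convert a tuple-like record into a JSON string the model API accepts."""
    if not isinstance(record, dict):
        # Re-attach feature names so the model receives a proper key/value payload
        record = dict(zip(FEATURE_NAMES, record))
    return json.dumps(record)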
-
To install and set up the services, run the command below:
docker-compose up
Note: running the docker-compose command automatically creates a topic via a bash command I have included in the Docker config file. In case you wish to create your own topic manually rather than having it created automatically, clone this repo and, before running the compose command, edit the docker-compose file by removing the kafka-create-topics service shown below:
kafka-create-topics:
  image: confluentinc/cp-kafka:5.2.0
  depends_on:
    - broker
  hostname: kafka-create-topics
  command: "bash -c 'echo Waiting for Kafka to be ready... && \
    cub kafka-ready -b kafka:9092 1 20 && \
    kafka-topics --create --topic customer_pred --if-not-exists --zookeeper zookeeper:2181 --partitions 2 --replication-factor 1 && \
    sleep infinity'"
Once that service is removed, you can manually create as many topics as you want from your terminal after the docker-compose command has finished installing and setting up, as shown below:
docker exec -it kafka kafka-topics.sh --bootstrap-server localhost:9092 --topic <your topic name> --create
-
Create a virtual env and run:
pip install -r requirements.txt
-
To start the producer run:
python3 simulation.py
Note: before running the consumer to start consuming data sent to our destination Kafka topic, make sure the model microservice API is running in the background, since we are using the local version of the API.
-
To consume data and run model inference on every record sent, run the command below:
faust -A customer_prediction worker -l info
If we need more workers to load and process the data, we can start an additional worker, as shown below.
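For example, a second worker can be started from another terminal; it needs its own web port so it does not clash with the first one (6067 here is an arbitrary choice):
faust -A customer_prediction worker -l info --web-port 6067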
We can also stream our data to focus on a particular business metric. Suppose the bank wishes to run this real-time prediction only on customers within a certain age range. We can leverage the same asynchronous process to capture the age field, apply our condition, and forward all matching records to a new event stream, in this case a new Kafka topic that captures only this business metric, as sketched below.
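A hedged Faust sketch of that idea; the topic names, age bounds, and app id are assumptions for illustration, not code from this repo.

import faust

app = faust.App("customer_prediction", broker="kafka://localhost:9092")

# Source topic carries every test record; the filtered topic captures only the age band of interest
source_topic = app.topic("customer_pred")                 # values deserialize to dicts (JSON) by default
age_filtered_topic = app.topic("customer_pred_age_30_45")  # hypothetical topic name

@app.agent(source_topic)
async def filter_by_age(records):
    async for record in records:
        # Forward only customers within the chosen age range
        if 30 <= int(record.get("age", 0)) <= 45:
            await age_filtered_topic.send(value=record)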
Feel free to reach out in case anything here doesn't work as expected.