Data Pipeline and Machine Learning Model Deployment Using Kafka, CockroachDB, and Streamlit

This repository provides a data pipeline that ingests financial data from Yahoo Finance, streams it into Kafka, stores and processes it in CockroachDB, trains a machine learning model, and serves the model through a Streamlit app. The services are containerized with Docker and started with Docker Compose.

Overview

The goal of this project is to build a data pipeline that integrates data ingestion, storage, processing, and machine learning model deployment. The pipeline can be adapted to other use cases that require real-time data processing and predictive modeling.

Architecture

The architecture of this data pipeline consists of several key components:

- Data Ingestion: A Python script (create.py) fetches financial data from Yahoo Finance and streams it into Kafka.
- Kafka: A distributed streaming platform that carries the ingested data between the producer script and the database writer.
- CockroachDB: A distributed SQL database that stores and processes the streamed data for further use.
- Machine Learning: The write_to_ml.py script trains a machine learning model on the processed data stored in CockroachDB.
- Model Deployment: The trained model is served through a Streamlit app, which provides interactive data visualization and predictions.
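
As a rough illustration of the ingestion step, the sketch below shows one way a script like create.py might turn a price quote into a keyed JSON message and hand it to a Kafka producer. The topic name `stock_prices`, the field names, and the helpers `build_message` and `publish_quote` are assumptions made for illustration and are not taken from the repository. The producer is passed in as an argument, so any object with a kafka-python style `send(topic, key=..., value=...)` method can be used.

```python
import json
from datetime import datetime, timezone

TOPIC = "stock_prices"  # hypothetical topic name, not taken from the repository


def build_message(symbol, price, ts):
    """Return (key, value) as bytes for one quote.

    ts should be a timezone-aware datetime; it is normalized to UTC.
    """
    payload = {
        "symbol": symbol,
        "price": float(price),
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
    }
    # Keying by symbol keeps all quotes for one ticker in the same partition.
    key = symbol.encode("utf-8")
    value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return key, value


def publish_quote(producer, symbol, price, ts, topic=TOPIC):
    """Serialize one quote and send it through the given producer."""
    key, value = build_message(symbol, price, ts)
    producer.send(topic, key=key, value=value)
    return key, value


class RecordingProducer:
    """Stand-in producer that records calls instead of contacting Kafka."""

    def __init__(self):
        self.sent = []

    def send(self, topic, key=None, value=None):
        self.sent.append((topic, key, value))
```

In a real run, `RecordingProducer` would be replaced by a client connected to the brokers; it is included here only so the serialization logic can be exercised without a running cluster.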

Prerequisites

Before setting up the pipeline, ensure you have the tools used in the steps below installed on your system: Git, Docker with Docker Compose, and Python with Streamlit.

Setup and Installation

Clone the repository:

git clone https://github.com/yourusername/kafka-cockroachdb-ml-pipeline.git
cd kafka-cockroachdb-ml-pipeline

Environment Variables: Set up necessary environment variables in your .env file:

COCKROACH1_DATA_DIR=/path/to/data1
COCKROACH2_DATA_DIR=/path/to/data2
COCKROACH3_DATA_DIR=/path/to/data3
COCKROACH1_CERTS_DIR=/path/to/certs1
COCKROACH2_CERTS_DIR=/path/to/certs2
COCKROACH3_CERTS_DIR=/path/to/certs3
COCKROACH1_HOST=cockroach1
COCKROACH2_HOST=cockroach2
COCKROACH3_HOST=cockroach3
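
To illustrate how a script might consume these settings, the following sketch parses `.env`-style text and builds one PostgreSQL-style connection URL per CockroachDB node. Port 26257 is CockroachDB's default SQL port; the user `root`, the database `defaultdb`, the `sslmode` value, and the helper names `parse_env` and `node_urls` are assumptions and may not match what cockroach_connect.py actually does.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def node_urls(env, port=26257, user="root", database="defaultdb"):
    """Build one connection URL per COCKROACH<n>_HOST entry, starting at n=1.

    The user, database, and sslmode are illustrative assumptions.
    """
    urls = []
    n = 1
    while f"COCKROACH{n}_HOST" in env:
        host = env[f"COCKROACH{n}_HOST"]
        urls.append(
            f"postgresql://{user}@{host}:{port}/{database}?sslmode=verify-full"
        )
        n += 1
    return urls
```

Listing every node lets a client fall back to another host if the first one is unavailable, which is one reason to keep the three host names in configuration rather than hard-coding a single address.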

Build and Run Containers:
```
docker-compose up -d
```
Run Python Scripts:
- Produce data to Kafka:
```
python create.py
```
- Connect to CockroachDB and write data:
```
python cockroach_connect.py
python db_writer.py
```
- Train the machine learning model and deploy it:
```
python write_to_ml.py
streamlit run streamlit.py
```
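
Between the Kafka and CockroachDB steps, a writer such as db_writer.py has to turn raw message bytes into rows. The sketch below shows one possible approach: it decodes JSON payloads, skips malformed ones, and groups the rows into fixed-size batches suitable for a parameterized multi-row insert. The field names (`symbol`, `timestamp`, `price`) and the helper names `decode_rows` and `batched` are hypothetical and were not read from the repository.

```python
import json


def decode_rows(raw_messages):
    """Yield (symbol, timestamp, price) tuples from JSON message bytes."""
    for raw in raw_messages:
        try:
            doc = json.loads(raw)
            yield (str(doc["symbol"]), str(doc["timestamp"]), float(doc["price"]))
        except (ValueError, KeyError, TypeError):
            # Malformed or incomplete message: skip it rather than stop the writer.
            continue


def batched(rows, size):
    """Group an iterable of rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Writing in batches reduces the number of round trips to the database; the batch size is a tuning choice that depends on message rate and row width.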

Docker Compose Services

- CockroachDB: Three-node CockroachDB cluster (cockroach1, cockroach2, cockroach3).
- Kafka: Kafka brokers (kafka-1, kafka-2, kafka-3), Schema Registry, and Kafka Connect.
- ksqlDB: Stream processing with ksqlDB.
- Streamlit: Deploys the trained machine learning model.
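
Because `docker-compose up -d` returns before the services are necessarily ready, it can help to wait for their ports before running the Python scripts. The sketch below polls a TCP port until it accepts a connection or a timeout expires. The helper name `wait_for_port` is hypothetical, and the ports mentioned in the comment (9092 for Kafka, 26257 for CockroachDB) are the usual defaults rather than values confirmed from docker-compose.yml.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (assumed host/port mappings; check docker-compose.yml):
#   wait_for_port("localhost", 9092)    # a Kafka broker
#   wait_for_port("localhost", 26257)   # a CockroachDB node
```

An open port only shows that the process is listening; a service may still need a few more seconds before it can handle requests.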

Contributing

Contributions are welcome! Please fork the repository and submit a pull request for review.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Jeffawe/Kafka-Stock-Predictor