Skip to content

jmsmistral/streaming-docker-poc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Streaming PoC with Kafka, Spark, and Zeppelin

Here you'll find a small proof-of-concept for how to:

  • Generate semi-realistic, synthetic data through a config-driven Python app
  • Send JSON messages through a Kafka message broker
  • Process streaming data with Spark on Zeppelin notebooks (including window funtions)

How to run

  • Dependencies: docker, docker-compose
  • Requirements: You might need to increase the memory assigned to Docker to ~8GB

Clone the repo, navigate to the directory, and run:

$ docker-compose up

This will bring up:

  • Python test data generation app (this will sleep for 20 seconds waiting for Kafka to be up before producing data)
  • Kafka (with Zookeeper as a separate service)
  • Zeppelin (already configured with everything needed to analyse Kafka streams with Spark)

Then navigate to: localhost:8080 to access Zeppelin

About

A docker-based data streaming PoC (Python test data generator, Kafka, Spark, Zeppelin)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published