A distributed implementation of "Nested Subtree Hash Kernels for Large-Scale Graph Classification Over Streams" (ICDM 2012).
Updated Aug 14, 2022 - Python
The goal of this project is to optimize a bank marketing model by building an event streaming pipeline around Apache Kafka and its ecosystem that communicates with a machine learning model microservice. The pipeline is used to display the likelihood and status of bank customers in real time.
Streaming data analysis with AWS tools: events are generated in the cloud from Cloud9, and boto3 sends the records to Kinesis Data Firehose, which delivers them to an S3 bucket destination as .parquet files. A data catalog is then created with Glue so that all records can be queried in real time with Athena.
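As a rough illustration of the boto3-to-Firehose step in the description above, the sketch below serializes an event and hands it to a Firehose client through `put_record`. It is not taken from the repository: the stream name `example-stream`, the event fields, and the helpers `build_record`, `send_event`, and `FakeFirehose` are all hypothetical. The sketch sends newline-delimited JSON and assumes that conversion to .parquet happens elsewhere, for example in the delivery stream configuration.

```python
import json
from datetime import datetime, timezone


def build_record(event: dict) -> dict:
    """Serialize an event as newline-delimited JSON in the Record shape put_record takes."""
    payload = json.dumps(event, sort_keys=True) + "\n"
    return {"Data": payload.encode("utf-8")}


def send_event(client, stream_name: str, event: dict) -> dict:
    """Send one event through a Firehose-like client (e.g. boto3.client("firehose"))."""
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record=build_record(event),
    )


class FakeFirehose:
    """Hypothetical stand-in client so the sketch runs without AWS credentials."""

    def __init__(self):
        self.sent = []

    def put_record(self, DeliveryStreamName, Record):
        # Remember what was sent and return a response shaped like a record receipt.
        self.sent.append((DeliveryStreamName, Record))
        return {"RecordId": str(len(self.sent))}


if __name__ == "__main__":
    # Swap FakeFirehose() for boto3.client("firehose") to target a real delivery stream.
    client = FakeFirehose()
    event = {
        "event_type": "page_view",  # hypothetical field
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    print(send_event(client, "example-stream", event))
```

Passing the client in as an argument keeps the serialization logic testable offline; only the final call depends on AWS.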
Apache Spark With Scala - hands on with big data
Python code that simulates random events in two scenarios, Technology E-commerce and Megastore, in their mobile apps. The aim is to generate large-scale data that can be processed with data engineering tools.
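A minimal sketch of what such an event simulator could look like is shown below. It is not the repository's code: the scenario keys, event type names, field names, and the `simulate_events` helper are assumptions made for illustration only.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical event vocabularies for the two scenarios.
EVENT_TYPES = {
    "tech_ecommerce": ["view_product", "add_to_cart", "checkout"],
    "megastore_app": ["open_app", "search", "scan_coupon", "purchase"],
}


def simulate_events(scenario: str, n: int, seed=None):
    """Yield n random events for a scenario, with strictly increasing timestamps."""
    rng = random.Random(seed)  # a seeded generator makes runs reproducible
    ts = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary start time
    for _ in range(n):
        ts += timedelta(seconds=rng.randint(1, 30))
        yield {
            "scenario": scenario,
            "user_id": rng.randint(1, 1000),
            "event_type": rng.choice(EVENT_TYPES[scenario]),
            "timestamp": ts.isoformat(),
        }


if __name__ == "__main__":
    for event in simulate_events("tech_ecommerce", 3, seed=42):
        print(event)
```

Because the function is a generator, a large `n` can be streamed to a downstream tool without holding every event in memory.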