FeatHub - A stream-batch unified feature store for real-time machine learning
Simple stream processing pipeline
Adapter for dbt that executes dbt pipelines on Apache Flink
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
Streaming Synthetic Sales Data Generator: a synthetic sales data generator for Apache Kafka, written in Python
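
The general shape of such a generator is a loop that builds a random sale record and publishes it to a Kafka topic. A minimal sketch, assuming the kafka-python client and invented topic and field names rather than this repo's actual schema:

    import json
    import random
    import time
    import uuid
    from datetime import datetime, timezone

    from kafka import KafkaProducer  # kafka-python client

    # Hypothetical broker address and topic name, for illustration only.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    PRODUCTS = ["espresso", "latte", "mocha", "americano"]

    while True:
        sale = {
            "transaction_id": str(uuid.uuid4()),
            "product": random.choice(PRODUCTS),
            "quantity": random.randint(1, 5),
            "unit_price": round(random.uniform(2.5, 6.0), 2),
            "ts": datetime.now(timezone.utc).isoformat(),
        }
        producer.send("sales", value=sale)     # publish one synthetic sale
        time.sleep(random.uniform(0.1, 1.0))   # jittered event rate
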
Jupyter Integration for Flink SQL via Ververica Platform
Prototype which extracts stateful dataflows by analysing Python code.
This repo demonstrates how to use AWS application auto-scaling to implement custom-scaling in your Kinesis Data Analytics for Apache Flink applications
A complete data engineering project demonstrating modern data stack practices with Apache Flink, Iceberg, Trino and Superset
Python Examples for running Apache Flink® Table API on Confluent Cloud
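
For a flavor of the Table API itself, here is a minimal PyFlink sketch that runs locally; connecting to Confluent Cloud instead requires Confluent's settings plugin, and the table and column names below are made up:

    from pyflink.table import EnvironmentSettings, TableEnvironment
    from pyflink.table.expressions import col

    # Local streaming TableEnvironment; a Confluent Cloud setup would build
    # the environment from Confluent's connection settings instead.
    table_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Hypothetical in-memory table standing in for a Kafka-backed one.
    orders = table_env.from_elements(
        [("alice", 12.5), ("bob", 7.0), ("alice", 3.25)],
        ["customer", "amount"],
    )

    # Aggregate spend per customer and print the continuously updated result.
    totals = orders.group_by(col("customer")).select(
        col("customer"), col("amount").sum.alias("total_spend")
    )
    totals.execute().print()
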
Apache Flink MCP Server is a Model Context Protocol (MCP) implementation that lets AI assistants and large language models interact directly with Apache Flink clusters through natural language. It enables intelligent monitoring, management, and analysis of real-time streaming applications, making stream processing more intuitive and accessible.
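
The repository defines its own tool set, but the general shape of a Flink-facing MCP server can be sketched with the official MCP Python SDK plus Flink's REST API; the server name, tool, and JobManager address below are illustrative assumptions, not this project's actual interface:

    import requests
    from mcp.server.fastmcp import FastMCP

    # Hypothetical JobManager address; a real server would make this configurable.
    FLINK_REST_URL = "http://localhost:8081"

    mcp = FastMCP("flink-monitor")  # illustrative server name

    @mcp.tool()
    def list_jobs() -> list[dict]:
        """Return id, name, and state of all jobs known to the Flink cluster."""
        resp = requests.get(f"{FLINK_REST_URL}/jobs/overview", timeout=10)
        resp.raise_for_status()
        return [
            {"id": j["jid"], "name": j["name"], "state": j["state"]}
            for j in resp.json()["jobs"]
        ]

    if __name__ == "__main__":
        mcp.run()  # serve over stdio so an assistant can call list_jobs
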
A Smart Traffic Management System for Ho Chi Minh City, Vietnam leveraging batch and real-time data processing, intuitive dashboards, and monitoring tools to optimize traffic flow, enhance safety, and support sustainable urban mobility through advanced analytics and user-friendly applications.
A flinksql-mlflow-pytorch implementation
AUTH: Analytics of Utility Things is a platform for ingesting, processing, and extracting insights from the next billion connected Internet of Things (IoT) devices.
Helps explain how Flink handles late-arriving data and its effects on message order
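
The key mechanism is the watermark, which decides how long Flink waits for out-of-order events before closing a window; events arriving after that are dropped or diverted, which is why late data affects both completeness and observed order. A small PyFlink SQL sketch, using an invented clicks table fed by the datagen connector:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    table_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # The watermark below tells Flink to tolerate events arriving
    # up to 10 seconds out of order before a window is finalized.
    table_env.execute_sql("""
        CREATE TEMPORARY TABLE clicks (
            user_id STRING,
            event_time TIMESTAMP(3),
            WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND
        ) WITH (
            'connector' = 'datagen',
            'rows-per-second' = '5'
        )
    """)

    # Per-minute tumbling windows close once the watermark passes their end;
    # anything arriving later no longer counts toward these results.
    table_env.execute_sql("""
        SELECT
            user_id,
            TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
            COUNT(*) AS clicks_per_minute
        FROM clicks
        GROUP BY user_id, TUMBLE(event_time, INTERVAL '1' MINUTE)
    """).print()
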
Declarative Apache Flink Statefun over FastAPI
Here I share my practice work and solutions for the DataExpert bootcamp. You can follow along in the repo: https://github.com/DataExpert-io/data-engineer-handbook/tree/main/bootcamp/materials
Leveraged AWS cloud services to create an anomaly detection system that alerts maintenance teams in real time when wind farm sensors detect abnormally high wind speeds.
This project builds a real-time streaming pipeline with Apache Flink and Apache Kafka. The goal is to enrich checkout data with user information, identify the first click leading to each checkout, and log the attributed checkouts to a Postgres sink table. The project applies concepts such as state management, time attributes, and watermarks.
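
The full job definition lives in the repo; as a rough illustration of the attribution step only, the first-click logic can be expressed in Flink SQL along these lines, assuming clicks, checkouts, and attributed_checkouts tables (names invented here) were already declared with event-time attributes, watermarks, and a JDBC sink pointing at Postgres:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    table_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Assumes `clicks` and `checkouts` are Kafka-backed tables with watermarked
    # event-time columns, and `attributed_checkouts` is a JDBC (Postgres) sink;
    # that DDL is omitted here.

    # Attribute each checkout to the earliest click by the same user within the
    # preceding hour: an event-time interval join plus a per-checkout MIN.
    table_env.execute_sql("""
        INSERT INTO attributed_checkouts
        SELECT
            co.checkout_id,
            co.user_id,
            MIN(cl.click_time) AS first_click_time
        FROM checkouts AS co
        JOIN clicks AS cl
          ON cl.user_id = co.user_id
         AND cl.click_time BETWEEN co.checkout_time - INTERVAL '1' HOUR
                               AND co.checkout_time
        GROUP BY co.checkout_id, co.user_id
    """)
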