A lightweight, production-ready capture layer that converts RabbitMQ streams into durable, replayable datasets for lakehouse architectures.
RabbitMQ Capture bridges the gap between queue-based ingestion and file-based processing by continuously consuming messages, batching them, and persisting them to storage (Delta, Parquet, or JSON). This enables seamless integration with tools like Databricks Auto Loader and standardizes ingestion around a storage-first approach.
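The consume → batch → persist → acknowledge flow can be sketched in a few lines. Everything below is illustrative, not the project's actual API: a plain Python list stands in for the RabbitMQ channel, the function and file names are hypothetical, and a real deployment would consume via a client library and write Delta or Parquet rather than JSON lines.

```python
import json
import time
from pathlib import Path

def capture(messages, out_dir, batch_size=3):
    """Drain `messages`, writing JSON-lines files of up to `batch_size` records.
    Hypothetical sketch: `messages` stands in for a RabbitMQ consumer."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    batch, written = [], []
    for msg in messages:
        batch.append(msg)
        if len(batch) >= batch_size:
            written.append(_flush(batch, out))
            batch = []  # in a real consumer, ack only after a successful flush
    if batch:  # persist the final partial batch
        written.append(_flush(batch, out))
    return written

def _flush(batch, out):
    path = out / f"batch-{time.time_ns()}.jsonl"
    with path.open("w") as f:
        for msg in batch:
            f.write(json.dumps(msg) + "\n")
    return path
```

Deferring the acknowledgement until after the flush succeeds is what makes the capture safe: a crash between consume and persist leaves the messages unacked, so the broker redelivers them.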
The project provides a clear set of configurable policies — including batching, acknowledgements, and idempotency — allowing teams to control reliability, performance, and consistency without reinventing ingestion logic.
- Reliable ingestion from RabbitMQ with safe acknowledgement handling
- Flexible batching strategies (time, size, count, hybrid)
- Built-in idempotency patterns for duplicate handling
- Storage-first design enabling replay and audit
- Optimized file layout for Auto Loader and downstream processing
- Policy-driven architecture for consistency and governance
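The hybrid batching strategy listed above can be sketched as a small policy object that triggers a flush when any of three thresholds trips: message count, accumulated bytes, or batch age. The class name and default limits are illustrative assumptions, not the project's real configuration surface.

```python
import time

class HybridBatchPolicy:
    """Flush when any threshold trips: message count, total bytes, or batch age.
    Names and defaults are illustrative, not the project's actual config."""

    def __init__(self, max_count=500, max_bytes=4 * 1024 * 1024, max_age_s=5.0):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.reset()

    def reset(self):
        """Start a fresh batch window."""
        self.count = 0
        self.bytes = 0
        self.started = time.monotonic()

    def add(self, payload: bytes) -> bool:
        """Record one message; return True if the batch should be flushed."""
        self.count += 1
        self.bytes += len(payload)
        return (
            self.count >= self.max_count
            or self.bytes >= self.max_bytes
            or time.monotonic() - self.started >= self.max_age_s
        )
```

A pure time-, size-, or count-based strategy falls out of the same shape by setting the other two limits effectively to infinity.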
RabbitMQ is a queue, not a log. Once a message is consumed and acknowledged, it is gone from the broker. This project introduces a capture layer that turns these ephemeral streams into persistent data, making them compatible with modern data platforms.
All incoming data, regardless of source, becomes:
- durable
- replayable
- observable
- ready for lakehouse processing
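Replay and at-least-once delivery both imply duplicates, which is why the idempotency patterns mentioned earlier typically key on a stable message id. A minimal in-memory sketch of that pattern follows; the class name is hypothetical, and a real deployment would back the seen-set with a persistent store rather than process memory.

```python
class IdempotentWriter:
    """Drop messages whose id has already been captured. The in-memory set is
    a stand-in; durable deployments would use a persistent key store."""

    def __init__(self):
        self._seen = set()
        self.records = []

    def write(self, message_id: str, payload) -> bool:
        """Persist payload once per message_id; return True if newly written."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.records.append(payload)
        return True
```

With this in place, replaying an entire captured file after a partial failure is safe: already-written message ids are silently skipped.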
Continuous integration builds:
TBC
