CDC tool for databases #20179
Comments
Thanks @AbstractiveNord, that is interesting. Vector is fundamentally a tool for processing observability data, so I'm not sure that satisfying the use cases Debezium seems to target would be in scope. It seems like Debezium is meant for general event processing?
On the one hand, yes, it's a little out of scope. On the other hand, Vector already implements a lot of the required machinery, such as sinks and temporary buffers, and it's a battle-tested, highly popular tool, so forking the Vector project for this seems pointless. Also, if Vector supported CDC, logs could be enriched with data taken directly from business events, not just from other logs. Let's say we have a typical microservice architecture with PostgreSQL, Kafka, and some microservices. An incoming event is written to a PostgreSQL table and then goes to a Kafka queue. With CDC support, Vector could generate a log record itself, based on the fact that the event successfully moved from PostgreSQL to Kafka. So even in that example, CDC support in Vector would be useful for observability too.
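A minimal sketch of the correlation described above, assuming the pipeline can see both PostgreSQL change events and Kafka delivery acknowledgements carrying a shared event ID. The `ChangeEvent` and `KafkaDelivery` shapes and the `correlate` function are hypothetical illustrations, not Vector or Debezium APIs.

```python
from dataclasses import dataclass


@dataclass
class ChangeEvent:
    """A row-level change read from PostgreSQL (hypothetical shape)."""
    event_id: str
    table: str
    op: str  # "insert", "update" or "delete"


@dataclass
class KafkaDelivery:
    """Acknowledgement that a message reached a Kafka topic (hypothetical shape)."""
    event_id: str
    topic: str
    offset: int


def correlate(changes, deliveries):
    """Emit one log record per event that was seen both in PostgreSQL and in Kafka."""
    delivered = {d.event_id: d for d in deliveries}
    logs = []
    for change in changes:
        delivery = delivered.get(change.event_id)
        if delivery is None:
            continue  # not (yet) confirmed in Kafka, so no log record
        logs.append({
            "message": "event moved from PostgreSQL to Kafka",
            "event_id": change.event_id,
            "table": change.table,
            "topic": delivery.topic,
            "offset": delivery.offset,
        })
    return logs
```

Events without a matching delivery produce no record here; a real pipeline would probably also want to flag events that stay unmatched for too long.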
I see such a source as highly similar to the file-based source, just adapted to WAL segments. Feel free to correct me; I may be wrong about this.
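To illustrate the analogy with the file source: a file source remembers a byte offset so it can resume after a restart, and a WAL-based source could analogously remember the last acknowledged LSN (log sequence number). The sketch below assumes WAL records have already been decoded into `(lsn, change)` pairs; `WalRecord` and `CheckpointedWalReader` are hypothetical names, and real PostgreSQL logical replication additionally tracks a confirmed position on the server side through the replication slot.

```python
from dataclasses import dataclass


@dataclass
class WalRecord:
    lsn: int      # log sequence number, increases monotonically
    change: dict  # already-decoded row change (assumed)


@dataclass
class CheckpointedWalReader:
    """Remembers the last acknowledged LSN, like a file source remembers an offset."""
    checkpoint: int = 0

    def poll(self, wal):
        # Only records strictly after the checkpoint are new.
        return [r for r in wal if r.lsn > self.checkpoint]

    def ack(self, records):
        # Advance the checkpoint only after downstream sinks accepted the batch,
        # so a crash before ack() means the batch is re-read, not lost.
        if records:
            self.checkpoint = max(self.checkpoint, max(r.lsn for r in records))
```

Acknowledging after delivery gives at-least-once behavior: a batch may be emitted twice after a crash, but it is never skipped.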
Thanks for the additional thoughts! For other readers who, like me, were unfamiliar: CDC is "change data capture", i.e. capturing record-level changes in a database.
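To show what "record-level changes" look like, the sketch below derives insert/update/delete events by comparing two snapshots of a table keyed by primary key. The event shape (`op`, `key`, `before`, `after`) is an assumption loosely modeled on common CDC envelopes. Snapshot diffing is used here only to make the event shape concrete; log-based CDC tools such as Debezium read these changes from the database log instead of comparing snapshots.

```python
def diff_snapshots(before, after):
    """Derive record-level change events from two snapshots keyed by primary key."""
    events = []
    for key, row in after.items():
        if key not in before:
            events.append({"op": "insert", "key": key, "before": None, "after": row})
        elif before[key] != row:
            events.append({"op": "update", "key": key, "before": before[key], "after": row})
    for key, row in before.items():
        if key not in after:
            events.append({"op": "delete", "key": key, "before": row, "after": None})
    return events
```

Unlike log-based CDC, diffing cannot see intermediate states: a row updated twice between snapshots yields a single update event.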
Yes, in general. I'm just not sure whether CDC would make Vector's scope too broad. In fact, it would be very useful: it mostly fits observability use cases, as pointed out, and it could bring additional popularity as a Debezium alternative written in Rust, etc.
Could Vector augment or replace tools like Debezium? A lot of sources, transforms, and sinks already exist, and it would be nice to have a ready-to-use database CDC source in Vector. A PostgreSQL CDC source would be a good example, because PostgreSQL is highly popular.
Use Cases
CDC is widely used for data processing, especially in microservice architectures. For example, data from PostgreSQL may need to be full-text indexed with ElasticSearch/OpenSearch/ManticoreSearch/etc., passed into Kafka pipelines, or delivered to a DWH (data warehouse). Vector already supports search engines and message queues as sinks, so Vector-based CDC looks like a good idea.
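As an illustration of the search-indexing use case: keeping an index in sync with a table mostly means turning inserts and updates into upserts keyed by primary key and deletes into deletions, applied in order. The sketch below uses a plain dict as a stand-in for the index and assumes a hypothetical event shape with `op`, `key`, and `after` fields; it does not use any search-engine client API.

```python
def apply_changes(index, events):
    """Apply ordered change events to a dict standing in for a search index."""
    for event in events:
        op = event["op"]
        if op in ("insert", "update"):
            # Upsert: the latest row image replaces whatever is indexed for this key.
            index[event["key"]] = event["after"]
        elif op == "delete":
            # Deleting a missing document is treated as a no-op.
            index.pop(event["key"], None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return index
```

Order matters per key: applying a delete before the insert it follows would leave a stale document, which is why CDC pipelines usually preserve per-key ordering (e.g. by partitioning Kafka topics by primary key).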
Attempted Solutions
Debezium
Proposal
No response
References
Version
0.36.1