CDC tool for databases #20179

Open
AbstractiveNord opened this issue Mar 26, 2024 · 5 comments
Labels
source: new (A request for a new source)
type: feature (A value-adding code addition that introduces new functionality)

Comments

@AbstractiveNord

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Could Vector augment or replace tools like Debezium? Many sources, transforms, and sinks already exist, and it would be nice to have a ready-to-use database CDC source in Vector. A PostgreSQL CDC source, for example, because PostgreSQL is highly popular.

Use Cases

CDC is widely used for data processing, especially in microservice architectures. For example, data from PostgreSQL may need to be full-text indexed with ElasticSearch/OpenSearch/ManticoreSearch/etc., passed into Kafka pipelines, or delivered to a DWH. Vector already supports search engines and MQs as sinks, so Vector-based CDC looks like a good idea.

Attempted Solutions

Debezium

Proposal

No response

References

Version

0.36.1

AbstractiveNord added the "type: feature" label Mar 26, 2024
@jszwedko
Member

Thanks @AbstractiveNord, that is interesting. Vector is fundamentally a tool for processing observability data so I'm not sure satisfying the use-cases that Debezium seems to be targeting would be in scope though. It seems like it is meant for general event processing?

jszwedko added the "source: new" label Mar 26, 2024
@AbstractiveNord
Author

AbstractiveNord commented Mar 26, 2024

On the one hand, yes, it's a little out of scope. On the other hand, Vector has already implemented much of what is required, like sinks, temporary buffers, etc. It's a battle-tested and highly popular tool, so forking the Vector project seems pointless. Also, if Vector supports CDC, logs could probably be enriched with data taken directly from business events, not just from other logs.

Let's say we have a typical microservice architecture with PostgreSQL, Kafka, and some microservices. An input event is written to a PostgreSQL table and then goes to a Kafka queue. With CDC support, Vector itself could generate a log record based on the fact that the event successfully moved from PG to Kafka. Even in that example, CDC support in Vector can be useful for observability too.
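
A minimal Python sketch of that idea, not Vector code. It assumes a change event shaped loosely like the ones Debezium emits (`op`, `before`, `after`) and a hypothetical `published_at` column that is set once the row has been pushed to Kafka. The function name, the `table` field, and the column names are all assumptions for illustration.

```python
from typing import Optional


def change_to_log(change: dict) -> Optional[dict]:
    """Turn one change event into a log record, or None if it is not relevant."""
    before = change.get("before") or {}
    after = change.get("after") or {}

    # Only row updates are interesting here ("u" is assumed to mean update).
    if change.get("op") != "u":
        return None

    # Hypothetical rule: `published_at` goes from NULL to a value
    # when the event has been delivered to Kafka.
    if before.get("published_at") is not None or after.get("published_at") is None:
        return None

    return {
        "level": "info",
        "message": f"event {after['id']} moved from PostgreSQL to Kafka",
        "table": change.get("table"),
        "timestamp": after["published_at"],
    }


example_change = {
    "op": "u",
    "table": "events",
    "before": {"id": 42, "published_at": None},
    "after": {"id": 42, "published_at": "2024-03-26T12:00:00Z"},
}
log_record = change_to_log(example_change)
```

Inserts, deletes, and updates that do not touch `published_at` return `None`, so only the "moved to Kafka" transition becomes a log line.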

@AbstractiveNord
Author

I see that source as highly similar to the file-based source, just adapted to WAL segments. Feel free to correct me; I may be wrong about that.

@jszwedko
Member

Thanks for the additional thoughts! The file source primarily exists to read logs written by applications from files rather than reading business events for processing, but I can see what you are saying about Vector being mostly fit for this use-case with minor improvements. I'm just wary of Vector's use-cases becoming too broad and its core functionality suffering for it. If we had the ability to have source plugins I think this would be a good candidate for that 🙂

For other readers, since I was unfamiliar, CDC is "change data capture": capturing record-level changes in a database.
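
To make "record-level changes" concrete, here is a rough Python sketch that compares two snapshots of a table keyed by primary key and reports one change per row. This is only an illustration: tools such as Debezium typically read the database's transaction log rather than diff snapshots, and the function name and event shape below are assumptions.

```python
def diff_snapshots(old: dict, new: dict) -> list:
    """Return one change record per row that was inserted, updated, or deleted.

    `old` and `new` map a primary key to a row (a dict of column values).
    """
    changes = []
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key)
        after = new.get(key)
        if before is None:
            changes.append({"op": "insert", "key": key, "before": None, "after": after})
        elif after is None:
            changes.append({"op": "delete", "key": key, "before": before, "after": None})
        elif before != after:
            changes.append({"op": "update", "key": key, "before": before, "after": after})
    return changes


old_rows = {1: {"name": "alice"}, 2: {"name": "bob"}}
new_rows = {1: {"name": "alicia"}, 3: {"name": "carol"}}
changes = diff_snapshots(old_rows, new_rows)
```

Each entry carries the row state before and after the change, which is the kind of information a CDC source would hand to downstream transforms and sinks.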

@AbstractiveNord
Author

Yes in general, I'm just not sure that CDC would cause Vector to become too broad. In fact, it would be very useful: it mostly fits observability use cases, as pointed out, and it could bring additional popularity as a Debezium alternative written in Rust, etc.
