feat(new sink): add CnosDB sink #18156

Open
Subsegment opened this issue Aug 4, 2023 · 16 comments
Labels
sink: new A request for a new sink type: feature A value-adding code addition that introduce new functionality.

Comments

@Subsegment

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

We want to add a CnosDB sink to Vector so that:

  • We can build a Vector + CnosDB + Grafana pipeline that provides more convenient observation and alerting for logs and metrics.

  • Users can apply CnosDB's time-series functions to better analyze the collected data.

Attempted Solutions

We tried having CnosDB accept data from Vector's existing vector sink type, but there is no graceful way to pass certain parameters. Parameters that are needed only once have to be added to the written data itself, which wastes resources and is not very safe.

Proposal

I have submitted a PR that adds a CnosDB sink.

References

#18147

Version

0.31.0

@Subsegment Subsegment added the type: feature A value-adding code addition that introduce new functionality. label Aug 4, 2023
@dsmith3197
Contributor

Hi @Subsegment,

I noticed that CnosDB supports InfluxDB and Prometheus Remote Write integrations. It looks like Vector's Prometheus Remote Write sink will support your use case as is. Do you think that will fit your needs?

@Subsegment
Author

Hi @dsmith3197, thank you very much for your suggestion. In fact, the Prometheus Remote Write sink and the vector sink have the same problem: the health check cannot pass, the authorization check cannot pass, and some private tokens have to be transmitted inside the data. We think this is neither elegant nor safe.

In addition, I looked at the code of the Prometheus sink. It seems that it currently processes only metric data and has no logic for log events.

@dsmith3197
Contributor

@Subsegment Thanks for looking into the Prometheus Remote Write sink. You are correct that it only supports metrics.

If you want to support metrics and logs, then the InfluxDB sink will be more suitable. From what I can tell, #18147 is heavily adapted from the InfluxDB sink. Rather than creating a new sink, I think we can likely extend the InfluxDB sink, or extract the shared logic and use the generic http sink.

Could you elaborate on the differences between the CnosDB protocol and the InfluxDB protocol?

@Subsegment
Author

Subsegment commented Aug 23, 2023

Hi @dsmith3197

We support InfluxDB's line protocol, so the sink assembles line protocol when it sends a request; that is why the CnosDB sink borrows some code from the InfluxDB sink. However, we found that the InfluxDB log sink converts maps and arrays directly to strings. This is a limitation of the InfluxDB protocol, and it prevents some data from being observable. We will support direct storage of map and array types in the future, so we will either modify the InfluxDB protocol or switch to a new protocol. I hope this issue will not prevent us from merging the CnosDB sink.

Some of the same problems also remain: if the InfluxDB sink is used as the CnosDB sink, some private data and parameters are difficult to handle. Using an http sink means that the data has to be converted between protocols before it is sent, which may need to be handled with a VRL transform. It is easier to convert directly in the sink than to rely on Vector's rich set of sources.

Thanks for your advice
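
To make the stringification concern above concrete, here is a minimal Python sketch of a line protocol encoder. It is an illustration only, not Vector's actual InfluxDB sink code: the function names (`escape_key`, `field_value`, `to_line`) and the sample event are hypothetical, and JSON is assumed as the string form for nested values. Line protocol fields can only be floats, integers, booleans, or strings, so a nested map or array has to be flattened into a quoted string.

```python
import json


def escape_key(value: str) -> str:
    """Escape commas, equals signs, and spaces, as line protocol requires
    for tag keys, tag values, and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def field_value(value) -> str:
    """Render one field value. Maps and arrays have no native line protocol
    type, so this sketch (by assumption) serializes them to a JSON string."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, (dict, list)):
        value = json.dumps(value, separators=(",", ":"), sort_keys=True)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line; the measurement is assumed to need no escaping."""
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={field_value(v)}" for k, v in fields.items()
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical log event with a nested map and array.
line = to_line(
    "vector_logs",
    {"host": "web-1"},
    {"message": "login ok", "user": {"id": 7, "roles": ["admin"]}},
    1692800000000000000,
)
print(line)
```

The `user` field arrives at the database as one opaque string, so the store cannot index or query `user.id` or `user.roles` without parsing that string again.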

@dsmith3197
Contributor

Hi @Subsegment, thanks for the reply. I have a few questions I'd like to ask so that we can fully understand your needs and come up with the best solution for us all 🙂.

> We support lineprotocol from influxdb, so we assemble lineprotocol when sink sends request, so we adopt some influxdb sink in cnosdb sink, However, we found that the influxdb log sink directly converts the data to strings when processing the map and array, which is a defect of the influxdb protocol and prevents the observability of some data. We will support direct storage of map and array types in the future, so we will modify the influxdb protocol or use the new protocol directly.

I understand that the influxdb line protocol does not adequately support maps and arrays for you. That said, the influxdb line protocol is designed primarily for metrics, which typically do not have nested maps or arrays, rather than for logs. Do you want to support sending both logs and metrics to cnosdb? If so, the influxdb line protocol or prometheus remote write protocol could be great choices for metrics, but won't be ideal for logs. Do you know what protocol you would use in the future?

> There are also some of the same problems, if the influxdb sink is used as the cnosdb sink, some privacy data processing and parameter issues are difficult to be handled. Using an http sink means that you need to convert different data protocols before sending them, which may need to be handled with transform VRL. It is easier to convert directly in the sink than relying on vector-rich sources.

Could you please give an example of the "privacy data processing and parameter" issues you mention above to help me better understand? Also, could you explain/give examples of how a CnosDB HTTP request differs from an InfluxDB HTTP request? For example, do they have different headers, authentication, schemes, etc? I would like to better understand the differences between the two to best advise on how to proceed.

@Subsegment
Author

Subsegment commented Aug 25, 2023

Thank you for your reply @dsmith3197

  • As for the issue of protocols, since the current log events are not well supported by these protocols, we will develop our own client in the future, which will directly encode and assemble the log events and metric events and send them to cnosdb, and cnosdb will be responsible for handling the map and array of the log event.

  • The privacy questions are the ones you mentioned: usernames, passwords, and so on. Here are two examples:

    CnosDB: headers: (Authorization: Basic cm9vdDo=), url: http://127.0.0.1:8902/api/v1/write?tenant=cnosdb&db=public&chunked=false

    InfluxDB: headers: (Authorization: Token xxxxxxxxxxx), url: http://127.0.0.1:8086/write?db=public&u=username&p=password_sensitive_string

    As you can see, the InfluxDB and CnosDB requests are not very close; only the db parameter is similar. CnosDB supports a tenant parameter for cloud-native multi-tenant scenarios, while InfluxDB does not support writing to a specified tenant. In addition, the Authorization encoding is different: our Basic value is derived from the user name and password, whereas InfluxDB uses its own token.

Here are some answers to your questions. Also, if a lot of repetitive logic doesn't look elegant, I can modify the code to reuse some of the logic of the InfluxDB sink.
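
The two requests in this comment can be reproduced with a short Python sketch. It only builds the URL and headers and sends nothing; the helper names are hypothetical, and the parameter sets are taken from the examples in the comment rather than from either database's full API. The `cm9vdDo=` value in the CnosDB example is the Base64 encoding of `root:`, that is, user `root` with an empty password.

```python
import base64
from urllib.parse import urlencode


def basic_auth_header(username: str, password: str) -> str:
    """HTTP Basic auth: Base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def cnosdb_write_request(host: str, tenant: str, db: str,
                         username: str, password: str) -> dict:
    """Hypothetical helper mirroring the CnosDB example: credentials travel
    in the Authorization header, tenant and db in the query string."""
    query = urlencode({"tenant": tenant, "db": db, "chunked": "false"})
    return {
        "url": f"http://{host}/api/v1/write?{query}",
        "headers": {"Authorization": basic_auth_header(username, password)},
    }


def influxdb_write_request(host: str, db: str, token: str,
                           username: str, password: str) -> dict:
    """Hypothetical helper mirroring the InfluxDB example: a Token header,
    plus u and p credentials carried in the URL itself."""
    query = urlencode({"db": db, "u": username, "p": password})
    return {
        "url": f"http://{host}/write?{query}",
        "headers": {"Authorization": f"Token {token}"},
    }


cnosdb = cnosdb_write_request("127.0.0.1:8902", "cnosdb", "public", "root", "")
influxdb = influxdb_write_request(
    "127.0.0.1:8086", "public", "xxxxxxxxxxx",
    "username", "password_sensitive_string",
)
print(cnosdb["url"], cnosdb["headers"])
print(influxdb["url"], influxdb["headers"])
```

Side by side, the differences named in the comment are the path (`/api/v1/write` versus `/write`), the extra `tenant` parameter, and the authorization scheme.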

@dsmith3197
Contributor

Hi @Subsegment,

Thank you for the detailed explanation! Let me revisit the component qualification checklist and get back to you.

@Subsegment
Author

> Hi @Subsegment,
>
> Thank you for the detailed explanation! Let me revisit the component qualification checklist and get back to you.

Ok, thanks!

@neuronull
Contributor

👋 Hi @Subsegment ! Apologies for the delayed response here!

I am picking up where Doug left off with this new component qualification, and am in the process of getting up to speed on it.

Want to confirm since it's been a little while, are you still interested in pursuing this?

@Subsegment
Author

Subsegment commented Oct 23, 2023

> 👋 Hi @Subsegment ! Apologies for the delayed response here!
>
> I am picking up where Doug left off with this new component qualification, and am in the process of getting up to speed on it.
>
> Want to confirm since it's been a little while, are you still interested in pursuing this?

I understand. Yes, we are still interested in pursuing this.

@neuronull neuronull added the sink: new A request for a new sink label Oct 24, 2023
@neuronull
Contributor

Thanks for your patience @Subsegment!

We discussed this and have a proposal for an alternative approach to the design taken in the existing PR:

  • In order to reduce logic duplication, and to make this change more re-usable for future components, we'd like to suggest extracting the influxdb line protocol logic that is used in your PR and in the influxdb sink, into a new influxdb codec.

  • With the new codec, the existing HTTP sink should be usable with proper configuration for auth and query parameters.

What do you think?

Thanks~
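
The split proposed in this comment, a reusable codec plus a generically configured HTTP sink, can be sketched in Python as follows. This is a language-neutral illustration under stated assumptions, not Vector's API: `HttpSinkConfig`, `encode_line_protocol`, and the simplified event shape are all hypothetical, and the influxdb codec was only a proposal at this point in the thread. The request is built with the standard library but never sent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable
from urllib.parse import urlencode
from urllib.request import Request

Event = Dict[str, object]


def encode_line_protocol(events: Iterable[Event]) -> bytes:
    """Hypothetical stand-in for the proposed codec: one line per event,
    with a single float field named 'value'."""
    lines = [
        f'{e["name"]} value={float(e["value"])} {int(e["timestamp_ns"])}'
        for e in events
    ]
    return "\n".join(lines).encode("utf-8")


@dataclass
class HttpSinkConfig:
    """Hypothetical generic HTTP sink: the codec, query parameters, and
    auth headers are all configuration, not destination-specific code."""
    uri: str
    encode: Callable[[Iterable[Event]], bytes]
    query: Dict[str, str] = field(default_factory=dict)
    headers: Dict[str, str] = field(default_factory=dict)

    def build_request(self, events: Iterable[Event]) -> Request:
        url = f"{self.uri}?{urlencode(self.query)}" if self.query else self.uri
        return Request(url, data=self.encode(events),
                       headers=self.headers, method="POST")


# A CnosDB destination expressed purely as configuration (values taken
# from the request example earlier in the thread).
cnosdb_sink = HttpSinkConfig(
    uri="http://127.0.0.1:8902/api/v1/write",
    encode=encode_line_protocol,
    query={"tenant": "cnosdb", "db": "public"},
    headers={"Authorization": "Basic cm9vdDo="},
)
request = cnosdb_sink.build_request(
    [{"name": "cpu", "value": 0.5, "timestamp_ns": 1}]
)
print(request.get_method(), request.full_url)
print(request.data)
```

Under this design, an InfluxDB destination would reuse `encode_line_protocol` with a different URI and headers, and a future CnosDB protocol would only need a new `encode` callable.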

@Subsegment
Author

Thank you for your advice. @neuronull

This is fine for the current situation. However, CnosDB is not completely satisfied with the features of Line Protocol: we plan to support storing maps, arrays, and other objects. If you have seen the recent CnosDB 2.4.0 release, you will find that we already support the Geometry type, which cannot be represented in Line Protocol. As CnosDB's supported types evolve, we will use a new protocol that we develop ourselves instead of Line Protocol.

My suggestion is to reuse InfluxDB's Line Protocol logic for now and let me implement the replacement once our protocol is developed.

What do you think of that?

:)

@neuronull
Contributor

Thanks for those extra details @Subsegment ,

We discussed this some more with the team. We're still open to accepting an InfluxDB codec and going that route, though it is understandable if you're not interested in pursuing that, given how briefly it would be of use to you. Our motivation for that direction is to make the decision that best serves the long-term future of the project.

But for a new cnosdb sink, based on the current situation and the emerging CnosDB protocol/client that is in the works, we feel the most sensible route would be to wait for a stable version of this client, and then re-evaluate the inclusion of a new cnosdb sink at that time.

@Subsegment
Author

Thanks @neuronull

We will discuss this and either use your proposal or develop the CnosDB client in the near future.

Hope that we can keep in touch.

@neuronull
Contributor

👍 Sounds good. We'll keep this issue open. Feel free to respond here again as things progress, and let's definitely keep in touch. Thanks @Subsegment ~
