Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable kafka topic names. #1840

Closed
lcysimon opened this issue Aug 31, 2020 · 14 comments · Fixed by #1894
Closed

Configurable kafka topic names. #1840

lcysimon opened this issue Aug 31, 2020 · 14 comments · Fixed by #1894
Assignees
Labels
feature-request Request for a new feature to be added

Comments

@lcysimon
Copy link

Is your feature request related to a problem? Please describe.
Due to company policies, all my kafka topics need to follow some conventions.
As kafka topics are non-configurable hardcoded string in datahub, this prevent me from using this ingestion without building it on my own.

Describe the solution you'd like
I would like to be able to configure those with springBootAutoconfigure, which would allow us to have them:

  • as part of the application.yml
  • being able to override them in environment variables, really useful when using docker.
  • have default value if in none of the above.

Describe alternatives you've considered
No alternative considered. Open to suggestion as I am not a spring boot specialist of course.

@lcysimon lcysimon added the feature-request Request for a new feature to be added label Aug 31, 2020
@jplaisted
Copy link
Contributor

fwiw the MCE/MAE jobs are configurable today with KAFKA_TOPIC_NAME. But yes, looks like the KafkaMetadataEventProducer is not configurable.

fyi for the future, we're actually planning on switching up our event model quite a bit.

https://github.com/linkedin/datahub/blob/master/docs/roadmap.md#aspect-specific-mce--mae

To clarify that one sentence further, we've found that the monolithic events aren't scaling. So right now we only have MetadataChangeEvent, which contains all possible entity snapshots; which in turn have all aspects of the entity. What we want to do is actually make one event per aspect-entity pair. i.e. Ownership-Dataset, DatasetInfo-Dataset, CorpUserInfo-CorpUser, etc. We plan to auto generate these based on PDL properties on models.

In order for this to scale well, we also need a convention for this, so jobs that want to listen to all events can easily construct a pattern to do so.

cc @RealChrisL

@shakti-garg-saxo
Copy link
Contributor

@jplaisted This feature as per roadmap will be available in medium term(3-6 months). Is there a way to prioritise this change?
can there be a way to do this change without impacting aspect-specific modelling feature?

In my organisation also, we have started feeling heat as new topic naming policy is in works and as soon as it is implemented, we will be needing this fix.

Thanks.

@jplaisted
Copy link
Contributor

jplaisted commented Sep 1, 2020

@mars-lan for prioritization

@shakti-garg-saxo @lysimon I'm curious as to how strict these (potential) naming policies are at your companies, and what the philosophy behind having a naming convention across the company on topic names is? Googling around it seems like it may be an underscore _ vs period . debate, since they can collide in metric names? Is that the case here, and your companies want to go with .?

https://devshawn.com/blog/apache-kafka-topic-naming-conventions/#:~:text=When%20it%20comes%20to%20naming,%2C%20and%20hyphens%20(%2D).

EDIT: Just to clarify, A) I'm curious and B) if it is as simple as switching _ to ., we can probably make that a very simple config (i.e. boolean flag rather than manually changing every single topic name manally).

@mars-lan
Copy link
Contributor

mars-lan commented Sep 1, 2020

@jplaisted This feature as per roadmap will be available in medium term(3-6 months). Is there a way to prioritise this change?
can there be a way to do this change without impacting aspect-specific modelling feature?

In my organisation also, we have started feeling heat as new topic naming policy is in works and as soon as it is implemented, we will be needing this fix.

Thanks.

@shakti-garg-saxo When you say prioritize, do you mean the "aspect-specific MXE" or "ability to rename current MXE topic name"? Also what you mean by "without impacting aspect-specific modelling feature"?

@jplaisted
Copy link
Contributor

I don't think it is too much work to make KafkaMetadataEventProducer configurable. Assuming your topics really do follow a nice convention (i.e. as simple as changing _ to .)

@shakti-garg-saxo
Copy link
Contributor

@mars-lan @jplaisted

  1. For us, it is not a simple '_' vs '.' . As Kafka is exposed as a service in a multi-tenant self-service architecture, it follows a naming policy like "([user-name]|[env-name])type-data-asset-stage", which each variable has to be replaced by a contextual value of topic being created.
  2. I am asking to prioritize the "ability to rename/parameterize current MXE topic name".
  3. When I said "without impacting aspect-specific modelling feature", It was based on my understanding from the discussion that MXE topic-names will also change as part of this new feature. So, was asking if it is able to parameterize the topic-names without conflicts with "aspect-specific modelling feature".

Thanks

@jplaisted
Copy link
Contributor

jplaisted commented Sep 2, 2020

For us, it is not a simple '_' vs '.' . As Kafka is exposed as a service in a multi-tenant self-service architecture, it follows a naming policy like "([user-name]|[env-name])type-data-asset-stage", which each variable has to be replaced by a contextual value of topic being created.

That may require a ittle more work on your end. I think we can easily open up KafkaMetadataEventProdcuer to accept some new TopicConvention class, with a default StandardTopicConvention that uses MetadataChangeEvent_V4 by default, and can also easily change the _ delimiter to something else. But if your use case is more complicated you'll have to make a custom implementation, which is still hopefully simple.

Example PR (I have to submit this internally rather than a PR, I think): #1843

I am asking to prioritize the "ability to rename/parameterize current MXE topic name".

I think that is reasonable. We can punt on the "next generation" of MXEs for now, since they aren't fully open sourced (or even working internally, fully, yet).

So, was asking if it is able to parameterize the topic-names without conflicts with "aspect-specific modelling feature".

I think we can encapsulate this a similar manner as to the old events, via the TopicConvention. See PR, the new interface has methods for the new aspect events, but I made them throw / didn't use them for now. We can implement them later.

@lcysimon
Copy link
Author

lcysimon commented Sep 2, 2020

Our convention would be similar to the one mentioned by @shakti-garg-saxo .
IE,

country-domain-teamname-product-{whichever}

The temporary solution has been to update the topics on my own and rebuild the docker container, which was indeed quite easy.

@jplaisted jplaisted self-assigned this Sep 2, 2020
@shakti-garg-saxo
Copy link
Contributor

shakti-garg-saxo commented Sep 3, 2020

For us, it is not a simple '_' vs '.' . As Kafka is exposed as a service in a multi-tenant self-service architecture, it follows a naming policy like "([user-name]|[env-name])type-data-asset-stage", which each variable has to be replaced by a contextual value of topic being created.

That may require a ittle more work on your end. I think we can easily open up KafkaMetadataEventProdcuer to accept some new TopicConvention class, with a default StandardTopicConvention that uses MetadataChangeEvent_V4 by default, and can also easily change the _ delimiter to something else. But if your use case is more complicated you'll have to make a custom implementation, which is still hopefully simple.

Example PR (I have to submit this internally rather than a PR, I think): #1843

@jplaisted Thanks for response. I had a quick look at the example PR. Couple of queries:

  1. I understood that you are trying to version the topics by adding "v4" suffixes and this is related to PR(metadata-models 72.0.8 -> 80.0.0 #1756). Can you help me with the reason behind this versioning of topics? I am unable to decipher it.
  2. In the TopicConvention implementation class, is it still possible to parameterise values of constants METADATA_CHANGE_EVENT,METADATA_AUDIT_EVENT and FAILED_METADATA_CHANGE_EVENT using Spring? This way, we can allow users to customise topic names without impacting the internal versioning scheme.
public final class StandardTopicConvention implements TopicConvention {
   private static final String DEFAULT_DELIMiTER = "_";
   private static final String METADATA_CHANGE_EVENT = "MetadataChangeEvent";
   private static final String METADATA_AUDIT_EVENT = "MetadataAuditEvent";
   private static final String FAILED_METADATA_CHANGE_EVENT = "FailedMetadataChangeEvent";
   private static final String VERSION_4 = "v4";

@jplaisted
Copy link
Contributor

I understood that you are trying to version the topics by adding "v4" suffixes and this is related to PR(#1756). Can you help me with the reason behind this versioning of topics? I am unable to decipher it.

It is parity with what we have today (the topic is hardcoded to MetadataChangeEvent_v4 today).

Historically, before we open source, there was MetadataChangeEvent and MetadataChangeEvent_v2 (v3 never shipped and immediately turned into v4, iirc :p ). We are referring to that aspect specific event that I mentioned earlier as "v5". It won't have "v5" in the name, but it is the successor to MetadataChangeEvent_v4.

Backwards incompatible changes are not allowed in kafka, and these changes were backwards incompatible with the previous version. So we had to migrate to new topics, at the time.

In the TopicConvention implementation class, is it still possible to parameterise values of constants [...] using Spring?

I could look into doing that instead, sure.

@jplaisted
Copy link
Contributor

jplaisted commented Sep 3, 2020

Actually, I think that may not scale well. It may be better for you just to write your own implementation and factories?

When we go to "v5", every aspect will have its own event. You probably don't want to explicitly list them all out, but instead have some java to build it, given the aspect?

Edit: But I guess we can still punt on that.

jplaisted pushed a commit that referenced this issue Sep 24, 2020
Requested by a few people in OS. See #1840.

Companies need full customization over the topic name. This new class should be easily customizable by using a spring factory.

TODO to finish the implmentation for v5. For right now v5 is LI only and unfinished. Getting this in for v4 so it is useful to other companies now.

TODO AFTER OPEN SOURCE PUSH - make configurable via spring
TODO AFTER SUBMIT - see where else we can use this (jobs, where else?)
@jplaisted
Copy link
Contributor

My base change to the code is now in (sorry about that delay). I'll work next on wiring it up with spring.

jplaisted pushed a commit that referenced this issue Sep 29, 2020
)

New env vars:

- METADATA_CHANGE_EVENT_NAME: The name of the metadata change event topic.
- METADATA_AUDIT_EVENT_NAME: The name of the metadata audit event topic.
- FAILED_METADATA_CHANGE_EVENT_NAME: The name of the failed metadata change event topic.

This will need to be consistent throughout your ecosystem.

CLOSES: #1840
@jplaisted
Copy link
Contributor

This should now be configurable for our applications and Java based ingestion examples (we're still porting a few over from python). See latest commit description:

New env vars:

  • METADATA_CHANGE_EVENT_NAME: The name of the metadata change event topic.
  • METADATA_AUDIT_EVENT_NAME: The name of the metadata audit event topic.
  • FAILED_METADATA_CHANGE_EVENT_NAME: The name of the failed metadata change event topic.

This will need to be consistent throughout your ecosystem.

@jplaisted
Copy link
Contributor

Documentation also updated in #1909

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request Request for a new feature to be added
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants