Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka Connect: Add config to route to tables using topic name #11313

Closed
wants to merge 1 commit into from

Conversation

xiasongh
Copy link

Add a new config iceberg.tables.route-pattern to dynamically route to Iceberg tables using Kafka topic name

Closes #11163

}

// replace topic namespace separator
return routePattern.replace("{topic}", topicName.replace(".", "_"));
Copy link
Author

@xiasongh xiasongh Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

topicName.replace(".", "_")

I use AWS Glue catalog, which doesn't support nested namespaces. Topic names with more than 1 . would be invalid table names, so one thing we can do is replace all the . characters

The Debezium JDBC sink connector [0] also does this, so it's not totally unheard of

It probably makes most sense to turn this into it's own config, maybe something like iceberg.tables.route-pattern.namespace-separator? Thoughts?

[0] https://debezium.io/documentation/reference/stable/connectors/jdbc.html#jdbc-property-table-naming-strategy

@@ -63,6 +63,7 @@ for exactly-once semantics. This requires Kafka 2.5 or later.
| iceberg.tables | Comma-separated list of destination tables |
| iceberg.tables.dynamic-enabled | Set to `true` to route to a table specified in `routeField` instead of using `routeRegex`, default is `false` |
| iceberg.tables.route-field | For multi-table fan-out, the name of the field used to route records to tables |
| iceberg.tables.route-pattern | For topic-based routing, the format string used to route records to tables, e.g. `db.{topic}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should reuse iceberg.tables.route-field and have that support the expression syntax.

`iceberg.tables.dynamic-enabled` is `true` then you must specify `iceberg.tables.route-field` which will
contain the name of the table.
`iceberg.tables.dynamic-enabled` is `true` then you must specify either `iceberg.tables.route-field` which will
contain the name of the table or `iceberg.tables.route-pattern` which specifies a format string for the table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with a single property we can revert this

@bryanck
Copy link
Contributor

bryanck commented Oct 16, 2024

Thanks for the PR @xiasongh ! I'm interested in your thoughts on using a single property to define routing. If you agree, I can re-review once you make that change.

@xiasongh
Copy link
Author

xiasongh commented Oct 21, 2024

Thanks for the PR @xiasongh ! I'm interested in your thoughts on using a single property to define routing. If you agree, I can re-review once you make that change.

Hi @bryanck, thanks for the review!

I'm a little apprehensive about re-using iceberg.tables.route-field since the semantics would be inconsistent.

iceberg.tables.route-field is used to point at a field in the record and use its value as the target iceberg table, whereas iceberg.tables.route-pattern is used as a format string that results in the table. I think mixing the two as it is now would be a bit confusing.

For example, if we had iceberg.tables.route-field = "db.{topic}", does that mean the target table is the value of the db.{topic} field or is the target table db.{topic}? Is the rule: if there's a format string, then it's the target table, otherwise it's a field that contains the target table?

How about we introduce iceberg.tables.route-pattern and support adding fields from the record using the expression syntax? For example, iceberg.tables.route-pattern="{field}" would be equivalent to iceberg.tables.route-field="field", but something like iceberg.tables.route-pattern="db.{_topic}_{field}" would also be supported. Then we can deprecate the original field. Thoughts?

@mun1r0b0t
Copy link

Hi folks, is this still being worked on?

I have a similar need to route by topic, but the changes in this PR will not address my needs. I need to configure the topic for individual tables and cannot use the topic name as the table name. I would like to submit a separate PR for your review.

@mun1r0b0t
Copy link

I opened up #11163 since I did not hear back on this.

Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 23, 2024
@xiasongh
Copy link
Author

Hey @bryanck @mun1r0b0t!

Sorry! I've been a bit busy and this PR got lost.

I can finish up this PR, but I'd like to get some thoughts on my previous comment so I can make sure I'm making a change we agree on.

To give context since it's been a while, I'd like to keep the new dynamic routing config route-pattern distinct as I believe it's semantically different than route-field.

I'm fine with putting the functionality into route-field, but I would like some clarification on what the semantics should be.

I give some more details in this comment: #11313 (comment)

Thanks!

@github-actions github-actions bot removed the stale label Dec 30, 2024
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 30, 2025
Copy link

github-actions bot commented Feb 6, 2025

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Feb 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kafka Connect: route to table using topic name
3 participants