Glue Schema Registry: Throttling exception when trying to register schema version for multiple tables #11045
Comments
Hey @asddongmen, since you implemented the Glue integration, would you mind sharing some insights on this issue? Maybe there is a workaround? Thank you! |
Some logs |
It appears that this issue arises because all the tables sink to a single Kafka topic, so they share the same schemaName. However, the tables have different tableVersion values, so each one registers a new schema version in the schema registry.
A possible workaround is to sink different tables to different topics. |
If multiple tables are dispatched to the same topic, they share the same schema key. The encoder group runs 32 encoders concurrently, and each one may fetch the schema independently, which may cause this issue. |
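To illustrate the point above, here is a minimal Go sketch of one way to keep concurrent encoders from hitting the registry at the same time for the same schema key. It is not the tiflow implementation: `serialRegistrar`, `registerFunc`, and `runConcurrently` are hypothetical names, and `registerFunc` merely stands in for the remote call (for example Glue's RegisterSchemaVersion). The idea is to take a per-schema-name lock and cache the result per (schemaName, tableVersion), so that 32 encoders asking for the same version trigger one remote call, and different versions of the same schema are registered one after another rather than concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// registerFunc stands in for the remote registry call. The signature is an
// assumption made for this sketch, not a real SDK or tiflow API.
type registerFunc func(schemaName string, tableVersion uint64) (string, error)

// serialRegistrar serializes registrations per schema name and caches the
// resulting schema ID per (schemaName, tableVersion).
type serialRegistrar struct {
	mu       sync.Mutex             // guards locks and cache
	locks    map[string]*sync.Mutex // one lock per schema name
	cache    map[string]string      // "name@version" -> schema ID
	register registerFunc
}

func newSerialRegistrar(fn registerFunc) *serialRegistrar {
	return &serialRegistrar{
		locks:    make(map[string]*sync.Mutex),
		cache:    make(map[string]string),
		register: fn,
	}
}

// Register returns the schema ID for (schemaName, tableVersion), calling the
// remote registry at most once per successful registration.
func (r *serialRegistrar) Register(schemaName string, tableVersion uint64) (string, error) {
	// Look up (or create) the lock that belongs to this schema name.
	r.mu.Lock()
	l, ok := r.locks[schemaName]
	if !ok {
		l = &sync.Mutex{}
		r.locks[schemaName] = l
	}
	r.mu.Unlock()

	// Only one goroutine at a time may talk to the registry for this schema.
	l.Lock()
	defer l.Unlock()

	key := fmt.Sprintf("%s@%d", schemaName, tableVersion)
	r.mu.Lock()
	id, hit := r.cache[key]
	r.mu.Unlock()
	if hit {
		return id, nil
	}

	id, err := r.register(schemaName, tableVersion)
	if err != nil {
		// Failures are not cached, so a later caller can try again.
		return "", err
	}
	r.mu.Lock()
	r.cache[key] = id
	r.mu.Unlock()
	return id, nil
}

// runConcurrently runs fn in n goroutines and waits for all of them,
// mimicking a group of encoders working in parallel.
func runConcurrently(n int, fn func()) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fn()
		}()
	}
	wg.Wait()
}
```

With this shape, calling `runConcurrently(32, ...)` around `Register("staging_cdc_v0-key", 1)` would result in a single remote registration. The trade-off is that registrations for one schema name are fully serialized, which costs some latency when many table versions share a topic.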
/severity moderate |
@phuongvu Could you please provide the changefeed's config to help us investigate further? |
Thank you guys for the insight! It makes sense that my setup causes this issue :( I'll go with the workaround! Here is my config as requested:
With the
|
It might not be related but I try to sink all the tables to a single Kafka topic using
Interestingly, the error message
Some logs:
|
May I ask whether the workaround for the Avro protocol worked? Regarding your question about the canal-json protocol, it shouldn't be impacted by this issue. I have not been able to reproduce it in my local environment. If possible, providing the complete TiCDC log would help us investigate further. |
Hey @asddongmen, we have put the project that is going to use TiCDC on hold for now, so I haven't had the chance to try the workaround yet, but I think it should work. Re: the log for canal-json, I can try to find it and post it here. |
What did you do?
We created:
Whenever the tables are created, the changefeed task tries to register the schema version for all the tables at the same time, which causes rate-limit errors.
What did you expect to see?
The schema version for the same schema should not be updated concurrently.
https://github.com/pingcap/tiflow/blob/master/pkg/sink/codec/avro/avro.go#L154
What did you see instead?
operation error Glue: RegisterSchemaVersion, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: c85a2ee3-b98b-459e-a621-d58a89b045bd, api error ThrottlingException: Rate exceeded
operation error Glue: RegisterSchemaVersion, https response error StatusCode: 400, RequestID: 45b24c85-eb3c-4936-bc37-4e46f18efd61, ConcurrentModificationException: Some other operation happened for the schema, please retry again. SchemaName: staging_cdc_v0-key, RegistryName: staging_cdc, SchemaArn: arn:aws:glue:us-east-1:131234521375:schema/staging_cdc/staging_cdc_v0-key
Pausing and resuming the changefeed sometimes fixes this, but I was hoping this could be handled within TiCDC.
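As a rough illustration of what handling this inside the client could look like, here is a minimal Go sketch of retrying a throttled operation with capped exponential backoff and jitter. It is only a sketch under assumptions: `retryWithBackoff`, `isThrottled`, and `errThrottled` are hypothetical names, and a real client would inspect the SDK's own error type rather than a sentinel error. Note that the first error above shows the SDK already giving up after 3 attempts, so this would only help in combination with fewer concurrent registrations.

```go
package main

import (
	"errors"
	"math/rand"
	"time"
)

// errThrottled is a hypothetical sentinel standing in for the registry's
// "ThrottlingException: Rate exceeded" response.
var errThrottled = errors.New("ThrottlingException: Rate exceeded")

// isThrottled reports whether err is (or wraps) the throttling sentinel.
func isThrottled(err error) bool {
	return errors.Is(err, errThrottled)
}

// retryWithBackoff calls op up to maxAttempts times. After a retryable
// failure it sleeps for a random duration between 0 and an exponentially
// growing backoff (capped at 30s), so concurrent callers spread out instead
// of retrying in lockstep.
func retryWithBackoff(maxAttempts int, baseMillis int, retryable func(error) bool, op func() error) error {
	const maxBackoff = 30 * time.Second
	backoff := time.Duration(baseMillis) * time.Millisecond

	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if err == nil || !retryable(err) {
			return err
		}
		if attempt == maxAttempts {
			break // no point sleeping after the final attempt
		}
		// Jitter: pick a random wait in [0, backoff].
		time.Sleep(time.Duration(rand.Int63n(int64(backoff) + 1)))
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
	return err
}
```

A caller would wrap the registration call in `op` and pass `isThrottled` as the `retryable` predicate. Errors that are not retryable are returned immediately.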
Versions of the cluster
Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):
Release Version: v7.5.1
Edition: Community
Git Commit Hash: 7d16cc79e81bbf573124df3fd9351c26963f3e70
Git Branch: heads/refs/tags/v7.5.1
UTC Build
Upstream TiKV version (execute tikv-server --version):
TiCDC version (execute cdc version):