[cdc] Optimize SyncDatabaseAction performance by removing listTables calls #5956

Open
wants to merge 1 commit into
base: master

huyuanfeng2018
Contributor

Purpose

from #5955

What is the purpose of the change

Optimize SyncDatabaseAction performance by removing expensive listTables operations during initialization, improving scalability for databases with many tables.

Brief change log

  • Remove listTables() call from RichCdcMultiplexRecordEventParser
  • Implement lazy table creation in CdcDynamicTableParsingProcessFunction#processElement
  • Remove createdTables Set to reduce memory usage
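
The lazy-creation idea in the change log can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchCatalog`, `InMemoryCatalog`, `LazyProcessor`, and `createTableIfAbsent` are hypothetical names invented for the example and are not Paimon classes or methods. It assumes the catalog offers an idempotent create-if-absent call, so the processor needs neither an up-front `listTables` scan nor a local set of created tables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: every type here is hypothetical and is NOT a Paimon class.
final class LazyTableCreationSketch {

    /** Hypothetical stand-in for a catalog; not Paimon's Catalog interface. */
    interface SketchCatalog {
        /** Lists every table in a database; this is the expensive call being avoided. */
        List<String> listTables(String database);

        /** Creates the table if it is absent; an existing table is left untouched. */
        void createTableIfAbsent(String database, String table, String schema);
    }

    /** In-memory catalog that counts listTables calls so the behaviour is observable. */
    static final class InMemoryCatalog implements SketchCatalog {
        final Map<String, String> tables = new HashMap<>();
        int listCalls = 0;

        @Override
        public List<String> listTables(String database) {
            listCalls++;
            List<String> result = new ArrayList<>();
            String prefix = database + ".";
            for (String id : tables.keySet()) {
                if (id.startsWith(prefix)) {
                    result.add(id.substring(prefix.length()));
                }
            }
            return result;
        }

        @Override
        public void createTableIfAbsent(String database, String table, String schema) {
            tables.putIfAbsent(database + "." + table, schema);
        }
    }

    /** Hypothetical processor: tables are created when a record for them arrives. */
    static final class LazyProcessor {
        private final SketchCatalog catalog;

        LazyProcessor(SketchCatalog catalog) {
            this.catalog = catalog;
        }

        void processElement(String database, String table, String schema) {
            // No listTables scan and no local "created" set: rely on the idempotent call.
            catalog.createTableIfAbsent(database, table, schema);
            // A real job would now forward the record to the table's sink.
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        LazyProcessor processor = new LazyProcessor(catalog);
        processor.processElement("db", "orders", "id INT");
        processor.processElement("db", "orders", "id INT"); // second record, same table
        processor.processElement("db", "users", "id INT");
        System.out.println("tables created: " + catalog.tables.size());
        System.out.println("listTables calls: " + catalog.listCalls);
    }
}
```

In this sketch only tables that actually appear in the stream are ever touched, which is where the saving for databases with many tables would come from. The trade-off is a catalog call per record; how the actual change avoids or bounds that cost is not described here.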

Verifying this change

  • Verified existing functionality remains intact

Testing

This optimization does not require additional test cases because the affected functionality is already covered by:

  • SyncDatabaseActionBaseTest.testSyncTablesWithoutDbLists() - validates table filtering logic
  • SyncDatabaseActionBaseTest.testSyncTablesWithDbList() - validates database filtering logic
  • SyncDatabaseActionBaseTest.testSycTablesCrossDB() - validates cross-database filtering scenarios

All of these tests create and use RichCdcMultiplexRecordEventParser, so they exercise the changed parser and confirm that the optimization does not break existing functionality.