[cdc]Optimize SyncDatabaseActionBase:avoid blocking on listTables operation #6660
Purpose
Linked issue: close #5955
What is the purpose of the change
Avoid blocking on listTables operation
ChangeLog
Remove the catalog.listTables(database) call from SyncDatabaseActionBase.buildEventParserFactory(). The call lists all tables in the catalog up front, which takes a lot of time and resources. It also:
Consumes unnecessary memory to maintain the createdTables set
Performs redundant operations when tables are created lazily
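To illustrate the difference between the eager listing and a lazy per-table check, here is a minimal sketch. It is not the code from this PR: FakeCatalog, eagerCreatedTables and ensureTable are hypothetical names that only stand in for the catalog and parser logic, and the call counters exist purely to make the cost visible.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyTableCheckSketch {

    /** Hypothetical in-memory stand-in for a catalog; NOT Paimon's Catalog interface. */
    static class FakeCatalog {
        private final Set<String> tables = new HashSet<>();
        final AtomicInteger listCalls = new AtomicInteger();
        final AtomicInteger existsCalls = new AtomicInteger();

        FakeCatalog(List<String> initialTables) {
            tables.addAll(initialTables);
        }

        /** Returns every table name; cost grows with the number of tables. */
        List<String> listTables() {
            listCalls.incrementAndGet();
            return new ArrayList<>(tables);
        }

        /** Checks a single table name. */
        boolean tableExists(String name) {
            existsCalls.incrementAndGet();
            return tables.contains(name);
        }

        void createTable(String name) {
            tables.add(name);
        }
    }

    /** Eager approach: list everything up front and keep the result in memory. */
    static Set<String> eagerCreatedTables(FakeCatalog catalog) {
        return new HashSet<>(catalog.listTables());
    }

    /**
     * Lazy approach: check one table only when a record for it arrives.
     * Returns true if the table had to be created.
     */
    static boolean ensureTable(FakeCatalog catalog, String name) {
        if (catalog.tableExists(name)) {
            return false;
        }
        catalog.createTable(name);
        return true;
    }

    public static void main(String[] args) {
        FakeCatalog catalog = new FakeCatalog(List.of("orders", "users"));

        // Only the tables that actually receive records are checked.
        boolean createdOrders = ensureTable(catalog, "orders");
        boolean createdPayments = ensureTable(catalog, "payments");

        System.out.println("orders created: " + createdOrders);
        System.out.println("payments created: " + createdPayments);
        System.out.println("listTables calls: " + catalog.listCalls.get());
        System.out.println("tableExists calls: " + catalog.existsCalls.get());
    }
}
```

The trade-off in this sketch is that the lazy path pays one existence check per table on first use instead of one full listing at startup, which matches the extra per-table latency noted in the Tests section.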
Tests
This optimization does not require additional test cases as the existing functionality is already covered by:
SyncDatabaseActionBaseTest.testSyncTablesWithoutDbLists() - validates table filtering logic
SyncDatabaseActionBaseTest.testSyncTablesWithDbList() - validates database filtering logic
SyncDatabaseActionBaseTest.testSycTablesCrossDB() - validates cross-database filtering scenarios
All these tests create and use RichCdcMultiplexRecordEventParser, ensuring the optimization doesn't break existing functionality.
When a table is loaded lazily, its existence is checked first, which takes additional time. As a result, the waitJobRunning(client) method can fail to obtain the Flink task status, which causes test case errors. The following test cases should add a query timeout:
KafkaCanalSyncDatabaseActionITCase.testCaseInsensitive
KafkaOggSyncDatabaseActionITCase.testCaseInsensitive
MySqlSyncDatabaseActionITCase.testNewlyAddedTableSingleTable
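As a rough illustration of what adding a timeout to such a wait could look like, the sketch below polls a condition until it holds or a deadline passes. It is an assumption-laden example rather than the actual test utility: waitUntil and its parameters are hypothetical, and the condition in main only simulates a job that becomes ready after a short delay.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitWithTimeoutSketch {

    /**
     * Hypothetical helper: polls the condition until it returns true,
     * and throws TimeoutException once the timeout has elapsed.
     */
    static void waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            // Compare via subtraction so the check stays correct if nanoTime overflows.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Condition not met within " + timeout);
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Simulated status check: pretend the job is running after about 200 ms.
        waitUntil(
                () -> System.currentTimeMillis() - start >= 200,
                Duration.ofSeconds(5),
                Duration.ofMillis(50));
        System.out.println("job is running");
    }
}
```

A bounded wait like this turns a slow status lookup into a clear timeout failure instead of an indefinite hang, at the cost of having to pick a timeout large enough for the lazy existence check.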