
Sink Connector Light for MySQL falls behind after a while #598

Closed
meysammeisam opened this issue May 16, 2024 · 4 comments

@meysammeisam

We use the light version to replicate our RDS MySQL data to CH. In most cases it works fine, but for some cases where the source DB is relatively large and has a high write rate, it falls behind after a few hours. We have tried these, but neither seems to be a proper solution:

  • OPTIMIZE TABLE <ABC> FINAL on CH (sketched below)
  • Restarting the Sink Connector
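
For reference, the ClickHouse-side step above amounts to roughly the following sketch (the table name db.ABC is taken from table.include.list in the config below; the system.parts check is only an illustrative way to confirm the merge finished, not something from the original report):

-- Force a final merge of all data parts for the replicated table
-- (can be expensive on a large table).
OPTIMIZE TABLE db.ABC FINAL;

-- Illustrative check: how many active parts remain after the merge.
SELECT count() AS active_parts
FROM system.parts
WHERE database = 'db' AND table = 'ABC' AND active;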

The config file looks like:

# clickhouse-sink-connector
database.hostname: "mysql.rds.amazonaws.com"
database.port: "3306"
database.user: "USER"
database.password: "PASS"
database.server.name: "SINK_TABLE_ABC"
database.server.id: 1014065039
database.include.list: "db"
table.include.list: "db.ABC"
clickhouse.server.url: "127.0.0.1"
clickhouse.server.user: "USER2"
clickhouse.server.pass: "PASS2"
clickhouse.server.port: "8123"
clickhouse.server.database: "db"
database.allowPublicKeyRetrieval: "true"
snapshot.mode: "schema_only"
slot.name: "ch_sink_1"
connector.class: "io.debezium.connector.mysql.MySqlConnector"
offset.flush.interval.ms: 10000
offset.storage: "io.debezium.storage.jdbc.offset.JdbcOffsetBackingStore"
offset.storage.offset.storage.jdbc.offset.table.name: "altinity_sink_connector.db_ABC_replica_source_info"
offset.storage.jdbc.url: "jdbc:clickhouse://127.0.0.1:8123/altinity_sink_connector"
offset.storage.jdbc.user: "USER2"
offset.storage.jdbc.password: "PASS2"
offset.storage.offset.storage.jdbc.offset.table.ddl: "CREATE TABLE if not exists %s
(
    `id` String,
    `offset_key` String,
    `offset_val` String,
    `record_insert_ts` DateTime,
    `record_insert_seq` UInt64,
    `_version` UInt64 MATERIALIZED toUnixTimestamp64Nano(now64(9))
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY id
SETTINGS index_granularity = 8198"
offset.storage.offset.storage.jdbc.offset.table.delete: "delete from %s where 1=1"
schema.history.internal: "io.debezium.storage.jdbc.history.JdbcSchemaHistory"
schema.history.internal.jdbc.url: "jdbc:clickhouse://127.0.0.1:8123/altinity_sink_connector"
schema.history.internal.jdbc.user: "USER2"
schema.history.internal.jdbc.password: "PASS2"
schema.history.internal.jdbc.schema.history.table.ddl: "CREATE TABLE if not exists %s
(`id` VARCHAR(36) NOT NULL, `history_data` VARCHAR(65000), `history_data_seq` INTEGER, `record_insert_ts` TIMESTAMP NOT NULL, `record_insert_seq` INTEGER NOT NULL) ENGINE=ReplacingMergeTree(record_insert_seq) order by id"

schema.history.internal.jdbc.schema.history.table.name: "altinity_sink_connector.db_ABC_replicate_schema_history"
enable.snapshot.ddl: "false"
auto.create.tables: "false"
metrics.enable: "true"
metrics.port: "18107"
cli.port: 7001
thread.pool.size: 50

As captured by the Prometheus/Grafana dashboard, the lag (source DB -> CH) is zero for a while and then starts growing, and there is nothing in the service logs. So it seems the connector stops working entirely until we restart or reinitialize it.

(Screenshots: Grafana dashboards showing the source-to-ClickHouse replication lag, captured 2024-05-16.)
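
For debugging, one way to cross-check what the dashboard shows is to read back the offset the connector persists in ClickHouse. A rough sketch; the table and column names come from the offset.storage settings above, and the offset_val payload is assumed to be Debezium's serialized offset containing the binlog file and position:

-- Sketch: read the most recent offset the connector stored in ClickHouse.
SELECT offset_key, offset_val, record_insert_ts
FROM altinity_sink_connector.db_ABC_replica_source_info FINAL
ORDER BY record_insert_ts DESC
LIMIT 1;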

Are we missing anything in our configuration? Any hints are highly appreciated.

Notes:

  • The source DB has several tens of tables; for the DBs with high change rates, we have separate Sink Connectors.
  • We have no issues with the other 10-20 MySQL DBs that have smaller data and lower change rates.
aadant changed the title Sink Connector Light for Mysql falls behind after a while → Sink Connector Light for MySQL falls behind after a while on May 16, 2024
@aadant (Collaborator) commented May 16, 2024

@meysammeisam please try those values

#574 (comment)

with CH, it is best to use fewer insert threads doing more work :)
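
As a rough way to see whether the connector is issuing many small inserts (which is what fewer, busier insert threads would avoid), one can look at ClickHouse's query log. A sketch; the time window and filters are assumptions, not anything from the connector itself:

-- Sketch: count and average size of INSERT queries over the last hour.
SELECT
    count() AS inserts,
    avg(written_rows) AS avg_rows_per_insert
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time > now() - INTERVAL 1 HOUR;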

@aadant (Collaborator) commented May 16, 2024

Also, one sink connector should be able to handle multiple databases. #531

@meysammeisam (Author)
@meysammeisam please try those values

#574 (comment)

with CH, it is best to use less insert threads doing more work :)

@aadant Unfortunately it didn't help!
But I was able to identify the issue. The change rate on our source DB is very high. Altinity flushes each batch quickly, usually in a couple of seconds, but then it doesn't move on to the next binlog file immediately; for some reason it waits exactly 5 minutes, then processes the next file quickly enough, and then waits another 5 minutes.
The issue is that every five minutes a couple of new binlogs have already been created on the source DB, and that's why it falls behind immediately!
Here are the logs:

May 22 16:03:12 ch altinity-sync[933389]: 2024-05-22 16:03:12.236 INFO  - *************** EXECUTED BATCH Successfully Records: 212************** task(0) Thread ID: Sink Connector thread-pool-3 Result: [I@2acf6b76
May 22 16:03:12 ch altinity-sync[933389]: [JdbcOffsetBackingStore-1] WARN com.clickhouse.jdbc.internal.ClickHouseConnectionImpl - [JDBC Compliant Mode] Transaction is not supported. You may change jdbcCompliant to false to throw SQLException instead.
May 22 16:03:12 ch altinity-sync[933389]: [JdbcOffsetBackingStore-1] WARN com.clickhouse.jdbc.internal.ClickHouseConnectionImpl - [JDBC Compliant Mode] Transaction [9635f6b5-3f74-4a72-a72b-7b9d9b16ab85] (4 queries & 0 savepoints) is committed.
May 22 16:03:12 ch altinity-sync[933389]: 2024-05-22 16:03:12.264 INFO  - ***** BATCH marked as processed to debezium ****Binlog file:mysql-bin-changelog.030965 Binlog position: 2413173 GTID: 2144281366 Sequence Number: 1716393793068007070 Debezium Timestamp: 1716393792068

Any hints on how to debug and find the bottleneck are highly appreciated.
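
One way to quantify the backlog described above is to compare the binlog file the connector last acknowledged (mysql-bin-changelog.030965 in the log line) with what the source is currently writing. A sketch of the standard MySQL commands, assuming access to the RDS instance:

-- On the source MySQL (RDS): current binlog file and position being written.
SHOW MASTER STATUS;

-- All retained binlog files and their sizes; the gap between the newest file
-- here and the file the connector reports is the replication backlog.
SHOW BINARY LOGS;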

@meysammeisam (Author)
We managed to fix the issue by reducing the upsert rate on the source DB, so it now produces fewer binary logs and Altinity Sink Connector Light is able to catch up.
