Atomic update of the altinity_sink_connector.replica_source_info #326

Closed
aadant opened this issue Oct 20, 2023 · 5 comments
Labels
bug: Something isn't working
GA-1: All the issues that are issues in release (Scheduled Dec 2023)
high-priority
qa-verified: label to mark issues that were verified by QA

Comments

@aadant
Collaborator

aadant commented Oct 20, 2023

:) select * from altinity_sink_connector.replica_source_info;

SELECT *
FROM altinity_sink_connector.replica_source_info

Query id: f9ef64b2-cf4b-4e31-b98f-3c8fe8c3f9f7

Ok.

0 rows in set. Elapsed: 0.003 sec.

This table should not be empty when the sink connector is running. I guess it comes from a delete without a corresponding insert.
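
The suspected failure mode (a delete without a corresponding insert) can be illustrated with a small model. This is only a sketch, not the connector's code: the function name, the in-memory list standing in for the table, and the offset strings are all hypothetical.

```python
class CrashBetweenSteps(Exception):
    """Stands in for the connector process being killed mid-update."""


def save_offset_delete_then_insert(table, row, crash_after_delete=False):
    """Model of a two-step offset update: wipe the table, then write the new row."""
    # Step 1: remove every row (models a full-table DELETE).
    table.clear()
    if crash_after_delete:
        # The process dies here, so step 2 never runs.
        raise CrashBetweenSteps("killed before the INSERT ran")
    # Step 2: write the new offset row.
    table.append(row)


# Hypothetical starting state: one stored offset.
table = [{"offset_key": "connector-1", "offset_val": "pos=154"}]

try:
    save_offset_delete_then_insert(
        table,
        {"offset_key": "connector-1", "offset_val": "pos=310"},
        crash_after_delete=True,
    )
except CrashBetweenSteps:
    pass

print(len(table))  # 0: the old offset is gone and the new one was never written
```

If the update really is two separate statements, any interruption between them would leave the table empty, which matches the symptom above.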

@subkanthi
Collaborator

Related to #347

@aadant aadant added the bug Something isn't working label Oct 26, 2023
@subkanthi
Collaborator

subkanthi commented Dec 8, 2023

There are some restrictions on setting the binlog position, especially when starting in schema_only mode. They are discussed in https://issues.redhat.com/browse/DBZ-3829.

The restriction:
I can start from a previous offset, but if there was a DDL event between the previous offset and the current binlog position, I get an error because the schema captured in the dbhistory file is the current schema and not the schema at the previous offset.

MySqlStreamingChangeEventSource.informAboutUnknownTableIfRequired: Encountered change event {<event>} at offset {<supplied_offset>} for table {xyz} whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case

@subkanthi subkanthi added the GA-1 All the issues that are issues in release(Scheduled Dec 2023) label Dec 18, 2023
@subkanthi
Collaborator

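While testing offset management, I noticed that the table appears empty. The query log shows a full-table delete (ALTER TABLE ... DELETE where 1=1) issued against replica_source_info: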

ef69f9530b51 :) select query from system.query_log where databases in ['altinity_sink_connector'] order by event_time desc limit 5;

SELECT query
FROM system.query_log
WHERE databases IN ['altinity_sink_connector']
ORDER BY event_time DESC
LIMIT 5

Query id: d84beb84-8b13-4157-9e12-d93ebfa4b263

┌─query───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ select * from replica_source_info;                                                                                                  │
│ select * from replica_source_info;                                                                                                  │
│ SELECT id, offset_key, offset_val, record_insert_ts, record_insert_seq FROM `altinity_sink_connector`.`replica_source_info` WHERE 0 │
│ SELECT id, offset_key, offset_val, record_insert_ts, record_insert_seq FROM `altinity_sink_connector`.`replica_source_info` WHERE 0 │
│ ALTER TABLE `altinity_sink_connector`.`replica_source_info` DELETE where 1=1 SETTINGS mutations_sync=1                              │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

5 rows in set. Elapsed: 0.009 sec. Processed 84.46 thousand rows, 17.25 MB (9.58 million rows/s., 1.96 GB/s.)

@subkanthi
Collaborator

Configuration changes:


# offset.storage.jdbc.offset.table.ddl: The DDL statement used to create the database table where connector offsets are to be stored.(Advanced)
offset.storage.jdbc.offset.table.ddl: "CREATE TABLE if not exists %s
(
    `id` String,
    `offset_key` String,
    `offset_val` String,
    `record_insert_ts` DateTime,
    `record_insert_seq` UInt64,
    `_version` UInt64 MATERIALIZED toUnixTimestamp64Nano(now64(9))
)
ENGINE = ReplacingMergeTree(_version) ORDER BY offset_key SETTINGS index_granularity = 8198"

# offset.storage.jdbc.offset.table.delete: The DML statement used to delete the database table where connector offsets are to be stored.(Advanced)
offset.storage.jdbc.offset.table.delete: "select * from %s"

offset.storage.jdbc.offset.table.select: "SELECT id, offset_key, offset_val FROM %s FINAL ORDER BY record_insert_ts, record_insert_seq"
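
As a rough illustration of why this configuration avoids the empty-table window: the delete statement is replaced with a harmless SELECT, and the ReplacingMergeTree(_version) table together with SELECT ... FINAL returns only the newest row per offset_key. The sketch below is a simplified in-memory model of that behaviour, not ClickHouse or connector code. The helper names, the counter standing in for the `_version` timestamp, and the offset values are all assumptions.

```python
import itertools

# Stands in for `_version` (toUnixTimestamp64Nano(now64(9)) in the DDL):
# every insert gets a strictly increasing version.
_version_counter = itertools.count(1)


def insert_offset(table, offset_key, offset_val, ts, seq):
    """Append-only write: rows are never deleted, only superseded."""
    table.append(
        {
            "offset_key": offset_key,
            "offset_val": offset_val,
            "record_insert_ts": ts,
            "record_insert_seq": seq,
            "_version": next(_version_counter),
        }
    )


def select_final(table):
    """Model of SELECT ... FINAL: keep the highest-_version row per offset_key."""
    latest = {}
    for row in table:
        current = latest.get(row["offset_key"])
        if current is None or row["_version"] >= current["_version"]:
            latest[row["offset_key"]] = row
    # Mirrors ORDER BY record_insert_ts, record_insert_seq in the select statement.
    return sorted(
        latest.values(),
        key=lambda r: (r["record_insert_ts"], r["record_insert_seq"]),
    )


rows = []
insert_offset(rows, "connector-1", "pos=154", ts=1, seq=0)
insert_offset(rows, "connector-1", "pos=310", ts=2, seq=0)

result = select_final(rows)
print([r["offset_val"] for r in result])  # ['pos=310']
```

In this model an interruption at any point leaves at least the previous offset readable, because an update is a single insert rather than a delete followed by an insert.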

@Selfeer
Collaborator

Selfeer commented Feb 15, 2024

The issue was manually verified by the Altinity QA team and marked as qa-verified.

Build used for testing: altinityinfra/clickhouse-sink-connector:443-678d8ae2567c2c1d89f5430fa57ede619d6ea851-lt

We've verified the following scenario:

Creating tables with data in MySQL and then performing an action that kills the clickhouse-sink-connector process (such as overloading the MySQL server with a huge amount of data, or killing the container that runs the program) does not empty the replica_source_info table.

To check the values of the table, the following query was executed in ClickHouse.

select * from altinity_sink_connector.replica_source_info;

@Selfeer Selfeer added the qa-verified label to mark issues that were verified by QA label Feb 15, 2024

3 participants