Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Param source timezone (review) #374

Closed
wants to merge 10 commits into from

Conversation

aadant
Copy link
Collaborator

@aadant aadant commented Nov 13, 2023

Looks good to me apart from a clash with #366. I managed to replicate to UTC with this PR.

IlyaTsoi and others added 10 commits October 24, 2023 23:49
It would be better if it didn't depend on the hard-coded <server>.<database>.<table> template
…rse a time_string with an offset marker e.g. +03:00, -0130, Z.

Also we can handle an exception when a dateTime string has invalid year e.g. in MSSQL "0000-00-01 00:00:00". Zero year value is invalid in java.
Copy link
Collaborator Author

@aadant aadant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation works for UTC but does not for other timezone if the maximum range is reached.

the solution is to let CH convert according the target data type.

I wonder if this works for say Chicago to Chicago instead of UTC to UTC.
The source.datetime.timezone should not be defaulted to UTC (if not specified) but the debezium source timezone.

if (occurrence.contains("DateTime64")) {
String[] params = StringUtils.substringBetween(occurrence, "(", ")").split(",");
String precision = params[0].trim();
result.append(String.format("DateTime64(%s,\\'%s\\')", precision, timeZone));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escape clashing with another escape

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in #366

String dataType = ClickHouseUtils.escape(columnNameToDataTypeMap.get(entry.getKey()), '\'');

@@ -141,6 +147,29 @@ public ResultSet executeQueryWithResultSet(String sql) throws SQLException {

}

public String addTimeZoneToColumnDefinition(String typeName, String timeZone) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it is a good idea to use a regexp here as the target DateTime64 may already have a TZ !

example DateTime64(6,'UTC') becomes
DateTime64(6,'America/Chicago')

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it works if you use a String instead, CH will do the conversion

Exception I got with the maximum boundaries

`db_to` DateTime64(6,\'America/Chicago\'),`_version` UInt64,`is_deleted` UInt8')
118741 2023-11-13 02:53:56.165 [pool-1-thread-9] ERROR com.altinity.clickhouse.sink.connector.db.DbWriter  - ******* ERROR inserting Batch *****************
java.lang.IllegalArgumentException: DateTime(9904571999) should be between -1420070400 and 9904550399 inclusive of both values
	at com.clickhouse.data.ClickHouseChecker.newException(ClickHouseChecker.java:17)
	at com.clickhouse.data.ClickHouseChecker.between(ClickHouseChecker.java:109)
	at com.clickhouse.data.format.BinaryDataProcessor$DateTime64SerDe.serialize(BinaryDataProcessor.java:292)
	at com.clickhouse.data.ClickHouseDataProcessor.write(ClickHouseDataProcessor.java:534)
	at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.addBatch(InputBasedPreparedStatement.java:345)
	at com.altinity.clickhouse.sink.connector.db.DbWriter.addToPreparedStatementBatch(DbWriter.java:479)
	at com.altinity.clickhouse.sink.connector.executor.ClickHouseBatchRunnable.flushRecordsToClickHouse(ClickHouseBatchRunnable.java:199)
	at com.altinity.clickhouse.sink.connector.executor.ClickHouseBatchRunnable.processRecordsByTopic(ClickHouseBatchRunnable.java:169)
	at com.altinity.clickhouse.sink.connector.executor.ClickHouseBatchRunnable.run(ClickHouseBatchRunnable.java:101)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite logical. We can take into account source.timezone for checkIfDateExceedsSupportedRange() methods.

Copy link
Collaborator Author

@aadant aadant Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that works. @subkanthi FYI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it is a good idea to use a regexp here as the target DateTime64 may already have a TZ !

example DateTime64(6,'UTC') becomes DateTime64(6,'America/Chicago')

Could you explain, please? I don't see any problem here (except possible slowness)). I specifically override it within input() function to make Clickhouse use defined source.timezone to parse incoming data instead of column metadata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the wrong place to change the schema. The sink-connector should create the schema with the correct timezone (target timezone).

Copy link
Collaborator Author

@aadant aadant Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the sink-connector, there are 2 places we fix the schema

  • for the initial snapshot via custom tools like https://github.com/Altinity/clickhouse-sink-connector/pull/375/files
  • via debezium, the sink-connector needs to add this clickhouse.datetime.timezone parameter. In your case, it would be the same as the source. default should be not specified and defaults to database.connectionTimeZone (debezium)
    source.datetime.timezone should also default to database.connectionTimeZone

what do you think ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in terms of timing, let us wait for @subkanthi to merge all pending PRs to develop, we will then address this one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@subkanthi let us implement this too #349

.define(
ClickHouseSinkConnectorConfigVariables.SOURCE_DATETIME_TIMEZONE.toString(),
Type.STRING,
"UTC",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it is a good idea. leave it null by default and initialize with source timezone from Debezium.
Add to the default config.yaml (commented)

@subkanthi
Copy link
Collaborator

Thanks @IlyaTsoi for your contribution.

@subkanthi subkanthi closed this Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants