-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mysql][polardb-x] Support all charsets for MySQL CDC Connector #1188
Conversation
@fsk119 Please take a look. Thanks. |
Thanks for your contribution. I find the MySqlConnectorITCase#testColumnOptionalWithDefaultValue fails. Could you verify whether the change break the test?
Yes. You should add all of the charsets. |
There a NullPointerException in my implementation. Now it should be fixed. @fsk119 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your great work. I leave some comments.
...c/main/java/com/ververica/cdc/connectors/mysql/debezium/task/MySqlSnapshotSplitReadTask.java
Outdated
Show resolved
Hide resolved
...r-mysql-cdc/src/test/java/com/ververica/cdc/connectors/mysql/table/MySqlConnectorITCase.java
Outdated
Show resolved
Hide resolved
I find that the charset properties are not using in the jdbc connection rightly. This issue should wait #674 to be fixed. After setting the properties rightly, we could pass the new tests that use different charset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for update. I left some comments.
...c/main/java/com/ververica/cdc/connectors/mysql/debezium/task/MySqlSnapshotSplitReadTask.java
Outdated
Show resolved
Hide resolved
...-cdc/src/test/java/com/ververica/cdc/connectors/mysql/table/MysqlConnectorCharsetITCase.java
Outdated
Show resolved
Hide resolved
...-cdc/src/test/java/com/ververica/cdc/connectors/mysql/table/MysqlConnectorCharsetITCase.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ruanhang1993 for the contribution, I left some comments
flink-connector-mysql-cdc/src/test/resources/ddl/charset_test.sql
Outdated
Show resolved
Hide resolved
...-cdc/src/test/java/com/ververica/cdc/connectors/mysql/table/MysqlConnectorCharsetITCase.java
Outdated
Show resolved
Hide resolved
...-cdc/src/test/java/com/ververica/cdc/connectors/mysql/table/MysqlConnectorCharsetITCase.java
Outdated
Show resolved
Hide resolved
...-cdc/src/test/java/com/ververica/cdc/connectors/mysql/table/MysqlConnectorCharsetITCase.java
Outdated
Show resolved
Hide resolved
6724de2
to
348cac0
Compare
@ruanhang1993 Hi Ruan, I pm you some question regarding the details of this PR. I'll copy my question below -- could you also check your email? IMHO, there are two charset transformations happening if the charset used in JDBC connection is different from that of table columns. For instance if the column uses latin1 and jdbc uses utf8, when we read the records the following transformations will occur: So if we directly call getBytes and let the use convert it to latin1, the first utf8 transformation is missing. Therefore we should let jdbc handle charset conversion for us by calling getObject and underlyingly these transformations will happen: Please correct me if there's any misunderstanding. |
I think there is something you misunderstand.
This bug is that if we return a byte[] object which is UTF-8 encoding, the debezium will convert this value to String by a |
@ruanhang1993 I open a new PR #1468 to fix the issue base on your work, please do not use you master branch as a develop branch which others can not append commits. |
This PR fixed the encoding bug when a mysql table uses a different charset encoding in the column level.
Changes:
MySqlValueConverters#charsetFor(Column)
is reused to get the column charset.