codec(ticdc): fix some avro encoding bugs #4704
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. A reviewer can indicate their review by submitting an approval review.
Could you add tests to cover changes in avro.go? Thanks!
@@ -312,6 +312,9 @@ func getAvroDataTypeFromColumn(col *model.Column) (interface{}, error) {
	case mysql.TypeDouble:
		return "double", nil
	case mysql.TypeVarchar, mysql.TypeString, mysql.TypeVarString:
		if col.Flag.IsBinary() {
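The hunk above is truncated. As a rough sketch of the idea (not the PR's exact code), the new branch presumably distinguishes binary string columns so that they map to the Avro bytes type while textual columns keep string:

```go
// Sketch only, assuming binary CHAR/VARCHAR/VARSTRING columns should be
// encoded as Avro "bytes" and textual ones as Avro "string".
package sketch

func avroTypeForStringColumn(isBinary bool) string {
	if isBinary {
		// BINARY/VARBINARY payloads carry raw bytes.
		return "bytes"
	}
	// CHAR/VARCHAR/TEXT payloads stay textual.
	return "string"
}
```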
Could you add some unit tests?
In the future, maybe we need a flag to control the behavior? Like https://debezium.io/documentation/reference/connectors/mysql.html#mysql-property-binary-handling-mode
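For reference, such a flag might look roughly like the sketch below. None of these identifiers exist in TiCDC today; they are purely hypothetical and only illustrate the Debezium-style idea:

```go
// Hypothetical sketch of a Debezium-style binary-handling option; not part of
// this PR or of the current TiCDC configuration.
package sketch

import (
	"encoding/base64"
	"encoding/hex"
)

type BinaryHandlingMode string

const (
	BinaryAsBytes  BinaryHandlingMode = "bytes"  // emit raw bytes
	BinaryAsBase64 BinaryHandlingMode = "base64" // emit a base64-encoded string
	BinaryAsHex    BinaryHandlingMode = "hex"    // emit a hex-encoded string
)

// encodeBinary converts a binary column value according to the chosen mode.
func encodeBinary(mode BinaryHandlingMode, data []byte) interface{} {
	switch mode {
	case BinaryAsBase64:
		return base64.StdEncoding.EncodeToString(data)
	case BinaryAsHex:
		return hex.EncodeToString(data)
	default: // BinaryAsBytes
		return data
	}
}
```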
Cc @leoppro
Force-pushed from 4b7ec56 to f2ea25e
/hold working on more tests
Codecov Report
Flags with carried forward coverage won't be shown.

@@             Coverage Diff              @@
##            master      #4704       +/-  ##
================================================
- Coverage   55.6402%   55.2570%   -0.3832%
================================================
  Files           494        518        +24
  Lines         61283      63914      +2631
================================================
+ Hits          34098      35317      +1219
- Misses        23750      25113      +1363
- Partials       3435       3484        +49
Force-pushed from 012c834 to a591ccb
/unhold
/run-all-tests
@@ -253,7 +254,8 @@ type RowChangedEvent struct {

	RowID int64 `json:"row-id" msg:"-"` // Deprecated. It is empty when the RowID comes from clustered index table.

	Table    *TableName         `json:"table" msg:"table"`
	ColInfos []rowcodec.ColInfo `json:"column-infos" msg:"-"`
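To illustrate why the encoder wants this field, here is a sketch (not code from this PR) that assumes ColInfos is index-aligned with Columns and that rowcodec.ColInfo exposes the TiDB field type as Ft:

```go
// Sketch: pair each column with its full field type (ENUM/SET elements,
// charset, flen, decimal), which the Avro encoder needs for an accurate schema.
package sketch

import "github.com/pingcap/tiflow/cdc/model"

func columnFieldTypes(e *model.RowChangedEvent) map[string]interface{} {
	out := make(map[string]interface{}, len(e.Columns))
	for i, col := range e.Columns {
		// Assumption: e.ColInfos[i] describes e.Columns[i].
		out[col.Name] = e.ColInfos[i].Ft
	}
	return out
}
```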
Please use `column_infos`.
Sorry, I don't really get it. Do you mean changing `json:"column-infos"` to `json:"column_infos"`? But all other fields use hyphens instead of underscores.
`msg:"-"` seems to prevent the information from being persisted for Redo. Could you verify the compatibility of this change with Redo?
Redo currently applies only to the MySQL sink, so it has nothing to do with Avro; preventing `ColInfos` from being persisted for Redo therefore seems right.
But I think `ColInfos` is important, and maybe Redo could support Avro in the future. Persisting `ColInfos` would cause no trouble, and upgrading would also be safe, since a nil `ColInfos` field is harmless for other codecs. However, as @liuzix tells me, it would make the redo log larger, so I'd like to leave it unpersisted in this pull request.
Could you take a look, @ben1009?
lgtm for redo part
Rest LGTM
Do we need to cherry-pick it to 5.4? It looks like you only marked 5.3.
Please write a more detailed description of your PR: how it works and why the changes were made. Thanks!
Also, please add the release note.
Force-pushed from 8acc74f to 14b7490
/run-all-tests
All three issues are marked
/run-all-tests
/run-kafka-integration-test
Thanks!
	for _, item := range testColumnsTable {
		obtainedMySQLType := getMySQLType(item.column)
		c.Assert(obtainedMySQLType, check.Equals, item.expectedMySQLType)

		obtainedJavaSQLType, err := getJavaSQLType(item.column, obtainedMySQLType)
		c.Assert(err, check.IsNil)
		c.Assert(obtainedJavaSQLType, check.Equals, item.expectedJavaSQLType)

		if !item.column.Flag.IsBinary() {
Why only for binary?
For non-binary columns only, because the `expectedValue` is a dummy for binary columns: https://github.com/pingcap/tiflow/blob/master/cdc/sink/codec/canal_test.go#L424.
	}, time.Local)
}

colInfos := []rowcodec.ColInfo{
Maybe we can organize this data by structured arrays? Controlling correspondence by index may be difficult with large test data.
Yes. Actually, using arrays restricts the usage pattern: we have to iterate over the array (currently not a problem). I tried to use the tp map, but it can't be used since the column `Id` is lost in the `Column` struct, and adding fields to `Column` would cause a lot of code churn. Any suggestions?
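For what it's worth, the struct-per-entry layout suggested above could look roughly like the sketch below (simplified types, assumed import paths; not code from this PR):

```go
// Sketch: bundle each test column with its matching ColInfo so the
// correspondence no longer relies on slice index.
package sketch

import (
	"github.com/pingcap/tidb/parser/mysql"
	"github.com/pingcap/tidb/util/rowcodec"
	"github.com/pingcap/tiflow/cdc/model"
)

type testColumnCase struct {
	column  *model.Column
	colInfo rowcodec.ColInfo
}

var testColumnCases = []testColumnCase{
	{
		column:  &model.Column{Name: "t", Type: mysql.TypeTiny, Value: int64(1)},
		colInfo: rowcodec.ColInfo{ID: 1},
	},
	{
		column:  &model.Column{Name: "str", Type: mysql.TypeVarchar, Value: "hello"},
		colInfo: rowcodec.ColInfo{ID: 2},
	},
}
```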
/run-all-tests
/run-verify
It's confirmed that we could just follow our conventional workflow and don't need to cherry-pick.
/remove-label needs-cherry-pick-release-5.3
/merge
This pull request has been accepted and is ready to merge. Commit hash: b9b05cb
What problem does this PR solve?
Issue Number: close #4454, close #4453, close #4420
What is changed and how it works?
To resolve the issues, we need to change the codec data mapping for enum/set (to string) and for string/binary columns. This requires knowing the table definition when encoding, so I carry that information in `RowChangedEvent`.
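As a rough illustration of the enum/set half of that mapping (a sketch, not the PR's actual code): given the element list carried in the column's field type and the stored 1-based ordinal, the textual value can be recovered like this.

```go
// Sketch: recover the textual value of an ENUM column from its 1-based
// ordinal and the element list that comes from the table definition.
package sketch

import "fmt"

func enumName(elems []string, ordinal uint64) (string, error) {
	if ordinal == 0 || ordinal > uint64(len(elems)) {
		return "", fmt.Errorf("enum ordinal %d out of range for %d elements", ordinal, len(elems))
	}
	return elems[ordinal-1], nil // MySQL ENUM ordinals are 1-based
}
```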
Release note