Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codec(ticdc): fix some avro encoding bugs #4704

Merged
merged 10 commits into from
Mar 7, 2022

Conversation

zhangyangyu
Copy link
Member

@zhangyangyu zhangyangyu commented Feb 25, 2022

What problem does this PR solve?

Issue Number: close #4454, close #4453, close #4420

What is changed and how it works?

To resolve the issues, we need to change the codec data mapping for enum/set(->string) and string/binary. This requires knowing table definition information when performing codec. So I carry the information in RowChangedEvent.

Release note

Fix the wrong data mapping for avro codec of type Enum/Set and TinyText/MediumText/Text/LongText.

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Feb 25, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • liuzix
  • overvenus

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 25, 2022
Copy link
Member

@overvenus overvenus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests to cover changes in avro.go? Thanks!

cdc/entry/mounter.go Outdated Show resolved Hide resolved
cdc/sink/codec/avro.go Outdated Show resolved Hide resolved
@@ -312,6 +312,9 @@ func getAvroDataTypeFromColumn(col *model.Column) (interface{}, error) {
case mysql.TypeDouble:
return "double", nil
case mysql.TypeVarchar, mysql.TypeString, mysql.TypeVarString:
if col.Flag.IsBinary() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some unit tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangyangyu
Copy link
Member Author

/hold

working on more tests

@ti-chi-bot ti-chi-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 25, 2022
@codecov-commenter
Copy link

codecov-commenter commented Feb 27, 2022

Codecov Report

Merging #4704 (8acc74f) into master (9607554) will decrease coverage by 0.3831%.
The diff coverage is 52.6048%.

Flag Coverage Δ
cdc 59.9368% <52.6048%> (+0.0145%) ⬆️
dm 51.3419% <ø> (-0.6870%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #4704        +/-   ##
================================================
- Coverage   55.6402%   55.2570%   -0.3832%     
================================================
  Files           494        518        +24     
  Lines         61283      63914      +2631     
================================================
+ Hits          34098      35317      +1219     
- Misses        23750      25113      +1363     
- Partials       3435       3484        +49     

@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 27, 2022
@zhangyangyu zhangyangyu force-pushed the fix-avro-encoding branch 2 times, most recently from 012c834 to a591ccb Compare February 27, 2022 14:51
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2022
@zhangyangyu
Copy link
Member Author

/unhold

@ti-chi-bot ti-chi-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 27, 2022
@zhangyangyu
Copy link
Member Author

/run-all-tests

@@ -253,7 +254,8 @@ type RowChangedEvent struct {

RowID int64 `json:"row-id" msg:"-"` // Deprecated. It is empty when the RowID comes from clustered index table.

Table *TableName `json:"table" msg:"table"`
Table *TableName `json:"table" msg:"table"`
ColInfos []rowcodec.ColInfo `json:"column-infos" msg:"-"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use column_infos.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't really get it. Do you mean change the json:"column-infos" to json:"column_infos"? But all other fields use hyphens instead of underscores.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msg:"-" seems to prevent the information from being persisted for Redo. Could you verify the compatibility of this change with Redo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redo only applies to mysql sink currently so does none of avro business. So prevent ColInfos from being persisted for Redo seems right.

But I think the ColInfos is important and maybe in future Redo could support avro? Persist ColInfos seems bring no trouble. Upgrade is also safe since a nil ColInfos field is safe for other codecs. But as @liuzix tells me, it will make the redo log large. So I'd like to leave it unpersist in this pull request.

Could you take a look @ben1009 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm for redo part

Copy link
Member

@overvenus overvenus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

cdc/sink/codec/avro.go Outdated Show resolved Hide resolved
@overvenus overvenus added component/sink Sink component. type/bugfix This PR fixes a bug. labels Feb 28, 2022
@Rustin170506 Rustin170506 requested review from Rustin170506 and removed request for disksing February 28, 2022 10:00
Copy link
Member

@Rustin170506 Rustin170506 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to cp it to 5.4, it looks like you only marked 5.3?

Please write a more detailed description of your PRs, how they work and why they were changed. Thanks!

Also, please add the release note.

@zhangyangyu
Copy link
Member Author

/run-all-tests

@zhangyangyu
Copy link
Member Author

zhangyangyu commented Feb 28, 2022

  • Do we need to cp it to 5.4, it looks like you only marked 5.3?
  • Please write a more detailed description of your PRs, how they work and why they were changed. Thanks!
  • Also, please add the release note.

All the three issues are marked severity/moderate. Usually we don't cherry-pick such PRs. Actually the behaviour affects all versions from v4.0. I will consult the PM.

@zhangyangyu
Copy link
Member Author

/run-all-tests

@zhangyangyu
Copy link
Member Author

/run-kafka-integration-test

Copy link
Member

@Rustin170506 Rustin170506 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

for _, item := range testColumnsTable {
obtainedMySQLType := getMySQLType(item.column)
c.Assert(obtainedMySQLType, check.Equals, item.expectedMySQLType)

obtainedJavaSQLType, err := getJavaSQLType(item.column, obtainedMySQLType)
c.Assert(err, check.IsNil)
c.Assert(obtainedJavaSQLType, check.Equals, item.expectedJavaSQLType)

if !item.column.Flag.IsBinary() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only for binary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-binary. Because the expectedValue is dummy for binary: https://github.com/pingcap/tiflow/blob/master/cdc/sink/codec/canal_test.go#L424.

}, time.Local)
}

colInfos := []rowcodec.ColInfo{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can organize this data by structured arrays? Controlling correspondence by index may be difficult with large test data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Actually using arrays restricts the usage pattern. We have to iterate the array(currently no problem). I tried to use the tp map, but it can't be used since the column Id is lost in Column struct. Add fields to Column will cause much code churn. Any suggestions?

@zhangyangyu
Copy link
Member Author

/run-all-tests

@zhangyangyu
Copy link
Member Author

/run-verify

@overvenus overvenus requested a review from maxshuang March 2, 2022 09:47
@zhangyangyu
Copy link
Member Author

It's confirmed that we could just follow our conventional workflow and don't need to cherry-pick.

@zhangyangyu
Copy link
Member Author

/remove-label needs-cherry-pick-release-5.3

@ti-chi-bot ti-chi-bot removed the needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. label Mar 2, 2022
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 7, 2022
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 7, 2022
@overvenus
Copy link
Member

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: b9b05cb

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 7, 2022
@ti-chi-bot ti-chi-bot merged commit 54328e4 into pingcap:master Mar 7, 2022
@zhangyangyu zhangyangyu deleted the fix-avro-encoding branch March 8, 2022 02:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/sink Sink component. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
Projects
None yet
8 participants