kafka consumer missing data cause data inconsistency when the brokers failed frequently #9241

Closed
fubinzh opened this issue Jun 15, 2023 · 11 comments


fubinzh commented Jun 15, 2023

What did you do?

  1. create kafka changefeed
    /cdc cli changefeed create "--server=127.0.0.1:8301" "--sink-uri=kafka://downstream-kafka.cdc-testbed-tps-1807799-1-20:9092/broker_fail?max-message-bytes=1048576&replication-factor=2&enable-tidb-extension=true&protocol=canal-json" "--changefeed-id=kafka-random-broker-fail-canal-json"
  2. run workload and inject kafka broker failure (case: kafka_random_broker_fail).
    "sysbench --db-driver=mysql --mysql-host=$(nslookup upstream-tidb.cdc-testbed-tps-1807799-1-20 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g) --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=100 --table-size=100000 --create_secondary=off --time=1800 --debug=true --threads=100 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_update_non_index run"
  3. send finishmark
  4. run kafka consumer to consume the kafka log and write it to downstream mysql
  5. do data consistency check once the finishmark has reached downstream mysql
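
For readers unfamiliar with the last step, the idea of a per-table consistency check can be sketched roughly as below. This is only an illustration, not the tool the test case uses: the function name `mismatchedTables`, the table names, and the checksum values are all made up, and a real check would compute checksums on both databases after the finishmark arrives downstream.

```go
package main

import (
	"fmt"
	"sort"
)

// mismatchedTables is a hypothetical helper. It returns, in sorted order,
// the tables whose checksum differs between upstream and downstream,
// or which are missing downstream entirely.
func mismatchedTables(upstream, downstream map[string]uint64) []string {
	var bad []string
	for table, want := range upstream {
		got, ok := downstream[table]
		if !ok || got != want {
			bad = append(bad, table)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// Made-up checksums for illustration only.
	upstream := map[string]uint64{"sbtest1": 11, "sbtest2": 22, "sbtest3": 33}
	downstream := map[string]uint64{"sbtest1": 11, "sbtest2": 99}
	fmt.Println(mismatchedTables(upstream, downstream))
}
```

A non-empty result corresponds to the "N out of 100 tables are inconsistent" style of report.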

What did you expect to see?

data should be consistent

What did you see instead?

data inconsistency seen: 3 out of 100 tables are inconsistent.

Versions of the cluster

cdc version

Release Version: v7.2.0-alpha
Git Commit Hash: 98bd965630ff2f6efd105810a790a2c5e8e389e4
Git Branch: heads/refs/tags/v7.2.0-alpha
UTC Build Time: 2023-06-13 11:32:27
Go Version: go version go1.20.3 linux/amd64
Failpoint Build: false
@fubinzh fubinzh added the area/ticdc (Issues or PRs related to TiCDC) and type/bug (The issue is confirmed as a bug) labels Jun 15, 2023

fubinzh commented Jun 15, 2023

Kafka consumer log:
cdc_kafka_consumer.log

@fubinzh fubinzh changed the title data inconsistency data inconsistency when injecting kafka brokerfail Jun 15, 2023
@fubinzh fubinzh changed the title data inconsistency when injecting kafka brokerfail data inconsistency when injecting kafka broker fail Jun 16, 2023

fubinzh commented Jun 16, 2023

case kafka_controller_fail also fails due to consistency check failure. https://tcms.pingcap.net/dashboard/executions/plan/1808271

@fubinzh fubinzh closed this as completed Jun 16, 2023
@fubinzh fubinzh reopened this Jun 16, 2023

fubinzh commented Jun 16, 2023

/severity critical


fubinzh commented Jun 16, 2023

/found automation

@ti-chi-bot ti-chi-bot bot added the found/automation (Bugs found by automation cases) label Jun 16, 2023
@3AceShowHand
Contributor

[2023/06/14 03:39:17.771 +00:00] [PANIC] [main.go:346] ["Error from consumer: %v"] [error="dial tcp 10.233.78.133:9092: connect: connection refused"] [stack="main.main.func1\n\tgithub.com/pingcap/tiflow/cmd/kafka-consumer/main.go:346"]

@3AceShowHand
Contributor

After investigation, the data loss comes from the kafka consumer, not from cdc.

@3AceShowHand 3AceShowHand self-assigned this Jun 20, 2023

fubinzh commented Jun 20, 2023

/remove-severity critical


fubinzh commented Jun 20, 2023

/severity major

@3AceShowHand 3AceShowHand changed the title data inconsistency when injecting kafka broker fail kafka consumer missing data cause data inconsistency when the brokers failed frequently Jun 20, 2023
@3AceShowHand
Contributor

/severity moderate

@nongfushanquan
Contributor

#9694
all other issues of the consumer will link to the above issue
/close

@ti-chi-bot ti-chi-bot bot closed this as completed Sep 21, 2023

ti-chi-bot bot commented Sep 21, 2023

@nongfushanquan: Closing this issue.

