Data inconsistency in etcd version 3.3.11 #13503
Please try
Hi @ahrtr,
Note that any key that was created using the v2 API cannot be queried via the v3 API. A v3 API etcdctl get of a v2 key exits with 0 and returns no key data; this is the expected behaviour. By default, etcdctl on master (3.4) uses the v3 API, while earlier versions (3.3 and earlier) default to the v2 API.
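The API-selection rule described above can be sketched as follows. This is a simplified model, not etcdctl's own code: the helper names are hypothetical, and it assumes the only inputs are the etcdctl minor version and the `ETCDCTL_API` environment variable. It shows why etcdctl 3.3.x run without `ETCDCTL_API=3` talks to the v2 keyspace and cannot see keys written through v3 (and vice versa).

```python
import os
from typing import Mapping, Optional


def default_etcdctl_api(etcdctl_version: str) -> int:
    """API used when ETCDCTL_API is unset: v3 from 3.4 on, v2 for 3.3 and earlier."""
    major, minor = (int(part) for part in etcdctl_version.split(".")[:2])
    return 3 if (major, minor) >= (3, 4) else 2


def effective_etcdctl_api(etcdctl_version: str,
                          env: Optional[Mapping[str, str]] = None) -> int:
    """An explicit ETCDCTL_API value overrides the version default."""
    env = os.environ if env is None else env
    explicit = env.get("ETCDCTL_API")
    if explicit:
        return int(explicit)
    return default_etcdctl_api(etcdctl_version)


if __name__ == "__main__":
    print(effective_etcdctl_api("3.3.11", {}))                    # v2 keyspace
    print(effective_etcdctl_api("3.3.11", {"ETCDCTL_API": "3"}))  # v3 keyspace
    print(effective_etcdctl_api("3.4.16", {}))                    # v3 keyspace
```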
Hi @ahrtr,
bash-4.4$ etcdctl put /test thisistestvalue
We are not using ETCDCTL_API=3 before the commands, and the issue is still seen.
Please double-check whether you set the environment variable
bash-4.4$ ETCDCTL_API=2 etcdctl get /test
error #0: unsupported protocol scheme "microservice name"
This error occurs when working with ETCDCTL_API=2.
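The error above most likely comes from the Go HTTP client used by the v2 etcdctl, which treats the text before the first colon of an endpoint as its URL scheme. A plausible cause (an assumption, since the actual endpoint value is not shown) is an endpoint written as `host:port` without `http://` or `https://`. A quick sketch that flags such entries in a comma-separated endpoint list; `invalid_endpoints` is a hypothetical helper:

```python
from typing import List
from urllib.parse import urlsplit


def invalid_endpoints(endpoints: str) -> List[str]:
    """Return entries of a comma-separated endpoint list whose scheme is not
    http or https, e.g. a bare "host:2379"."""
    bad = []
    for endpoint in endpoints.split(","):
        endpoint = endpoint.strip()
        if endpoint and urlsplit(endpoint).scheme not in ("http", "https"):
            bad.append(endpoint)
    return bad


if __name__ == "__main__":
    print(invalid_endpoints("eric-data-distributed-coordinator-ed.cces-ci-ns:2379"))
    print(invalid_endpoints("https://eric-data-distributed-coordinator-ed.cces-ci-ns:2379"))
```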
Hi @ahrtr,
Hi @ahrtr, Thanks
Hi,
Sorry, I have no knowledge of your micro-service, so I can't comment on it. I suggest you raise an issue with the engineering team of the micro-service.
Hi @ahrtr, Thanks,
Hi @ahrtr,
You need to resolve this issue first. It looks like it is related to your micro-service. Please also make sure you are using matching versions of etcd and etcdctl. Again, please consult the engineering/dev team of the micro-service.
Hi @ahrtr, Thanks
Please check the following info,
bash-4.4$ etcdctl version
bash-4.4$ etcdctl member list -w table
bash-4.4$ echo $ETCDCTL_ENDPOINTS
bash-4.4$ etcdctl --endpoints=eric-data-distributed-coordinator-ed.cces-ci-ns:2379 endpoint status -w table
2022-01-07 09:47:46.580220 C | pkg/flags: conflicting environment variable "ETCDCTL_ENDPOINTS" is shadowed by corresponding command-line flag (either unset environment variable or disable flag)
bash-4.4$ ETCDCTL_API=3 etcdctl get /test --prefix=true --write-out json
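The `C | pkg/flags` line above appears to be fatal: etcdctl refuses to run when an option is set both through its `ETCDCTL_*` environment variable and through the matching command-line flag. A simplified model of that rule for endpoints (hypothetical helper, not etcd's implementation; the fallback `127.0.0.1:2379` is etcdctl v3's default endpoint):

```python
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "127.0.0.1:2379"  # etcdctl v3 default when nothing is configured


def resolve_endpoints(env: Mapping[str, str], flag_value: Optional[str]) -> str:
    """Pick the endpoint list, treating env var + flag together as an error."""
    env_value = env.get("ETCDCTL_ENDPOINTS")
    if env_value and flag_value:
        raise ValueError(
            'conflicting environment variable "ETCDCTL_ENDPOINTS" is shadowed by '
            "corresponding command-line flag (either unset environment variable "
            "or disable flag)"
        )
    return flag_value or env_value or DEFAULT_ENDPOINT
```

In practice this means either running `unset ETCDCTL_ENDPOINTS` before passing `--endpoints`, or dropping the flag and relying on the exported variable.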
Hi @ahrtr, Thanks,
You did not input the correct endpoints when executing
Please format the output in
Hi @ahrtr, Thanks,
Hi @ahrtr, Thanks,
Hi @ahrtr, Thanks
Please execute the following commands and provide feedback (AGAIN, PLEASE FORMAT ALL YOUR INPUT AS CODE).
Please also provide the complete logs of your etcd instances.
Hi @ahrtr,
bash-4.4$ ETCDCTL_API=3 etcdctl --endpoints $ENDPOINTS member list -w table
bash-4.4$ ETCDCTL_API=3 etcdctl --endpoints $ENDPOINTS endpoint status -w table
I am getting this output for the last 2 commands.
Please check the environment variables; you either configure the
Hi @ahrtr,
bash-4.4$ echo $ETCDCTL_ENDPOINTS
I am still getting the same error when trying with the ETCDCTL_ENDPOINTS environment variable.
bash-4.4$ ETCDCTL_API=3 etcdctl --endpoints $ETCDCTL_ENDPOINTS member list -w table
I am still getting the same error when following the steps in your comment (exporting the endpoints, then running the command).
bash-4.4$ export ENDPOINTS=https://eric-data-distributed-coordinator-ed-0.eric-data-distributed-coordinator-ed.cces-ci-ns:2379,https://eric-data-distributed-coordinator-ed-1.eric-data-distributed-coordinator-ed.cces-ci-ns:2379,https://eric-data-distributed-coordinator-ed-2.eric-data-distributed-coordinator-ed.cces-ci-ns:2379
Thanks,
How many members are in your etcd cluster? Why is there only one endpoint in the environment variable
Note that you have two choices to run etcdctl:
I suggest you read through the official guide to get a better understanding of etcd.
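Since the cluster runs as a StatefulSet, each member pod has its own DNS name, and `--endpoints` (or `ETCDCTL_ENDPOINTS`) should list all of them rather than a single service name. A sketch that builds the list in the `<pod>.<service>.<namespace>:2379` form used by the `export ENDPOINTS=...` command earlier in this thread; the helper is hypothetical, and the replica count of 3 is an assumption matching the three URLs quoted there:

```python
def statefulset_endpoints(statefulset: str, service: str, namespace: str,
                          replicas: int, port: int = 2379,
                          scheme: str = "https") -> str:
    """Comma-separated client URLs, one per StatefulSet pod (<pod>.<service>.<namespace>)."""
    return ",".join(
        f"{scheme}://{statefulset}-{i}.{service}.{namespace}:{port}"
        for i in range(replicas)
    )


if __name__ == "__main__":
    print(statefulset_endpoints(
        "eric-data-distributed-coordinator-ed",   # StatefulSet name, from this thread
        "eric-data-distributed-coordinator-ed",   # service name, from this thread
        "cces-ci-ns",                             # namespace, from this thread
        replicas=3,                               # assumed member count
    ))
```

The result can be exported as `ENDPOINTS` and passed to `etcdctl --endpoints "$ENDPOINTS" endpoint status -w table`.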
Hi @ahrtr, Thanks
You can remove the out-of-sync member from the cluster and delete that member's local data as well. Afterwards, add the member back to the cluster; etcd will then sync the data automatically. Another solution is to follow https://etcd.io/docs/v3.5/op-guide/recovery/, but it may cause some downtime for the application. Please back up all data before taking any action!
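The remove/wipe/re-add workaround above needs the member ID in hexadecimal: `etcdctl member list` prints it in hex and `etcdctl member remove` expects hex, while `-w json` output shows the same ID as a decimal integer. A sketch of that conversion plus the resulting checklist; the helpers, the member name `etcd-2` and the peer URL are hypothetical, and the decimal ID is simply one that appears in this issue's JSON output, not one known to belong to the out-of-sync member:

```python
from typing import List


def member_id_hex(member_id: int) -> str:
    """Decimal member ID (from -w json) -> hex form used by member list/remove."""
    return format(member_id, "x")


def rejoin_plan(member_id: int, name: str, peer_url: str) -> List[str]:
    """Ordered checklist for removing a member, wiping it, and re-adding it."""
    return [
        "etcdctl snapshot save backup.db   # back up first",
        f"etcdctl member remove {member_id_hex(member_id)}",
        "# stop the removed member and delete its data directory",
        f"etcdctl member add {name} --peer-urls={peer_url}",
        "# start the member again with --initial-cluster-state=existing",
    ]


if __name__ == "__main__":
    # Hypothetical name and peer URL; the ID is one seen in this issue's JSON output.
    for step in rejoin_plan(7511149175418186860, "etcd-2", "https://etcd-2.example:2380"):
        print(step)
```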
statefulset.txt Thanks.
@ahrtr, any updates?
We want to understand the root cause of this problem, where data is inconsistent and revisions differ across pods. We suspect this could be the reason:
We found multiple tickets for the same issue:
Hope this helps:
I am a little busy this week; I will take a deep dive into this sometime next week.
Hi
Hi
Most likely you are running into the same issue as 11651. You can double-check the values of
I just submitted a PR, pull/13834, to enhance the print format.
Issue 11651 has already been fixed in 3.3.21, 3.4.8 and 3.5.0, so you should be fine because you are already on 3.4.16. However, data inconsistency issues are still being reported on 3.5.1, and the root cause isn't clear yet.
etcd is an open source project; everyone, including you, is free to dig into whatever issue they are interested in. Note that nobody gets paid; any issues raised by your customers should be escalated to the management team in your company instead of the community. Of course, any issue is welcome to be raised in the community, but resolving all issues depends on all contributors, including you.
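Given the fix versions listed above (3.3.21, 3.4.8 and 3.5.0), a quick check of whether a given etcd version already contains the fix for 11651 could look like this. It is a sketch with a hypothetical helper name: it assumes plain `major.minor.patch` versions (a leading `v` and any pre-release suffix are stripped) and assumes any minor line newer than 3.5 includes the fix.

```python
# First patch release containing the fix, per minor line (from the comment above).
FIXED_IN = {(3, 3): 21, (3, 4): 8, (3, 5): 0}


def has_11651_fix(version: str) -> bool:
    core = version.lstrip("v").split("-")[0]   # "v3.4.16" -> "3.4.16"
    major, minor, patch = (int(p) for p in core.split(".")[:3])
    if (major, minor) in FIXED_IN:
        return patch >= FIXED_IN[(major, minor)]
    return (major, minor) > (3, 5)             # assumed: newer lines include the fix


if __name__ == "__main__":
    for v in ("3.3.11", "3.4.16", "3.5.1"):
        print(v, has_11651_fix(v))
```

As noted above, passing this check does not rule out other causes; inconsistency has still been reported on 3.5.1.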
Hi @ahrtr,
[{"Endpoint":"https://localhost:2379","Status":{"header":{"cluster_id":2104678578865624298,"member_id":4962605495301377754,"revision":1306,"raft_term":18},"version":"3.4.16","dbSize":1597440,"leader":12637332122132188569,"raftIndex":1446,"raftTerm":18,"raftAppliedIndex":1446,"dbSizeInUse":1413120}}]
and the following log lines state that the log newly added in etcd 3.4.16 also proves that the apply request has a lower auth revision than the current node's auth revision, which causes the data not to be applied to disk even though raft is OK.
Could you confirm whether the issue is still present in version 3.4.16?
Thanks,
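The status output above comes from a single endpoint, so it cannot reveal divergence on its own. When `endpoint status -w json` is run against every member, one heuristic (an assumption, not an official etcd check) is that members reporting the same `raftAppliedIndex` should also report the same `revision`, since they have applied the same log entries; a mismatch would point to entries applied on some members but not others, which is the symptom described in the comment above. Values are sampled at slightly different moments, so compare on a quiet cluster. `revision_mismatches` is a hypothetical helper:

```python
import json
from collections import defaultdict
from typing import Dict


def revision_mismatches(status_json: str) -> Dict[int, Dict[str, int]]:
    """Group `etcdctl endpoint status -w json` entries by raftAppliedIndex and
    return only the groups in which members disagree on the MVCC revision."""
    by_applied: Dict[int, Dict[str, int]] = defaultdict(dict)
    for entry in json.loads(status_json):
        status = entry["Status"]
        by_applied[status["raftAppliedIndex"]][entry["Endpoint"]] = status["header"]["revision"]
    return {
        applied: revisions
        for applied, revisions in by_applied.items()
        if len(set(revisions.values())) > 1
    }
```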
Hi @ahrtr,
Thanks @rahulbapumore for the info. Have you previously performed the workaround after upgrading to 3.4.16? Please note that I am asking you to perform the workaround now! The reason I ask is that I need to understand whether the issue was carried over from the old version to 3.4.16 or is still reproducible on 3.4.16.
Hi @ahrtr, Thanks,
Hi @ahrtr
I started 2000 threads (using JMeter) that concurrently sent requests to an etcd cluster (3.4.16) with 3 members while occasionally killing members, but I couldn't reproduce this issue. I also do not see any problem when checking the source code of release-3.4. Note that in your previous comment, all members have the same endpoint
Hi @ahrtr, Thanks
Hi @ahrtr,
@rahulbapumore Can you please format the pasted log using code? It's hard to read without proper formatting. Please read quoting-code
done :P
Thanks @serathius. It seems that etcd is too fragile once auth is enabled. I just raised another issue, 13937.
Hi @ahrtr, Thanks,
Issue 13937 is specific to 3.5 and main. So far, I cannot reproduce the issue you pointed out in your previous comment. It would be helpful if you could reproduce the issue (
Hi @ahrtr, Thanks
Hi @ahrtr, In the log lines above, see the bolded auth_revision count: from those lines we can see that 3 lines show auth_revision=11. Thanks
Hi @ahrtr, Thanks
Hi @ahrtr
I just raised a new issue, 13976, to follow up.
The etcdctl get command sometimes returns a value and sometimes returns nothing, even though the key is present in etcd. The following commands were executed immediately one after another:
bash-4.4$ etcdctl put /test thisistestvalue
OK
bash-4.4$ etcdctl get /test
bash-4.4$
bash-4.4$ etcdctl get /test
bash-4.4$ etcdctl get /test
/test
thisistestvalue
bash-4.4$ etcdctl get /test
/test
thisistestvalue
The commands below show the inconsistency. Each etcdctl get may be answered by a different member (see member_id), the revision and create_revision differ between answers, and sometimes different values are returned.
bash-4.4$ ETCDCTL_API=3 etcdctl get /test --write-out json --consistency="s"
{"header":{"cluster_id":10661059405016682411,"member_id":7511149175418186860,"revision":36793,"raft_term":16}}
bash-4.4$ ETCDCTL_API=3 etcdctl get /test --write-out json --consistency="s"
{"header":{"cluster_id":10661059405016682411,"member_id":14491470182485552592,"revision":10495,"raft_term":16}
,"kvs":[{"key":"L3Rlc3Q=","create_revision":6830,"mod_revision":6830,"version":1,"value":"dGVzdHZhbHVl"}],"count":1}
bash-4.4$ ETCDCTL_API=3 etcdctl get /test --write-out json --consistency="s"
{"header":{"cluster_id":10661059405016682411,"member_id":12240595110633392601,"revision":36802,"raft_term":16}}
bash-4.4$
bash-4.4$ ETCDCTL_API=3 etcdctl get /test1 --prefix=true --write-out json
{"header":{"cluster_id":10661059405016682411,"member_id":12240595110633392601,"revision":36818,"raft_term":16}
,"kvs":[{"key":"L2VyaWMtY2Nlcy1leHRlbnNpb24tbWFuYWdlci90ZXN0","create_revision":33064,"mod_revision":33064,"version":1,"value":"dmFsdWV0ZXN0"}],"count":1}
bash-4.4$
bash-4.4$ ETCDCTL_API=3 etcdctl get /test1 --prefix=true --write-out json
{"header":{"cluster_id":10661059405016682411,"member_id":14491470182485552592,"revision":10511,"raft_term":16}
,"kvs":[{"key":"L2VyaWMtY2Nlcy1leHRlbnNpb24tbWFuYWdlci90ZXN0","create_revision":3641,"mod_revision":3641,"version":1,"value":"bXl0ZXN0dmFsdWU="}],"count":1}
bash-4.4$ ETCDCTL_API=3 etcdctl get /test1 --prefix=true --write-out json
{"header":{"cluster_id":10661059405016682411,"member_id":7511149175418186860,"revision":36819,"raft_term":16}
,"kvs":[{"key":"L2VyaWMtY2Nlcy1leHRlbnNpb24tbWFuYWdlci90ZXN0","create_revision":33064,"mod_revision":33064,"version":1,"value":"dmFsdWV0ZXN0"}],"count":1}
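In `-w json` output, `key` and `value` are base64-encoded, and with `--consistency="s"` (serializable) each response is answered from the local state of whichever member served it, without a consensus round. A small decoder makes the outputs above easier to compare per member; `summarize_get` is a hypothetical helper, not part of etcdctl:

```python
import base64
import json
from typing import Any, Dict


def summarize_get(response: str) -> Dict[str, Any]:
    """Decode one `etcdctl get -w json` response: the answering member, its
    revision, and the base64-decoded key/value pairs it returned."""
    doc = json.loads(response)
    header = doc["header"]
    kvs = [
        {
            "key": base64.b64decode(kv["key"]).decode("utf-8"),
            "value": base64.b64decode(kv.get("value", "")).decode("utf-8"),
            "create_revision": kv.get("create_revision"),
        }
        for kv in doc.get("kvs", [])  # "kvs" is absent when nothing matched
    ]
    return {"member_id": header["member_id"], "revision": header["revision"], "kvs": kvs}
```

Decoded, the response from member 14491470182485552592 above contains key `/test` with value `testvalue` (not `thisistestvalue`) at header revision 10495, while the other two members report revisions around 36800; for the `/test1` query the same member returns `mytestvalue` where the others return `valuetest`. This suggests, though does not prove, that one member's state has diverged from the rest.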
Consider the following delete test: even after performing the delete operation, we are still able to get a value for the deleted key.
bash-4.4$ etcdctl put /temp/test mytestvalue
OK
bash-4.4$ etcdctl get /temp/test
/temp/test
mytestvalue
bash-4.4$
bash-4.4$ etcdctl del /temp/test
1
bash-4.4$ etcdctl get /temp/test
/temp/test
mytestvalue
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
/temp/test
mytestvalue
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
/temp/test
mytestvalue
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
/temp/test
mytestvalue
bash-4.4$ etcdctl del /temp/test
0
bash-4.4$ etcdctl get /temp/test
/temp/test
mytestvalue
bash-4.4$ etcdctl del /temp/test
0
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
bash-4.4$ etcdctl get /temp/test
/temp/test
mytestvalue
This kind of data inconsistency is seen in etcd, even though etcd guarantees data consistency. Could you please help us understand the issue here? What exactly is happening?
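Because each `etcdctl` invocation may be served by a different member, results like the ones above can alternate from call to call. Querying each member separately attributes every answer to one member. A sketch that builds one command per endpoint and runs them; the helper names are hypothetical, and it assumes `etcdctl` is on `PATH`, forces `ETCDCTL_API=3`, and drops `ETCDCTL_ENDPOINTS` from the child environment, since setting both the variable and `--endpoints` triggers the conflict error quoted earlier in this thread:

```python
import os
import subprocess
from typing import List


def per_member_get_commands(endpoints: str, key: str) -> List[List[str]]:
    """One etcdctl argv per endpoint, using a serializable read so that each
    answer reflects only that member's local data."""
    return [
        ["etcdctl", f"--endpoints={ep.strip()}", "get", key,
         "--consistency=s", "--write-out=json"]
        for ep in endpoints.split(",")
        if ep.strip()
    ]


def run_per_member(endpoints: str, key: str) -> List[str]:
    """Run each command and collect its stdout (one JSON document per member)."""
    env = {k: v for k, v in os.environ.items() if k != "ETCDCTL_ENDPOINTS"}
    env["ETCDCTL_API"] = "3"
    outputs = []
    for cmd in per_member_get_commands(endpoints, key):
        result = subprocess.run(cmd, env=env, capture_output=True, text=True)
        outputs.append(result.stdout)
    return outputs
```

With `https://` endpoints, TLS options such as `--cacert`, `--cert` and `--key` may also need to be appended to each command.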