Service Discovery bug on insufficient endorsements #268
Comments
The endorsement logic is complicated enough (to me) for it to be difficult to be 100% certain of side-effects, but it does look like you are correct about

My guess is that this scenario isn't typically hit as there are enough active peers or groups that endorsement requirements can be met on the first pass.

I am actually a little dubious of the value of the SDK making some arbitrary number of retry attempts at endorsement. One pass through all the potential endorsement plans / layouts seems like it should generally be sufficient, with it then being a client application decision whether the entire endorsement process should be retried, or some other action should be taken.

Since the code doesn't look to be working the way it was intended, you are welcome to submit a pull request with a code fix. Ideally it would have some testing to confirm correct behaviour but I appreciate that the code structure does not lend itself at all to unit testing, so this may not be practical - at least not without some fairly significant refactoring, which carries its own risks without the safety net of good test coverage.

I would encourage everyone to switch to the Fabric Gateway client API as soon as you are able to get to Fabric v2.4+.
Okay, I've attempted to create a pull request 👍 Let me know if I've done anything wrong with the pull request; coming from GitLab, everything is similar but slightly different!

Re the test: I've added a very minor unit test for the methods in which the bugs are present, but it's not truly testing the

Re Fabric Gateway: yes, I believe it is scheduled on our plan somewhere! 🤞
I think there are a couple of bugs in the Fabric Java SDK service discovery code, in the `sendTransactionProposalToEndorsers` loop, when it collects insufficient endorsements and has to retry. Specifically, it can rule out layouts and groups where there have been valid endorsements, as well as those groups that are actually no longer satisfiable.

For example, if there were a group with 3 endorsers, of which 2 are required, and we received 1 "good" endorsement and 1 "bad" endorsement:

- The `SDLayout.SDGroup.endorsed` counter should increase by 1. Instead, the `endorsed` counter stays at 0, because of `endorsed = Math.min(required, endorsed++)` (the value changed at `endorsed++` is never used, i.e. `endorsed` will always stay at 0 irrespective of `required` or how many times this line is run).
- The check `endorsers.size() >= required;` should be `endorsers.size() + endorsed >= required;` (as we've removed the endorsers already).

Obviously, I might be wrong about this, especially as this is code which hasn't changed in a long time and nobody else has reported this problem that I'm aware of, but I think this is legitimate! Let me know if you want me to attempt a pull request.
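The two problems above can be reproduced in isolation. The following is a minimal sketch, not the SDK's code: the class and method names are hypothetical, and only the two expressions quoted above are taken from the issue. It assumes the example group (3 endorsers, 2 required) after one good and one bad endorser have both been removed from the candidate list, leaving 1 remaining.

```java
public class EndorsedCounterDemo {
    // Hypothetical stand-in for the group's counter update; mirrors the quoted expression.
    static int buggyUpdate(int required, int endorsed) {
        // endorsed++ evaluates to the OLD value, and that old value is
        // then assigned back, so the increment is discarded.
        endorsed = Math.min(required, endorsed++);
        return endorsed;
    }

    // One possible fix (an assumption, not necessarily the merged change).
    static int fixedUpdate(int required, int endorsed) {
        return Math.min(required, endorsed + 1);
    }

    // Mirrors the quoted check: ignores endorsements already collected.
    static boolean buggySatisfiable(int remainingEndorsers, int endorsed, int required) {
        return remainingEndorsers >= required;
    }

    // Proposed check from the issue: count collected endorsements as well.
    static boolean fixedSatisfiable(int remainingEndorsers, int endorsed, int required) {
        return remainingEndorsers + endorsed >= required;
    }

    public static void main(String[] args) {
        int required = 2;
        int remaining = 1; // 3 endorsers minus 1 good and 1 bad

        int buggyEndorsed = buggyUpdate(required, 0);
        int fixedEndorsed = fixedUpdate(required, 0);
        System.out.println("buggy endorsed = " + buggyEndorsed); // 0
        System.out.println("fixed endorsed = " + fixedEndorsed); // 1

        // With the buggy check the group is ruled out even though one more
        // endorsement from the remaining peer would satisfy it.
        System.out.println("buggy satisfiable = "
                + buggySatisfiable(remaining, fixedEndorsed, required)); // false
        System.out.println("fixed satisfiable = "
                + fixedSatisfiable(remaining, fixedEndorsed, required)); // true
    }
}
```

Either bug alone seems enough to discard the group in this example: with the counter stuck at 0, even the corrected check gives `1 + 0 >= 2`, which is false.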
You can reproduce this by stopping a peer after it has been discovered. If it happens to be selected in `sendTransactionProposalToEndorsers`, then you should end up in the `//still don't have the needed endorsements.` section, where the bugs are:

- `sdChaindcodeEndorsementCopy.endorsedList(loopGood)` for the endorsed counter bug.
- `sdChaindcodeEndorsementCopy.ignoreListSDEndorser(loopBad)` for the comparator bug.

Obviously it depends on your network topology as to whether this will affect normal usage. I noticed it when running 3 orgs, each with 1 peer, where each org's policy requires their peer to endorse and we require a strict majority of org policies, and then simulating a peer failure. I expected txs to still go through, as `sendTransactionProposalToEndorsers` should eventually end up selecting the layout not containing the bad peer, but in reality only 1/3 of the txs work, i.e. when it happens to pick the only remaining valid layout first time.
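The 1/3 figure follows from counting layouts. As a rough sketch, assume discovery yields one layout per 2-org combination (a strict majority of 3) and that the first layout tried is picked uniformly at random; both are assumptions for illustration, and the org names and class below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LayoutMajorityDemo {
    // Enumerate every 2-org layout, i.e. a strict majority of 3 orgs.
    static List<List<String>> pairLayouts(List<String> orgs) {
        List<List<String>> layouts = new ArrayList<>();
        for (int i = 0; i < orgs.size(); i++) {
            for (int j = i + 1; j < orgs.size(); j++) {
                layouts.add(Arrays.asList(orgs.get(i), orgs.get(j)));
            }
        }
        return layouts;
    }

    // Count layouts that do not depend on the org whose only peer is down.
    static long usableLayouts(List<List<String>> layouts, String failedOrg) {
        return layouts.stream().filter(l -> !l.contains(failedOrg)).count();
    }

    public static void main(String[] args) {
        List<String> orgs = Arrays.asList("Org1", "Org2", "Org3");
        List<List<String>> layouts = pairLayouts(orgs);
        long usable = usableLayouts(layouts, "Org3");

        System.out.println("layouts = " + layouts.size()); // 3
        System.out.println("usable  = " + usable);         // 1
    }
}
```

With a working retry, a transaction that first lands on one of the two layouts containing the failed org should fall through to the remaining one. With the bugs described above, only the 1-in-3 first picks succeed.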