Service Discovery bug on insufficient endorsements #268

Closed
griffithsrac opened this issue Mar 1, 2023 · 2 comments

Comments

@griffithsrac

I think there are a couple of bugs in the fabric java sdk service discovery code, in the sendTransactionProposalToEndorsers loop, when it collects insufficient endorsements and has to retry. Specifically, it can rule out layouts and groups where there have been valid endorsements, as well as those groups that are actually no longer satisfiable.

For example, if there were a group with 3 endorsers, of which 2 are required, and we received 1 "good" endorsement and 1 "bad" endorsement:

  • what we'd expect to happen:
    • the SDLayout.SDGroup.endorsed counter increases by 1.
    • both endorsers are removed from the group (so as to not be re-selected).
    • the group is still flagged as able to be satisfied.
  • what actually happens:
    • the endorsed counter stays at 0
      • bug: endorsed = Math.min(required, endorsed++) (the post-increment evaluates to the old value, and the assignment then overwrites the incremented field, so endorsed will always stay at 0 irrespective of required or how many times this line is run)
    • both endorsers are removed as expected.
    • the group is flagged as not able to be satisfied
      • bug: endorsers.size() >= required; should be endorsers.size() + endorsed >= required; (as we've removed the endorsers already).
      • this also needs the earlier bug to be fixed as well in order to return the correct result.
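
To make the first bug concrete, here is a minimal standalone sketch of the expression on its own. The class and method names (PostIncrementDemo, buggyUpdate, fixedUpdate) are made up for illustration and are not part of the SDK; fixedUpdate is just one plausible correction.

```java
class PostIncrementDemo {
    // Same shape as the expression quoted above: endorsed++ yields the OLD value,
    // and the assignment then overwrites the incremented variable with it.
    static int buggyUpdate(int required, int endorsed) {
        endorsed = Math.min(required, endorsed++);
        return endorsed;
    }

    // One possible fix (an assumption, not the SDK's code): add 1 explicitly.
    static int fixedUpdate(int required, int endorsed) {
        return Math.min(required, endorsed + 1);
    }

    public static void main(String[] args) {
        int required = 2;
        int buggy = 0;
        int fixed = 0;
        // Simulate three "good" endorsements arriving one after another.
        for (int i = 0; i < 3; i++) {
            buggy = buggyUpdate(required, buggy);
            fixed = fixedUpdate(required, fixed);
        }
        System.out.println("buggy=" + buggy + " fixed=" + fixed); // buggy=0 fixed=2
    }
}
```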

Obviously, I might be wrong in this, especially as this is code which hasn't changed in a long time yet nobody else has reported this problem that I'm aware of, but I think this is legitimate! Let me know if you want me to attempt a pull request.

You can reproduce this by stopping a peer after it has been discovered. If it happens to be selected in sendTransactionProposalToEndorsers then you should end up in the //still don't have the needed endorsements. section, which calls the methods containing the bugs:

  • sdChaindcodeEndorsementCopy.endorsedList(loopGood) for the endorsed counter bug.
  • sdChaindcodeEndorsementCopy.ignoreListSDEndorser(loopBad) for the comparison bug.

Obviously, whether this affects normal usage depends on your network topology. I noticed it when running 3 orgs, each with 1 peer, where each org's policy requires its own peer to endorse and we require a strict majority of org policies, and then simulating a peer failure. I expected txs to still go through, as sendTransactionProposalToEndorsers should eventually end up selecting the layout that doesn't contain the bad peer, but in reality only 1/3 of the txs work, i.e. those where it happens to pick the only remaining valid layout first time.
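
The 1/3 figure can be checked with a small sketch, assuming discovery returns one layout per pair of orgs for a "2 of 3" majority (the real layouts returned by a peer may differ). The org names and the LayoutDemo class are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LayoutDemo {
    // Assumption: every pair of orgs is one minimal layout satisfying "2 of 3".
    static List<Set<String>> pairLayouts(List<String> orgs) {
        List<Set<String>> layouts = new ArrayList<>();
        for (int i = 0; i < orgs.size(); i++) {
            for (int j = i + 1; j < orgs.size(); j++) {
                layouts.add(new HashSet<>(Arrays.asList(orgs.get(i), orgs.get(j))));
            }
        }
        return layouts;
    }

    // With one peer per org, a layout is usable only if none of its orgs is down.
    static int countUsable(List<Set<String>> layouts, Set<String> downOrgs) {
        int usable = 0;
        for (Set<String> layout : layouts) {
            if (Collections.disjoint(layout, downOrgs)) {
                usable++;
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        List<Set<String>> layouts = pairLayouts(Arrays.asList("Org1", "Org2", "Org3"));
        Set<String> down = new HashSet<>(Collections.singletonList("Org3"));
        System.out.println(countUsable(layouts, down) + " of " + layouts.size() + " layouts usable");
    }
}
```

If a failed layout is then wrongly marked unsatisfiable along with the others, only the txs that pick the single usable layout on the first attempt succeed.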

@bestbeforetoday
Member

endorsed = Math.min(required, endorsed++) certainly looks like an obvious bug. I guess it should be endorsed = Math.min(required, endorsed + 1).

The endorsement logic is complicated enough (to me) for it to be difficult to be 100% certain of side-effects, but it does look like you are correct about endorsers.size() >= required too. Maybe it should be endorsers.size() >= getStillRequired().
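
As a rough sketch of how the two suggested changes fit together, the following models a single group with plain strings for endorsers. GroupSketch and its method shapes are assumptions for illustration only, not the SDK's actual SDGroup class, and getStillRequired() is assumed here to mean required minus endorsed (which makes the check equivalent to endorsers.size() + endorsed >= required).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class GroupSketch {
    final int required;
    final List<String> endorsers;
    int endorsed = 0;

    GroupSketch(int required, Collection<String> endorsers) {
        this.required = required;
        this.endorsers = new ArrayList<>(endorsers); // mutable copy
    }

    // Assumption: endorsements still missing for this group.
    int getStillRequired() {
        return required - endorsed;
    }

    // Record good endorsements: remove each endorser and count it, capped at required.
    void endorsedList(Collection<String> good) {
        for (String endorser : good) {
            if (endorsers.remove(endorser)) {
                endorsed = Math.min(required, endorsed + 1);
            }
        }
    }

    // Drop bad endorsers; returns true if the group can still be satisfied.
    boolean ignoreList(Collection<String> bad) {
        endorsers.removeAll(bad);
        return endorsers.size() >= getStillRequired();
    }

    public static void main(String[] args) {
        // The scenario from the report: 3 endorsers, 2 required, 1 good and 1 bad.
        GroupSketch group = new GroupSketch(2, Arrays.asList("peer0", "peer1", "peer2"));
        group.endorsedList(Arrays.asList("peer0"));
        boolean satisfiable = group.ignoreList(Arrays.asList("peer1"));
        System.out.println("endorsed=" + group.endorsed + " satisfiable=" + satisfiable);
    }
}
```

With both changes the group keeps its one good endorsement and is still reported as satisfiable, because the one remaining endorser covers the one endorsement still required.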

My guess is that this scenario isn't typically hit as there are enough active peers or groups that endorsement requirements can be met on the first pass. I am actually a little dubious of the value of the SDK making some arbitrary number of retry attempts at endorsement. One pass through all the potential endorsement plans / layouts seems like it should generally be sufficient, with it then being a client application decision whether the entire endorsement process should be retried, or some other action should be taken.

Since the code doesn't look to be working the way it was intended, you are welcome to submit a pull request with a code fix. Ideally it would have some testing to confirm correct behaviour but I appreciate that the code structure does not lend itself at all to unit testing, so this may not be practical - at least not without some fairly significant refactoring, which carries its own risks without the safety net of good test coverage.

I would encourage everyone to switch to the Fabric Gateway client API as soon as you are able to get to Fabric v2.4+.

@griffithsrac
Author

Okay, I've attempted to create a pull request 👍 Let me know if I've done anything wrong with it; coming from GitLab, everything is similar but slightly different!

Re the test: I've added a very minor unit test for the methods in which the bugs are present, but it's not truly testing Channel.sendTransactionProposalToEndorsers because, like you said, that would be far trickier and possibly require more extensive changes.

Re Fabric Gateway: yes, I believe it is scheduled on our plan somewhere! 🤞
