Skip to content

add 006-bft-based-ordering-service.md#33

Closed
pfi79 wants to merge 1 commit into
hyperledger:mainfrom
pfi79:master
Closed

add 006-bft-based-ordering-service.md#33
pfi79 wants to merge 1 commit into
hyperledger:mainfrom
pfi79:master

Conversation

@pfi79
Copy link
Copy Markdown

@pfi79 pfi79 commented Sep 17, 2020

No description provided.

Signed-off-by: Фёдор Партанский <partanskiy.f@n-t.io>
@jyellick
Copy link
Copy Markdown

The biggest inhibitor to pulling any sort of new code into Fabric is the maintenance overhead. It's quite notable that the Fabric fork utilizing this BFT library is still based on the v1.4.x and has not kept up with Fabric development (now at v2.2.x+). In particular, the component pieces unrelated to the actual consensus plugin -- performing BFT resilient block pulling in the peer, and BFT replication of blocks in the orderer cluster component -- would be very non-trivial to forward port at this point because these code paths have seen substantial changes.

Who is the technical sponsor who plans to forward port these changes, and maintain them, as well as the consensus plugin?

@davidkhala
Copy link
Copy Markdown
Member

Good to know a new BFT orderer implementation.
I think rebase work to latest LTS can be completed in anyway. But are we ready to provide 2 options both for BFT orderer type?

@OlegMartian
Copy link
Copy Markdown

Thank you for your comments. Autonomous Noncommercial Organization “ID&AS: Inter-Disciplinary & Advanced Studies Center” or “IDEAS Center” is the Sponsor of SmartBFT development. IDEAS is managing Smart BFT additional development at the moment and will be responsible for the following support and new Fabric versions compatibility.

@OlegMartian
Copy link
Copy Markdown

IDEAS is going to release the forwarding port for v2.2.0 and our team will be happy to collaborate, contribute, and receive recommendations.

@jyellick
Copy link
Copy Markdown

jyellick commented Oct 5, 2020

@OlegMartian it's great to hear about additional development interest from your team, we are of course always happy to have more contributors. Typically, RFCs are shepherded by an active maintainer who has demonstrated a history of quality contributions and long term commitment to the project. Perhaps it would make the most sense to begin by addressing some of the technical debt around block replication before attempting to tackle such a large new feature. For instance, creating a unified BFT-ready block replicator for use both in the orderer cluster block replication and standard peer replication would be a good place to start and be a necessary step towards any BFT ordering implementation.

@OlegMartian
Copy link
Copy Markdown

@jyellick thank you for the recommendation. You have raised a very valid concern, there is definitely a need to implement BFT resistant peer side block replicator, and as pointed out in this RFC the intra cluster communication provided for Raft exhibits such property. Moreover, we believe this work is indistinguishable and has to be completed along the way with the implementation of BFT based ordering service. We do agree that such functionality has to be implemented and provided as self-contained piece work in the context of this RFC.

Currently, we are looking to hear more feedback and input from HLF community, especially Hyperledger Fabric Maintainers to formulate a reasonable integration strategy and to identify a roadmap and milestones to generate small self containable deliverables and provide ongoing support.

@jyellick
Copy link
Copy Markdown

@OlegMartian Perhaps at this point it would be helpful for the Fabric developers who will be performing the integration work to join the discussion. It is always exciting to bring new function to a project, but it's important that we do so in a maintainable way. The RFC, as written, is essentially an explanation of the approach taken in the cited Fabric forks. Since the Fabric code base has already diverged significantly since those forks, there remain significant open questions of what things would look like on top of master. Who are the technical contributors who are interested in discussing how to design and implement this new work?

@OlegMartian
Copy link
Copy Markdown

@jyellick thanks for your point, I think it’s a good suggestion, let us add more details into RFC to facilitate the technical discussion exactly as you mentioned.

@denyeart
Copy link
Copy Markdown
Contributor

denyeart commented Nov 3, 2020

This is an impressive piece of work. I would however echo some of Jason's sentiments before taking the next steps:

  1. Before we can add additional ordering service options, we need to address tech debt in the existing ordering service foundation. Jason mentioned block replication already. I'll add one more - the ordering service common code and Raft integration have a large number of test flakes, to the degree that it hinders getting even non-orderer PRs from passing CI and getting merged into Fabric.
  2. It appears that the contribution and eventual support is coming from developers that have little engagement with the Fabric project, whether measured by commits, PR reviews, mailing list activity, or contributor meeting attendance. The barriers are fairly low for such engagement, and the lack of engagement naturally raises concern both on the ability to successfully integrate such a major feature, and the ability to support it long term. The tech debt items mentioned above would be a great way to build the experience required to successfully integrate a major new feature.
  3. We would expect a large feature like this to have a maintainer sponsor that could help with the design, staging, integration, reviews, tests, etc. If no existing maintainer decides to sponsor the work, there is always the option of engaging deeply with the project and eventually becoming a maintainer yourself, based on a track record of high quality commits, reviews, and support.
  4. Given the amount of effort to integrate and support a BFT implementation into mainline Fabric, I would like to see a focus on a single initial BFT implementation, rather than spread developer resources and support across a SmartBFT implementation and a Mir BFT implementation. Personally, I would prefer to see investment in the Mir BFT implementation based on its exceptional performance characteristics in a wide area network, but would consider either approach if the developers working on each initiative could create a comparative analysis and a proposed roadmap, at least for any common foundation code.

@jyellick
Copy link
Copy Markdown

jyellick commented Nov 5, 2020

I'd like to propose that we either open a new RFC(s), or rework this RFC to address the underlying technical debt and common BFT related features necessary for any BFT ordering implementation. Allow me to elaborate.

The orderer has always been designed to accommodate different consensus plugins. Fabric defines a 'Consensus API' in the github.com/hyperledger/fabric/orderer/consensus package which any plugin can implement, to allow for easy switching of consensus type. Originally, we simply had Solo, and Kafka, and the interface was designed to accommodate these two plugins. When Raft was introduced, this API needed to be extended, and, some of the more consensus plugin specific aspects of Fabric, such as enumerated Kafka configuration in the channel config and orderer config were made more generic, with a generic ConsensusMetadata field on consensus type in the channel config, and free-form Consensus section in the orderer.yaml. This has taken us a few steps closer to making Fabric ordering truly consensus agnostic.

However, we still have the problem that consensus plugins must reside in tree, there is no way to dynamically inject them. This problem is both simple to fix, and carries with it a large number of complications. Most obviously, we could provide an exported function in the Fabric tree which allows an external consumer to build a Fabric compatible orderer binary leveraging the rest of the Fabric codebase as a library. This is I think the right way for us to enable consensus plugins, but, as I said, there is also some baggage. Fabric is not structured as a library, and because there is such a large exported API surface, changes in parts of the code (like channel config interfaces) which seem benign, could break downstream consumers (in this case consensus plugins).

Additionally, there is the common functionality, such as BFT block replication which simply does not exist within the orderer codebase which will need to be implemented, this would be across peers and orderers.

So, I would suggest we proceed as follows.

  1. Put together a proposal for allowing consensus plugins to consume Fabric ordering as a library, to build Fabric compatible orderers. After agreeing on an approach, and discussing some of the pitfalls, implement it. It would not need to be perfect at first pass, but keeping in mind that the less debt that is addressed, and the less tight the API, the more pain downstream consumers will feel. Since the SmartBFT consensus plugin would be the first consumer, it's a natural pairing for the developers maintaining SmartBFT to be the same ones developing and maintaining this external API. Looking at the RFC as proposed here, I think this makes good sense, as the RFC explicitly notes the relatively small amounts of code required to glue the library to the consensus plugin API, so maintaining the orderer API consumer out of tree should not be overly burdensome.
  2. Put together a proposal for the common BFT features that will need to be present in both the peers and orderers, and which necessarily cannot reside inside the consensus plugin. This would include the block replication piece already mentioned, but might spill over into other pieces. As the owners of the SmartBFT Fabric fork, hopefully you can bring some clarity as to what other non-plugin aspects are necessary.
  3. Have the SmartBFT consensus implementation for Fabric be our first consumer of this new externally consumable consensus API.

@OlegMartian
Copy link
Copy Markdown

Dear @denyeart , thank you for the proposed steps. Can you help with some information?

  1. Before we can add additional ordering service options, we need to address tech debt in the existing ordering service foundation. Jason mentioned block replication already. I'll add one more - the ordering service common code and Raft integration have a large number of test flakes, to the degree that it hinders getting even non-orderer PRs from passing CI and getting merged into Fabric.

We understand that HLF needs some bug fixing in the ordering service common code, Raft Integration, and other causes of test flakes. Our team is ready to collaborate and gets our hands wet working on the HLF tech debt items. Block replication thought, might special treatment. As I mentioned above we believe this work is indistinguishable and has to be completed along the way with the implementation of BFT based ordering service. Can we have a look at the list of technical debt items to estimate what we are able to contribute? 

It appears that the contribution and eventual support is coming from developers that have little engagement with the Fabric project, whether measured by commits, PR reviews, mailing list activity, or contributor meeting attendance. The barriers are fairly low for such engagement, and the lack of engagement naturally raises concern both on the ability to successfully integrate such a major feature, and the ability to support it long term. The tech debt items mentioned above would be a great way to build the experience required to successfully integrate a major new feature.

I agree that engagement with HLF Project makes significant sense as for Smart BFT integration so for proving the ability to develop and support the solution. Since we started communication with the community very recently it is fair that we have not done a lot yet. Our representative will be attending contributor meetings very soon. We would like to show a presentation of the solution in one of the maintainer's meeting. Could you assist to arrange this activity, please?

We would expect a large feature like this to have a maintainer sponsor that could help with the design, staging, integration, reviews, tests, etc. If no existing maintainer decides to sponsor the work, there is always the option of engaging deeply with the project and eventually becoming a maintainer yourself, based on a track record of high-quality commits, reviews, and support.

We have got a suggestion from Linux Foundation representatives to work with Artem Barger ( @C0rWin ) who can help us going forward. We are going to be in touch with him to check on the next steps.

Given the amount of effort to integrate and support a BFT implementation into mainline Fabric, I would like to see a focus on a single initial BFT implementation, rather than spread developer resources and support across a SmartBFT implementation and a Mir BFT implementation. Personally, I would prefer to see investment in the Mir BFT implementation based on its exceptional performance characteristics in a wide area network but would consider either approach if the developers working on each initiative could create a comparative analysis and a proposed roadmap, at least for any common foundation code.

We believe that there are different types of needs from projects. Some of them might require WAN level scalability which can be met by Mir BFT in the future. Whereas others could be resolved with SmartBFT since now. From exploring MirBFT a bit it seems that in order to support such a framework, it might require significant changes in the way Fabric Ordering service handles transactions, given that blocks are will be produced in parallel by multiple leaders. We were wondering whenever do you have an upgraded architecture design and Roadmap for Mir BFT integration into Fabric? Moreover, given the architecture of Fabric aims to support a pluggable consensus mechanism, I think having several BFT flavours will only benefit the project.

@OlegMartian
Copy link
Copy Markdown

Hi @jyellick . Your proposal is seems about the right direction, but it also feels rather ambitious and assumes complete refactoring of the Fabric code base, at least parts that relevant to Ordering Service. From the one side, this change sounds reasonable, but from the other side, it feels that it requires careful design and planning which I think is very good in a long term. There are a lot of question marks on how such pluggable or composable solution is expected to be implemented, for instance that implies that entire stack of supporting tools has to be reimplemented or introduced to support it, also it’s not very obvious how configuration or notion of BFT based ordering service should be propagated to the peers network to conduct block validation policy. Where implementation of block validation policy with such approach is also not very clear, at least not at the current moment.

From other side, current RFC suggest integration of already implemented and BFT library, which is highly Fabric centric, completely non intrusive and could be achieved quite easily in feasible future. I think that we would be glad to join the design of pluggable composable solution as you suggested, while executing it regardless to the current RFC. Can we have a call with you to align on the proposal itself and determine the following common steps?

@jyellick
Copy link
Copy Markdown

We believe that there are different types of needs from projects. Some of them might require WAN level scalability which can be met by Mir BFT in the future. Whereas others could be resolved with SmartBFT since now. From exploring MirBFT a bit it seems that in order to support such a framework, it might require significant changes in the way Fabric Ordering service handles transactions, given that blocks are will be produced in parallel by multiple leaders.

I don't want to derail the conversation on the topic at hand, but this isn't an accurate description of how Mir would likely be integrated. Most likely, Mir would simply be utilized as a BFT transaction stream, which could then be deterministically cut into blocks (very similar to how the original Kafka implementation worked). There is no Mir RFC at the moment, so it is probably not worth getting too bogged down in, but to say it would require significant changes to the way Fabric ordering handles transactions is likely not true.

We were wondering whenever do you have an upgraded architecture design and Roadmap for Mir BFT integration into Fabric?

As noted above, I actually don't think there's a ton of modifications to be made to Fabric to accommodate Mir specifically. There is obviously some consensus plugin work to be done, but the challenges land largely around the same topics SmartBFT encountered.

  • Getting blocks to peers in a BFT manner.
  • Having those peers validate the blocks in a similarly safe manner.
  • Doing the same, but with blocks between orderers for the purposes of state transfer.
  • Getting clients to broadcast transactions in a BFT manner.

The actual challenge of integrating a BFT consensus algorithm for ordering is fairly straightforward with those problems solved. With that said, these are all open questions, and there is currently no documentation, RFC, etc. which describes them.

Moreover, given the architecture of Fabric aims to support a pluggable consensus mechanism, I think having several BFT flavours will only benefit the project.

If there is a need and a demand for multiple BFT implementations, then by all means, let's do so. But, there is a maintenance cost and a cognitive cost associated with all things.

Your proposal is seems about the right direction, but it also feels rather ambitious and assumes complete refactoring of the Fabric code base, at least parts that relevant to Ordering Service

I'm not sure that "complete refactoring" is fair here, certainly there would need be some structural changes, but likely they would be largely related to control flow, requiring very little actual code to accomplish. But as @denyeart pointed out, there is a lot of debt in the Fabric code-base to be addressed, and this would be a great way for your team to start contributing.

From the one side, this change sounds reasonable, but from the other side, it feels that it requires careful design and planning which I think is very good in a long term. There are a lot of question marks on how such pluggable or composable solution is expected to be implemented, for instance that implies that entire stack of supporting tools has to be reimplemented or introduced to support it, also it’s not very obvious how configuration or notion of BFT based ordering service should be propagated to the peers network to conduct block validation policy. Where implementation of block validation policy with such approach is also not very clear, at least not at the current moment.

Absolutely, these are all excellent questions about how Fabric wants to expose BFT ordering to the other network components. My understanding of the existing SmartBFT implementation is that intimate knowledge of the consensus plugin's configuration is used to compute things like the block validation policy, and to decide what nodes to pull blocks from. My reaction to this is that it is a layering violation, and that this is the wrong approach. I understand that it simplifies usability, but it also goes contrary to the idea of an ordering service being a service whose internal workings can and should be treated opaquely. As Fabric has matured, we've seen consensus plugin details move from specific uniquely keyed properties in the channel configuration and orderer yamls towards a more generic notion of consensus data. For supporting tools like configtxgen, perhaps we need to extend this pattern. Or, maybe we have become overly generic at the cost of usability and the approach taken in SmartBFT is superior, or perhaps there is another third option which gets us the benefits of both. These are exactly the sorts of concerns and solutions I'd like to see articulated in an RFC and discussed there among the stakeholders.

From other side, current RFC suggest integration of already implemented and BFT library, which is highly Fabric centric, completely non intrusive and could be achieved quite easily in feasible future.

This doesn't make sense to me. Just above, you suggested that allowing the consensus plugin to consume Fabric to produce an orderer binary out of tree was not feasible because of how highly intrusive the consensus plugin was in other parts of the system (such as requiring modifications to configtxgen, to ignoring and overriding the block validation policy in the peer, etc.). So, to here claim that it is "completely non intrusive" doesn't seem consistent. Perhaps more importantly, the existing proposal suggests pulling in a large volume of code without the original authors nor anyone with a history of Fabric contributions committed to maintaining it. This is simply not the way good open source is done. The hope of allowing consensus plugins to consume Fabric externally would allow current users of the SmartBFT Fabric fork to move up to a modern version of Fabric, while decreasing the maintenance burden for the SmartBFT code. At the same time, we eliminate some outstanding technical debt and generally lower the barrier for new consensus implementations, BFT or otherwise.

I think that we would be glad to join the design of pluggable composable solution as you suggested, while executing it regardless to the current RFC. Can we have a call with you to align on the proposal itself and determine the following common steps?

By and large, as an open source community, we try to do our planning and collaboration via avenues like github and the mailing list, so that different timezones and schedules do not become a barrier for participation, and it provides a natural record to those who might come later looking to understand the reasons behind why different decisions were made. We do of course have the weekly community call, though typically this is reserved more for demonstrations and status than for technical discussions. As I've mentioned before, I think the most productive avenue would be to bring the technical resources online to enumerate the problems which need to be solved, and to begin proposing concrete solutions -- this is exactly what the RFC process is all about. If we find that the asynchronous nature of github is problematic, then perhaps we can set up a call, but if we can avoid it, I think would be preferable.

@OlegMartian
Copy link
Copy Markdown

OlegMartian commented Nov 24, 2020

I don't want to derail the conversation on the topic at hand, but this isn't an accurate description of how Mir would likely be integrated. Most likely, Mir would simply be utilized as a BFT transaction stream, which could then be deterministically cut into blocks (very similar to how the original Kafka implementation worked). There is no Mir RFC at the moment, so it is probably not worth getting too bogged down in, but to say it would require significant changes to the way Fabric ordering handles transactions is likely not true.

From our understanding from the reading MIR paper, there are some intrinsics of the protocol that are likely to require some modifications, for instance, given that blocks going to be produced in parallel that means hash chaining orchestrated orthogonal to batching, also that means there is should be some signing mechanism, of course, some of these might be encapsulated internally into Mir. There are additional concerns of how configuration updates will be supported and from our recollection of Mir, there is an assumption that clients' identities are known, which is not the case for the current implementation of ordering service nodes. But your right this might be too early as evil is in the details and we better wait for design and documentation.

If there is a need and a demand for multiple BFT implementations, then by all means, let's do so. But, there is a maintenance cost and a cognitive cost associated with all things.

There is definitely a need for having at least one implementation, do you have some estimates on Mir completion and Fabric integration design?

I'm not sure that "complete refactoring" is fair here, certainly there would need be some structural changes, but likely they would be largely related to control flow, requiring very little actual code to accomplish.

Well, I think it might be a bit early since it’s something yet to be designed, however, we do agree that some changes are needed. Whenever it’s some structural changes or a bit beyond that is subject to architecture and design which doesn’t exist yet.

But as @denyeart pointed out, there is a lot of debt in the Fabric code-base to be addressed, and this would be a great way for your team to start contributing.

As I noted in our reply to @denyeart, we would be glad to take part and tackle debt in Fabric code base, while not sure why this is a prerequisite of acceptance for suggested RFC.

Absolutely, these are all excellent questions about how Fabric wants to expose BFT ordering to the other network components. My understanding of the existing SmartBFT implementation is that intimate knowledge of the consensus plugin's configuration is used to compute things like the block validation policy, and to decide what nodes to pull blocks from. My reaction to this is that it is a layering violation, and that this is the wrong approach. I understand that it simplifies usability, but it also goes contrary to the idea of an ordering service being a service whose internal workings can and should be treated opaquely. As Fabric has matured, we've seen consensus plugin details move from specific uniquely keyed properties in the channel configuration and orderer yamls towards a more generic notion of consensus data. For supporting tools like configtxgen, perhaps we need to extend this pattern. Or, maybe we have become overly generic at the cost of usability and the approach taken in SmartBFT is superior, or perhaps there is another third option which gets us the benefits of both. These are exactly the sorts of concerns and solutions I'd like to see articulated in an RFC and discussed there among the stakeholders.

These are all good and very valid points, as finally brings us to the technical discussion plane. We have detailed all architectural and design aspects in RFC itself, could you please provide specific comments with respect to concerns you have raised? In particular, it’s very interesting to hear your thoughts about what should be the proper way to avoid layering violation. Or what is the alternative to implement a block validation policy that is opaque to the type of the ordering service?

Perhaps more importantly, the existing proposal suggests pulling in a large volume of code …

The integration code of BFT library into Fabric is not a lot of code, most of the logic is hidden within SmartBFT, and all we suggest is to have it vendored as an external library.

… without the original authors nor anyone with a history of Fabric contributions committed to maintaining it. This is simply not the way good open source is done.

We might be missing a point here, but wasn’t the intent of the RFCs process to discuss and introduce large and significant changes into Fabric from the open source community? I would also expect that some of them might require new code or changes. We were also not aware that to open an RFC it’s mandatory to have a previous record of commits. The assumption was that we would be able to create constructive dialog and going to decide together on the work plan introduced in small and incremental steps. Moreover, as stated above we do want to provide support and maintenance for the new code going forward.

As I've mentioned before, I think the most productive avenue would be to bring the technical resources online to enumerate the problems which need to be solved, and to begin proposing concrete solutions -- this is exactly what the RFC process is all about.

By all means, let’s do it. We are more than happy to discuss within RFC technical issues/problems that you think have to be addressed and solved and work on a particular solution.

If we find that the asynchronous nature of github is problematic, then perhaps we can set up a call, but if we can avoid it, I think would be preferable.

Of course, let’s keep the conversation here and focus on the technical side of the RFC.

For HLF as technology and entire community purposes, it's worth having a working implementation BFT sooner rather than later.

@jyellick
Copy link
Copy Markdown

But your right this might be too early as evil is in the details and we better wait for design and documentation.

Definitely.

do you have some estimates on Mir completion and Fabric integration design?

The Mir library itself is nearing completion. Integration with Fabric has not been designed nor planned yet.

As I noted in our reply to @denyeart, we would be glad to take part and tackle debt in Fabric code base, while not sure why this is a prerequisite of acceptance for suggested RFC.

In your quotation, you broke up a paragraph that was meant to be read together, so I'd like to stress that it is a single issue, not two separate points. My original statement was "... certainly there would need be some structural changes, but likely they would be largely related to control flow, requiring very little actual code to accomplish. But, as @denyeart pointed out, there is a lot of debt...". My expectation is that attempting to perform the modifications to the orderer control flow as outlined will require addressing numerous points of technical debt.

I also feel that @denyeart and my suggestions for new contributors to first fix technical debt are being misconstrued. Addressing the debt is not a punitive item, nor some sort of toll to be paid to gain stake in the project. Certainly, there is a benefit to the project in addressing this debt, but first and foremost, the suggestion that a new contributor start by tackling some technical debt is about getting the new contributor familiar with the contribution process, including the coding guidelines, pull request guidelines, review process, CI infrastructure, and other aspects that a new contributor must be proficient with in order to meaningfully contribute to HLF.

The second half, and what I alluded to specifically in the just-cited paragraph, is that bringing new function to Fabric invariably can be done in either a way which generates new debt, or a way in which it clears away old debt; features proposed in the former are much less likely to receive support than the latter. To cite the example which began this whole thread, when the peer block replication was implemented in the SmartBFT fork, rather than cleanup the assorted debt around connection handling and authentication already present in Fabric (a hodepodge of singletons, unnecessary indirection, and odd wiring), the fork has extended it. I completely understand the expediency of this decision, but it is exactly this sort of additional incurred debt we want to avoid.

We might be missing a point here, but wasn’t the intent of the RFCs process to discuss and introduce large and significant changes into Fabric from the open source community? I would also expect that some of them might require new code or changes. We were also not aware that to open an RFC it’s mandatory to have a previous record of commits. The assumption was that we would be able to create constructive dialog and going to decide together on the work plan introduced in small and incremental steps. Moreover, as stated above we do want to provide support and maintenance for the new code going forward.

I'm concerned that the discussion here is becoming adversarial. As I indicated in my first replies, we are very happy that you and your team are interested in contributing to Fabric's development! There is no requirement to create an RFC, and we would love to have a constructive dialog about how your contributions can be integrated into Fabric.

With all that said, as has been pointed out a few times.

  1. This RFC contains far too little technical detail. There is a very high level description of how some components are modified, but the list is clearly not exhaustive, and for those changes listed are only briefly described. For instance, the RFC describes that peers pull a stream of bock headers only from multiple orderers, to enable them to lightly detect censorship. But, no where is it described how the peer retrieves this stream, presumably, it is via new options to the existing Deliver service, but those changes are not described anywhere.

  2. This RFC is much too big. Especially for a first time contributor, the scope and breadth covered by this RFC is daunting. To pick further upon the example I just gave, the block header delivery service would be a great first RFC. Enhancing the Deliver service to allow streaming of only signed block headers has general useful implications beyond BFT, and also makes sense as a building block for any future BFT ordering implementation. We could include enhancements to the peer CLI to exploit this new service, and test and implement it in a simple standalone way. And, speaking from experience operating Fabric, there have been numerous occasions where being able to stream only block headers and metadata would have been very useful related to IO constrained environments.

  3. The implementer(s) are new contributors. Like all things in life, becoming an effective HLF contributor takes time, effort, and practice. Certainly having a strong background in open source, golang, cryptography, and other skillsets can help a new contributor come up to speed more quickly, but even the most seasoned veteran is likely to take time to become productive. It's helpful to no one if a new contributor spends months working on a patch, only to have it rejected because they are failing to follow expected practices.

As written, there's no way I, nor I believe any other maintainer, could in good conscience agree to approve this RFC. I've put forth a proposal on how current SmartBFT users could more rapidly gain access to an unforked and up-to-date Fabric (by allowing the consensus implementation itself to bypass the RFC process and live out of tree), but it's just that, a proposal, not an edict. If you'd prefer to begin by tackling some other aspects while you build up to an RFC including the consensus plugin itself, that of course is another perfectly fine path.

So, as has been requested before, let's get the technical resources here to discuss what's actually needed on the path to BFT. Let's identify individual components, design them and open RFCs, let's implement them, and let's get them merged into master. Let's start small with something attainable, and let's get through that one.

Base automatically changed from master to main March 19, 2021 23:22
@yacovm
Copy link
Copy Markdown
Contributor

yacovm commented Sep 25, 2021

Hey @jyellick , I hope you're doing well.

I am resurrecting this thread in order to continue the discussion, so the community can hopefully come to a decision on our course of action.

In particular, I would like to discuss the concerns you've raised, namely about the contributors being new, technical debt, and also I'd like to raise a few concerns of my own, if I may.

First of, let's start with your concerns, and especially the ones I fully agree on the facts, but less so on the conclusion:

To cite the example which began this whole thread, when the peer block replication was implemented in the SmartBFT fork, rather than cleanup the assorted debt around connection handling and authentication already present in Fabric (a hodepodge of singletons, unnecessary indirection, and odd wiring), the fork has extended it. I completely understand the expediency of this decision, but it is exactly this sort of additional incurred debt we want to avoid.

You are 100% correct. I fully agree that the way the peer side deliver client was implemented in BFT, is bad. Exactly as you say, its implementation overlooks the fact that the existing code is already crummy as it is, and it tries to build on top of it. Clearly, the optimal strategy regarding that part, was to first to rewrite the deliver client in the peer entirely.
However, we are fortunate that any BFT feature will not be delivered into Fabric 1.4.x, and therefore - the way the BFT deliver client was implemented is irrelevant, as the 1.4 deliver client is obsolete and its code has been replaced (by you), therefore any implementation of a BFT deliver client would not have anything to do with the existing code in the BFT fork.

Now, let's talk your concern about the fact that the implementers are new contributors

The implementer(s) are new contributors. Like all things in life, becoming an effective HLF contributor takes time, effort, and practice. Certainly having a strong background in open source, golang, cryptography, and other skillsets can help a new contributor come up to speed more quickly, but even the most seasoned veteran is likely to take time to become productive. It's helpful to no one if a new contributor spends months working on a patch, only to have it rejected because they are failing to follow expected practices.

I don't think there is anyone that is more weary of new contributors than me, and I agree with you that it is indeed a concern.
However, there is a very simple solution to your concern.
What if I would lead, supervise, as well as code the integration of SmartBFT into Fabric? Surely as you know, I am in no sense a new contributor ;-)
I am sure that I will succeed, as it already happened (https://github.com/SmartBFT-Go/fabric/), so if integrating it to 1.4 was done successfully, there is no reason to think that a 2.4 based integration would have anything but a happy ending.

In respect to this concern, I think that a similar concern exists about Mir, is that not so? As far as I know, you are no longer developing the Mir library. Do you foresee coming back to finish it anytime soon, and expect to have substantial cycles to help its integration after it will be finished? Because if not, then don't we have a similar problem that you raised here, with Mir?

To the best of my knowledge, there is no living soul that currently knows well both the Mir code, as well as the Fabric code, but there are such for the SmartBFT case.

This RFC is much too big

I personally do not agree here, simply because of the fact that the existing SmartBFT integration into a fork of Fabric 1.4 was already successfully done, and anyone can witness this if they pull the docker images publicly available and give them a whirl. However, I think that doing what you propose below is not a bad idea:

I just gave, the block header delivery service would be a great first RFC. Enhancing the Deliver service to allow streaming of only signed block headers has general useful implications beyond BFT, and also makes sense as a building block for any future BFT ordering implementation. We could include enhancements to the peer CLI to exploit this new service, and test and implement it in a simple standalone way. And, speaking from experience operating Fabric, there have been numerous occasions where being able to stream only block headers and metadata would have been very useful related to IO constrained environments.

Maybe what we really need is not a single RFC, but several RFCs, each focusing on a self contained part of the integration. This leads me to your other concern which is somewhat related:

This RFC contains far too little technical detail. There is a very high level description of how some components are modified, but the list is clearly not exhaustive, and for those changes listed are only briefly described. For instance, the RFC describes that peers pull a stream of block headers only from multiple orderers, to enable them to lightly detect censorship. But, no where is it described how the peer retrieves this stream, presumably, it is via new options to the existing Deliver service, but those changes are not described anywhere.

I believe the reason the RFC contains far too little detail is that it is tedious to describe in an RFC so many details that anyone can look at if they just opened the code and saw how it is implemented.
I guess the people who wrote the RFC were just being too hasty to get it out in time. So, my opinion is that we might indeed benefit from a stack of several RFCs, each dealing with one thing.

Now I would like to discuss your inversion of control proposal, and then to raise some concerns of my own.

So, I would suggest we proceed as follows.

1. Put together a proposal for allowing consensus plugins to consume Fabric ordering as a library, to build Fabric compatible orderers.  After agreeing on an approach, and discussing some of the pitfalls, implement it.  It would not need to be perfect at first pass, but keeping in mind that the less debt that is addressed, and the less tight the API, the more pain downstream consumers will feel.  Since the SmartBFT consensus plugin would be the first consumer, it's a natural pairing for the developers maintaining SmartBFT to be the same ones developing and maintaining this external API.  Looking at the RFC as proposed here, I think this makes good sense, as the RFC explicitly notes the relatively small amounts of code required to glue the library to the consensus plugin API, so maintaining the orderer API consumer out of tree should not be overly burdensome.

2. Put together a proposal for the common BFT features that will need to be present in both the peers and orderers, and which necessarily cannot reside inside the consensus plugin.  This would include the block replication piece already mentioned, but might spill over into other pieces.  As the owners of the SmartBFT Fabric fork, hopefully you can bring some clarity as to what other non-plugin aspects are necessary.

3. Have the SmartBFT consensus implementation for Fabric be our first consumer of this new externally consumable consensus API.

Assuming an ordering service that is consumable as a library, there is almost nothing to be done to integrate SmartBFT in Fabric, since the SmartBFT library is already built in a manner that consumes Fabric as a set of interfaces. In fact, it would be mostly copying the code from the existing SmartBFT integration and then slightly change it to fit the API.

What I am concerned about is, that the existing ordering service, and especially aspects such as onboarding, channel participation, migration, channel lifecycle (registrar is a mess, as you know...) are currently hardwired for Raft, and if we are to make the ordering service consumable externally, you would need to find a way to "teach" it to parse the opaque consensus metadata.
However, am rather concerned that as part of the massive refactoring, we might leave the main branch in a state from which you cannot cut a release for a very long time, and even if we'd had other features we would want to release, we wouldn't be able to. Unless of course, and this is in my opinion a cleaner approach - we keep the current orderer code as is, with all its chains (held by duct tape to the registrar), and we build a "new" consumable orderer that exists in parallel, which uses self contained primitives such as the communication and replication packages in the common cluster part of the code. This would allow us to keep releasing Fabric versions, avoiding introducing bugs due to refactoring, while working on a new orderer lifecycle in parallel.

However, if we just build this new orderer infrastructure within the Fabric code, I am afraid we would never know we did it right if the only cluster type orderer we have, is Raft.
What would probably happen is that we will surely discover that we have made the wrong API and/or design choices and we need to further change it, in order for BFT to be able to consume it.

Therefore, here is what I propose:

  1. Put a proposal for allowing consensus plugins to consume Fabric, as you stated.
  2. Start building in the Fabric tree, in parallel(!):
    (a) A consumable ordering service, co-existing, but not reachable from within the orderer "main".
    (b) A SmartBFT orderer (in tree) that will consume (a). The reason I want to start with SmartBFT, is that it depends on a superset of capabilities that Raft depends on (i.e it also verifies signatures on blocks and of clients)
  3. Once (2) is done, we may remove the SmartBFT code (b) from within the Fabric core, retain (a) and make a new repository that under hyperledger that simply imports the Fabric orderer and contains (b).
  4. We make a Raft orderer that consumes (a), and after enough time, feedback, and testing, we retire the old orderer binary.

The "new" orderer will be fully interoperable with older orderers, so that externally a peer or a client wouldn't care if it is facing a new or an old orderer, and of course, a mixed cluster with new and old orderers will run smoothly (or at least, as smooth as it gets in Fabric...)

What do you think Jason and others (Dave, Oleg, etc.)?

@denyeart
Copy link
Copy Markdown
Contributor

Thanks for reviving this thread @yacovm . I think at a high level your proposal is sound and addresses the main points that have been raised previously. Let's see if others raise concerns, if not, I think it would make sense to next drill down on what the series of RFCs may look like. We could then retire this RFC and start posting/reviewing a new stack of smaller self-contained RFCs related to BFT preparation and implementation.

@C0rWin
Copy link
Copy Markdown
Contributor

C0rWin commented Sep 27, 2021

@denyeart @yacovm glad to hear this RFC coming back to life again, hopefully, will end up with Fabric having BFT based OSN. Having an inversion of control that will allow the creation of new consensus modules w/o affecting Fabric code, would be a significant improvement.

Though, I do think that instead of refactoring Fabric code and bringing changes into the main branch as suggested below:

Therefore, here is what I propose:
Put a proposal for allowing consensus plugins to consume Fabric, as you stated.
Start building in the Fabric tree, in parallel(!):
(a) A consumable ordering service, co-existing, but not reachable from within the orderer "main".
(b) A SmartBFT orderer (in tree) that will consume (a). The reason I want to start with SmartBFT, is that it depends on a superset of capabilities that Raft depends on (i.e it also verifies signatures on blocks and of clients)
Once (2) is done, we may remove the SmartBFT code (b) from within the Fabric core, retain (a) and make a new repository that under hyperledger that simply imports the Fabric orderer and contains (b).
We make a Raft orderer that consumes (a), and after enough time, feedback, and testing, we retire the old orderer binary.

we might consider moving OSN code into a stand-alone repository, because this will allow keeping destructive changes out of the main Fabric code, while refactoring parts of OSN preparing external APIs to be consumed later by the consensus library, thus not affecting Fabric release cycles.

Another part, which wasn't mentioned, but is worth to be noted is how to make the peer side (if needed) to be aware of the consensus type as it affects the way it replicates the block.

@yacovm
Copy link
Copy Markdown
Contributor

yacovm commented Sep 28, 2021

we might consider moving OSN code into a stand-alone repository, because this will allow keeping destructive changes out of the main Fabric code, while refactoring parts of OSN preparing external APIs to be consumed later by the consensus library, thus not affecting Fabric release cycles.

I don't like this idea from several reasons:

  1. I want to preserve the ability to easily backport bug fixes. It was hard enough to backport to 1.4 after the protobuf relocation, so this would completely eliminate automatic backporting.
  2. Most of the orderer code will not change at all. Most of the code that is going to be affected is in orderer/common/multichannel but a lot of the orderer code will remain intact (broadcast support, filters, communication, replication, channel participation, etc.) with minimal changes if at all.
  3. The strategy I envisioned is to have a "copy on write" code strategy and first create a SmartBFT orderer while retaining the Raft orderer as it is today, and then at a later stage, move the Raft orderer to the new form, so there won't be a need for intrusive changes - anywhere that requires intrusive changes, we will re-write but not replace.
  4. Even though we have two kinds of Fabric nodes, it is very convenient to be able to modify code in both repositories in the same commit, as they share quite a lot of code that is located in the Fabric core as well. Extracting the orderer would leave the Fabric core with the peer, but also with other stuff like the MSP, config transactions processing, protobuf utilities, the file ledger, and more.
  5. Unlike the SDK <--> Fabric core relationship, where the Fabric core is the ultimate source of truth regarding APIs, the peer and the orderer have a more coequal relationship, which stems from the holistic approach where each doesn't have any meaning without the other. Consequently, we'd want to be able to test changes in both of them in the same pull request, and if we split the two, it would create an odd circular dependency where a breaking change in the orderer would not pass CI because the peer cannot be updated, and vice versa.
  6. This will not affect release cycles, as I explain in my comment - "This would allow us to keep releasing Fabric versions, avoiding introducing bugs due to refactoring, while working on a new orderer lifecycle in parallel."

Another part, which wasn't mentioned, but is worth to be noted is how to make the peer side (if needed) to be aware of the consensus type as it affects the way it replicates the block.

The peer doesn't really care what consensus protocol is running in the ordering service. All it cares is how many signatures it needs to verify. Recall, that you can create a policy that only cares about MSP IDs, and not about the actual certificates of these orderers (this is different from what was done in SmartBFT, and more flexible). It remains to choose whether this policy is created dynamically or needs to be altered each time, but in general this is not something that should depend on the opaque consensus metadata, as we have "orderer organizations" divided by MSP IDs already.

@C0rWin
Copy link
Copy Markdown
Contributor

C0rWin commented Sep 28, 2021

I don't like this idea from several reasons:

I want to preserve the ability to easily backport bug fixes. It was hard enough to backport to 1.4 after the protobuf relocation, so this would completely eliminate automatic backporting.

While I agree this is a very valid concern, but do you really envision a lot of backporting here?

Most of the orderer code will not change at all. Most of the code that is going to be affected is in orderer/common/multichannel but a lot of the orderer code will remain intact (broadcast support, filters, communication, replication, channel participation, etc.) with minimal changes if at all.

If I understand your proposal correctly, there will be small (or maybe not that small?) changes to other places to make them consumable by externalized consensus plugin, of course there is a need to take a look deeper, but I do not think that orderer/common/multichannel is the only place in OSN that will be affected.

The strategy I envisioned is to have a "copy on write" code strategy and first create a SmartBFT orderer while retaining the Raft orderer as it is today, and then at a later stage, move the Raft orderer to the new form, so there won't be a need for intrusive changes - anywhere that requires intrusive changes, we will re-write but not replace.
Even though we have two kinds of Fabric nodes, it is very convenient to be able to modify code in both repositories in the same commit, as they share quite a lot of code that is located in the Fabric core as well. Extracting the orderer would leave the Fabric core with the peer, but also with other stuff like the MSP, config transactions processing, protobuf utilities, the file ledger, and more.

Well, this is might be a separate discussion, but IMO compartmentalization of Fabric modules rather than having a monolith worth giving a thought.

Unlike the SDK <--> Fabric core relationship, where the Fabric core is the ultimate source of truth regarding APIs, the peer and the orderer have a more coequal relationship, which stems from the holistic approach where each doesn't have any meaning without the other. Consequently, we'd want to be able to test changes in both of them in the same pull request, and if we split the two, it would create an odd circular dependency where a breaking change in the orderer would not pass CI because the peer cannot be updated, and vice versa.
This will not affect release cycles, as I explain in my comment - "This would allow us to keep releasing Fabric versions, avoiding introducing bugs due to refactoring, while working on a new orderer lifecycle in parallel."

I got your proposal, but I was concerned about having within codebase not related to releases pieces of code with work on in-progress status, despite the fact that they might not be reachable from the mains.

Anyhow, do not get me wrong, I do think that your suggestion is a good compromise to eventually allow BFT based ordering service and in fact, anyone would be able to craft OSN with the consensus of their choice w/o affecting the Fabric codebase itself.

@denyeart
Copy link
Copy Markdown
Contributor

Yacov's proposal for ordering service node refactor for BFT is available as RFC #50 and supersedes this original proposal, therefore I'll close it and ask that the submitters add review comments to #50 instead.

@denyeart denyeart closed this May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants