add 006-bft-based-ordering-service.md#33
Signed-off-by: Фёдор Партанский <partanskiy.f@n-t.io>
|
The biggest inhibitor to pulling any sort of new code into Fabric is the maintenance overhead. It's quite notable that the Fabric fork utilizing this BFT library is still based on the v1.4.x and has not kept up with Fabric development (now at v2.2.x+). In particular, the component pieces unrelated to the actual consensus plugin -- performing BFT resilient block pulling in the peer, and BFT replication of blocks in the orderer cluster component -- would be very non-trivial to forward port at this point because these code paths have seen substantial changes. Who is the technical sponsor who plans to forward port these changes, and maintain them, as well as the consensus plugin? |
|
Good to know about a new BFT orderer implementation. |
|
Thank you for your comments. Autonomous Noncommercial Organization “ID&AS: Inter-Disciplinary & Advanced Studies Center” or “IDEAS Center” is the Sponsor of SmartBFT development. IDEAS is currently managing further SmartBFT development and will be responsible for ongoing support and for compatibility with new Fabric versions. |
|
IDEAS is going to release the forward port for v2.2.0, and our team will be happy to collaborate, contribute, and receive recommendations. |
|
@OlegMartian it's great to hear about additional development interest from your team, we are of course always happy to have more contributors. Typically, RFCs are shepherded by an active maintainer who has demonstrated a history of quality contributions and long term commitment to the project. Perhaps it would make the most sense to begin by addressing some of the technical debt around block replication before attempting to tackle such a large new feature. For instance, creating a unified BFT-ready block replicator for use both in the orderer cluster block replication and standard peer replication would be a good place to start and be a necessary step towards any BFT ordering implementation. |
|
@jyellick thank you for the recommendation. You have raised a very valid concern: there is definitely a need to implement a BFT-resistant peer-side block replicator, and, as pointed out in this RFC, the intra-cluster communication provided for Raft exhibits such a property. Moreover, we believe this work is inseparable from, and has to be completed along with, the implementation of the BFT-based ordering service. We do agree that such functionality has to be implemented and provided as a self-contained piece of work in the context of this RFC. Currently, we are looking to hear more feedback and input from the HLF community, especially the Hyperledger Fabric maintainers, to formulate a reasonable integration strategy and to identify a roadmap and milestones that yield small, self-contained deliverables with ongoing support. |
|
@OlegMartian Perhaps at this point it would be helpful for the Fabric developers who will be performing the integration work to join the discussion. It is always exciting to bring new function to a project, but it's important that we do so in a maintainable way. The RFC, as written, is essentially an explanation of the approach taken in the cited Fabric forks. Since the Fabric code base has already diverged significantly since those forks, there remain significant open questions of what things would look like on top of master. Who are the technical contributors who are interested in discussing how to design and implement this new work? |
|
@jyellick thanks for your point, I think it’s a good suggestion. Let us add more details to the RFC to facilitate the technical discussion, exactly as you mentioned. |
|
This is an impressive piece of work. I would however echo some of Jason's sentiments before taking the next steps:
|
|
I'd like to propose that we either open a new RFC(s), or rework this RFC to address the underlying technical debt and common BFT related features necessary for any BFT ordering implementation. Allow me to elaborate. The orderer has always been designed to accommodate different consensus plugins. Fabric defines a 'Consensus API' in the However, we still have the problem that consensus plugins must reside in tree, there is no way to dynamically inject them. This problem is both simple to fix, and carries with it a large number of complications. Most obviously, we could provide an exported function in the Fabric tree which allows an external consumer to build a Fabric compatible orderer binary leveraging the rest of the Fabric codebase as a library. This is I think the right way for us to enable consensus plugins, but, as I said, there is also some baggage. Fabric is not structured as a library, and because there is such a large exported API surface, changes in parts of the code (like channel config interfaces) which seem benign, could break downstream consumers (in this case consensus plugins). Additionally, there is the common functionality, such as BFT block replication which simply does not exist within the orderer codebase which will need to be implemented, this would be across peers and orderers. So, I would suggest we proceed as follows.
|
|
Dear @denyeart , thank you for the proposed steps. Can you help with some information?
We understand that HLF needs some bug fixing in the ordering service common code, Raft integration, and other causes of test flakes. Our team is ready to collaborate and get our hands dirty working on the HLF tech debt items. Block replication, though, might need special treatment. As I mentioned above, we believe this work is inseparable from, and has to be completed along with, the implementation of the BFT-based ordering service. Can we have a look at the list of technical debt items to estimate what we are able to contribute?
I agree that engagement with the HLF project makes significant sense, both for SmartBFT integration and for proving our ability to develop and support the solution. Since we started communicating with the community very recently, it is fair that we have not done a lot yet. Our representative will be attending contributor meetings very soon. We would like to present the solution at one of the maintainers' meetings. Could you please help arrange this?
We have got a suggestion from Linux Foundation representatives to work with Artem Barger ( @C0rWin ) who can help us going forward. We are going to be in touch with him to check on the next steps.
We believe that projects have different types of needs. Some of them might require WAN-level scalability, which can be met by Mir BFT in the future, whereas others could be addressed with SmartBFT right now. From exploring Mir BFT a bit, it seems that supporting such a framework might require significant changes in the way the Fabric ordering service handles transactions, given that blocks will be produced in parallel by multiple leaders. We were wondering whether you have an upgraded architecture design and roadmap for Mir BFT integration into Fabric? Moreover, given that the architecture of Fabric aims to support a pluggable consensus mechanism, I think having several BFT flavours will only benefit the project. |
|
Hi @jyellick. Your proposal seems to point in the right direction, but it also feels rather ambitious and assumes a complete refactoring of the Fabric code base, or at least of the parts relevant to the ordering service. On the one hand, this change sounds reasonable; on the other hand, it requires careful design and planning, which I think is very good in the long term. There are a lot of question marks about how such a pluggable or composable solution is expected to be implemented. For instance, it implies that the entire stack of supporting tools has to be reimplemented or introduced to support it. It is also not very obvious how the configuration or the notion of a BFT-based ordering service should be propagated to the peer network to drive the block validation policy, and how the block validation policy would be implemented with such an approach is not very clear either, at least not at the moment. By contrast, the current RFC suggests integrating an already implemented BFT library, which is highly Fabric-centric, completely non-intrusive, and could be achieved quite easily in the foreseeable future. I think we would be glad to join the design of a pluggable, composable solution as you suggested, while carrying it out independently of the current RFC. Can we have a call with you to align on the proposal itself and determine the next common steps? |
I don't want to derail the conversation on the topic at hand, but this isn't an accurate description of how Mir would likely be integrated. Most likely, Mir would simply be utilized as a BFT transaction stream, which could then be deterministically cut into blocks (very similar to how the original Kafka implementation worked). There is no Mir RFC at the moment, so it is probably not worth getting too bogged down in, but to say it would require significant changes to the way Fabric ordering handles transactions is likely not true.
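To make the "ordered transaction stream, deterministically cut into blocks" idea concrete, here is a minimal sketch. It is not Fabric or Mir code: the `Block` type, its `Hash` method, and the `CutBlocks` function are hypothetical names invented for illustration, and the only cutting rule is a maximum transaction count (real cutting rules such as byte limits and timeouts are omitted). The point is only that any node which sees the same ordered stream derives the same hash-chained blocks.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Block is a simplified, hypothetical block; it is not Fabric's block type.
type Block struct {
	Number   uint64
	PrevHash [32]byte
	Txs      [][]byte
}

// Hash covers the block number, the previous hash, and every
// length-prefixed transaction, so it depends only on the block contents.
func (b Block) Hash() [32]byte {
	h := sha256.New()
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], b.Number)
	h.Write(n[:])
	h.Write(b.PrevHash[:])
	for _, tx := range b.Txs {
		binary.BigEndian.PutUint64(n[:], uint64(len(tx)))
		h.Write(n[:])
		h.Write(tx)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// CutBlocks groups a totally ordered transaction stream into hash-chained
// blocks of at most maxTxs transactions. The result is a pure function of
// its inputs, so every node cutting the same stream gets the same chain.
func CutBlocks(stream [][]byte, maxTxs int) []Block {
	if maxTxs <= 0 {
		return nil
	}
	var blocks []Block
	var prev [32]byte // the first block chains to an all-zero hash
	for start := 0; start < len(stream); start += maxTxs {
		end := start + maxTxs
		if end > len(stream) {
			end = len(stream)
		}
		b := Block{Number: uint64(len(blocks)), PrevHash: prev, Txs: stream[start:end]}
		prev = b.Hash()
		blocks = append(blocks, b)
	}
	return blocks
}

func main() {
	stream := [][]byte{[]byte("tx1"), []byte("tx2"), []byte("tx3"), []byte("tx4"), []byte("tx5")}
	a := CutBlocks(stream, 2) // cut once, as on one node
	b := CutBlocks(stream, 2) // cut again, as on another node
	fmt.Println(len(a), a[2].Hash() == b[2].Hash()) // prints "3 true"
}
```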
As noted above, I actually don't think there's a ton of modifications to be made to Fabric to accommodate Mir specifically. There is obviously some consensus plugin work to be done, but the challenges land largely around the same topics SmartBFT encountered.
The actual challenge of integrating a BFT consensus algorithm for ordering is fairly straightforward with those problems solved. With that said, these are all open questions, and there is currently no documentation, RFC, etc. which describes them.
If there is a need and a demand for multiple BFT implementations, then by all means, let's do so. But, there is a maintenance cost and a cognitive cost associated with all things.
I'm not sure that "complete refactoring" is fair here, certainly there would need be some structural changes, but likely they would be largely related to control flow, requiring very little actual code to accomplish. But as @denyeart pointed out, there is a lot of debt in the Fabric code-base to be addressed, and this would be a great way for your team to start contributing.
Absolutely, these are all excellent questions about how Fabric wants to expose BFT ordering to the other network components. My understanding of the existing SmartBFT implementation is that intimate knowledge of the consensus plugin's configuration is used to compute things like the block validation policy, and to decide what nodes to pull blocks from. My reaction to this is that it is a layering violation, and that this is the wrong approach. I understand that it simplifies usability, but it also goes contrary to the idea of an ordering service being a service whose internal workings can and should be treated opaquely. As Fabric has matured, we've seen consensus plugin details move from specific uniquely keyed properties in the channel configuration and orderer yamls towards a more generic notion of consensus data. For supporting tools like configtxgen, perhaps we need to extend this pattern. Or, maybe we have become overly generic at the cost of usability and the approach taken in SmartBFT is superior, or perhaps there is another third option which gets us the benefits of both. These are exactly the sorts of concerns and solutions I'd like to see articulated in an RFC and discussed there among the stakeholders.
This doesn't make sense to me. Just above, you suggested that allowing the consensus plugin to consume Fabric to produce an orderer binary out of tree was not feasible because of how highly intrusive the consensus plugin was in other parts of the system (such as requiring modifications to configtxgen, to ignoring and overriding the block validation policy in the peer, etc.). So, to here claim that it is "completely non intrusive" doesn't seem consistent. Perhaps more importantly, the existing proposal suggests pulling in a large volume of code without the original authors nor anyone with a history of Fabric contributions committed to maintaining it. This is simply not the way good open source is done. The hope of allowing consensus plugins to consume Fabric externally would allow current users of the SmartBFT Fabric fork to move up to a modern version of Fabric, while decreasing the maintenance burden for the SmartBFT code. At the same time, we eliminate some outstanding technical debt and generally lower the barrier for new consensus implementations, BFT or otherwise.
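As a rough illustration of what "consuming Fabric as a library" could look like, here is a minimal sketch of the inversion of control: the orderer core exposes a registration point, and an out-of-tree `main` package wires in its own consensus plugin. All names here (`Consenter`, `Factory`, `Orderer`, `Register`, `Start`, `countingConsenter`) are hypothetical and heavily simplified; this is not Fabric's actual consensus API, and it ignores the hard parts discussed in this thread (configuration, onboarding, block replication).

```go
package main

import (
	"errors"
	"fmt"
)

// Consenter is a hypothetical, reduced stand-in for a consensus plugin.
type Consenter interface {
	Order(tx []byte) error
}

// Factory builds a Consenter; plugins hand one of these to the orderer core.
type Factory func() Consenter

// Orderer stands in for the library-style orderer core.
type Orderer struct {
	factories map[string]Factory
}

// NewOrderer returns an orderer core with no plugins registered.
func NewOrderer() *Orderer {
	return &Orderer{factories: make(map[string]Factory)}
}

// Register makes a consensus plugin available under a consensus type name.
func (o *Orderer) Register(name string, f Factory) error {
	if _, exists := o.factories[name]; exists {
		return errors.New("consensus type already registered: " + name)
	}
	o.factories[name] = f
	return nil
}

// Start instantiates the plugin selected by the (hypothetical) channel config.
func (o *Orderer) Start(consensusType string) (Consenter, error) {
	f, ok := o.factories[consensusType]
	if !ok {
		return nil, errors.New("unknown consensus type: " + consensusType)
	}
	return f(), nil
}

// countingConsenter is a toy out-of-tree plugin that only counts transactions.
type countingConsenter struct{ ordered int }

func (c *countingConsenter) Order(tx []byte) error {
	c.ordered++
	return nil
}

func main() {
	// An external repository would own this main package and its plugin.
	o := NewOrderer()
	if err := o.Register("examplebft", func() Consenter { return &countingConsenter{} }); err != nil {
		panic(err)
	}
	c, err := o.Start("examplebft")
	if err != nil {
		panic(err)
	}
	_ = c.Order([]byte("tx1"))
	_, err = o.Start("raft") // never registered in this binary
	fmt.Println(c.(*countingConsenter).ordered, err != nil) // prints "1 true"
}
```

The design choice being sketched is that the core never imports the plugin; the dependency points from the plugin's binary to the core, which is what would let a consensus implementation live out of tree.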
By and large, as an open source community, we try to do our planning and collaboration via avenues like github and the mailing list, so that different timezones and schedules do not become a barrier for participation, and it provides a natural record to those who might come later looking to understand the reasons behind why different decisions were made. We do of course have the weekly community call, though typically this is reserved more for demonstrations and status than for technical discussions. As I've mentioned before, I think the most productive avenue would be to bring the technical resources online to enumerate the problems which need to be solved, and to begin proposing concrete solutions -- this is exactly what the RFC process is all about. If we find that the asynchronous nature of github is problematic, then perhaps we can set up a call, but if we can avoid it, I think would be preferable. |
From our understanding after reading the Mir paper, there are some intrinsics of the protocol that are likely to require some modifications. For instance, given that blocks are going to be produced in parallel, hash chaining would have to be orchestrated orthogonally to batching, and there should also be some signing mechanism; of course, some of these might be encapsulated internally in Mir. There are additional concerns about how configuration updates will be supported, and from our recollection of Mir, there is an assumption that clients' identities are known, which is not the case for the current implementation of ordering service nodes. But you're right, this might be too early, as the devil is in the details, and we had better wait for design and documentation.
There is definitely a need for having at least one implementation, do you have some estimates on Mir completion and Fabric integration design?
Well, I think it might be a bit early since it’s something yet to be designed; however, we do agree that some changes are needed. Whether it’s some structural changes or a bit beyond that is subject to an architecture and design which don’t exist yet.
As I noted in our reply to @denyeart, we would be glad to take part and tackle debt in the Fabric code base, though we are not sure why this is a prerequisite for acceptance of the suggested RFC.
These are all good and very valid points, as they finally bring us to the plane of technical discussion. We have detailed all architectural and design aspects in the RFC itself; could you please provide specific comments with respect to the concerns you have raised? In particular, it would be very interesting to hear your thoughts about the proper way to avoid the layering violation. Or what is the alternative for implementing a block validation policy that is opaque to the type of the ordering service?
The code integrating the BFT library into Fabric is not large; most of the logic is hidden within SmartBFT, and all we suggest is to have it vendored as an external library.
We might be missing a point here, but wasn’t the intent of the RFC process to discuss and introduce large and significant changes into Fabric from the open source community? I would also expect that some of them might require new code or changes. We were also not aware that a previous record of commits is mandatory to open an RFC. The assumption was that we would be able to create a constructive dialog and decide together on a work plan introduced in small and incremental steps. Moreover, as stated above, we do want to provide support and maintenance for the new code going forward.
By all means, let’s do it. We are more than happy to discuss within RFC technical issues/problems that you think have to be addressed and solved and work on a particular solution.
Of course, let’s keep the conversation here and focus on the technical side of the RFC. For the sake of HLF as a technology and of the entire community, it's worth having a working BFT implementation sooner rather than later. |
Definitely.
The Mir library itself is nearing completion. Integration with Fabric has not been designed nor planned yet.
In your quotation, you broke up a paragraph that was meant to be read together, so I'd like to stress that it is a single issue, not two separate points. My original statement was "... certainly there would need be some structural changes, but likely they would be largely related to control flow, requiring very little actual code to accomplish. But, as @denyeart pointed out, there is a lot of debt...". My expectation is that attempting to perform the modifications to the orderer control flow as outlined will require addressing numerous points of technical debt. I also feel that @denyeart and my suggestions for new contributors to first fix technical debt are being misconstrued. Addressing the debt is not a punitive item, nor some sort of toll to be paid to gain stake in the project. Certainly, there is a benefit to the project in addressing this debt, but first and foremost, the suggestion that a new contributor start by tackling some technical debt is about getting the new contributor familiar with the contribution process, including the coding guidelines, pull request guidelines, review process, CI infrastructure, and other aspects that a new contributor must be proficient with in order to meaningfully contribute to HLF. The second half, and what I alluded to specifically in the just-cited paragraph, is that bringing new function to Fabric invariably can be done in either a way which generates new debt, or a way in which it clears away old debt; features proposed in the former are much less likely to receive support than the latter. To cite the example which began this whole thread, when the peer block replication was implemented in the SmartBFT fork, rather than clean up the assorted debt around connection handling and authentication already present in Fabric (a hodgepodge of singletons, unnecessary indirection, and odd wiring), the fork has extended it.
I completely understand the expediency of this decision, but it is exactly this sort of additional incurred debt we want to avoid.
I'm concerned that the discussion here is becoming adversarial. As I indicated in my first replies, we are very happy that you and your team are interested in contributing to Fabric's development! There is no requirement to create an RFC, and we would love to have a constructive dialog about how your contributions can be integrated into Fabric. With all that said, as has been pointed out a few times.
As written, there's no way I, nor I believe any other maintainer, could in good conscience agree to approve this RFC. I've put forth a proposal on how current SmartBFT users could more rapidly gain access to an unforked and up-to-date Fabric (by allowing the consensus implementation itself to bypass the RFC process and live out of tree), but it's just that, a proposal, not an edict. If you'd prefer to begin by tackling some other aspects while you build up to an RFC including the consensus plugin itself, that of course is another perfectly fine path. So, as has been requested before, let's get the technical resources here to discuss what's actually needed on the path to BFT. Let's identify individual components, design them and open RFCs, let's implement them, and let's get them merged into master. Let's start small with something attainable, and let's get through that one. |
|
Hey @jyellick, I hope you're doing well. I am resurrecting this thread in order to continue the discussion, so the community can hopefully come to a decision on our course of action. In particular, I would like to discuss the concerns you've raised, namely about the contributors being new and about technical debt, and I'd also like to raise a few concerns of my own, if I may. First off, let's start with your concerns, especially the ones where I fully agree on the facts, but less so on the conclusion:
You are 100% correct. I fully agree that the way the peer-side deliver client was implemented in BFT is bad. Exactly as you say, its implementation overlooks the fact that the existing code is already crummy as it is, and it tries to build on top of it. Clearly, the optimal strategy regarding that part was first to rewrite the deliver client in the peer entirely. Now, let's talk about your concern that the implementers are new contributors.
I don't think there is anyone who is more wary of new contributors than me, and I agree with you that it is indeed a concern. With respect to this concern, I think that a similar concern exists about Mir, is that not so? As far as I know, you are no longer developing the Mir library. Do you foresee coming back to finish it anytime soon, and do you expect to have substantial cycles to help with its integration once it is finished? Because if not, then don't we have the same problem with Mir that you raised here? To the best of my knowledge, there is no living soul who currently knows both the Mir code and the Fabric code well, but there are such people for the SmartBFT case.
I personally do not agree here, simply because of the fact that the existing SmartBFT integration into a fork of Fabric 1.4 was already successfully done, and anyone can witness this if they pull the docker images publicly available and give them a whirl. However, I think that doing what you propose below is not a bad idea:
Maybe what we really need is not a single RFC, but several RFCs, each focusing on a self contained part of the integration. This leads me to your other concern which is somewhat related:
I believe the reason the RFC contains far too little detail is that it is tedious to describe in an RFC so many details that anyone can look at if they just opened the code and saw how it is implemented. Now I would like to discuss your inversion of control proposal, and then to raise some concerns of my own.
Assuming an ordering service that is consumable as a library, there is almost nothing to be done to integrate SmartBFT in Fabric, since the SmartBFT library is already built in a manner that consumes Fabric as a set of interfaces. In fact, it would be mostly copying the code from the existing SmartBFT integration and then slightly changing it to fit the API. What I am concerned about is that the existing ordering service, and especially aspects such as onboarding, channel participation, migration, and channel lifecycle (the registrar is a mess, as you know...), are currently hardwired for Raft, and if we are to make the ordering service consumable externally, you would need to find a way to "teach" it to parse the opaque consensus metadata. However, if we just build this new orderer infrastructure within the Fabric code, I am afraid we would never know we did it right if the only cluster-type orderer we have is Raft. Therefore, here is what I propose:
The "new" orderer will be fully interoperable with older orderers, so that externally a peer or a client wouldn't care if it is facing a new or an old orderer, and of course, a mixed cluster with new and old orderers will run smoothly (or at least, as smooth as it gets in Fabric...) What do you think Jason and others (Dave, Oleg, etc.)? |
|
Thanks for reviving this thread @yacovm . I think at a high level your proposal is sound and addresses the main points that have been raised previously. Let's see if others raise concerns, if not, I think it would make sense to next drill down on what the series of RFCs may look like. We could then retire this RFC and start posting/reviewing a new stack of smaller self-contained RFCs related to BFT preparation and implementation. |
|
@denyeart @yacovm glad to hear this RFC coming back to life again, hopefully, will end up with Fabric having BFT based OSN. Having an inversion of control that will allow the creation of new consensus modules w/o affecting Fabric code, would be a significant improvement. Though, I do think that instead of refactoring Fabric code and bringing changes into the main branch as suggested below:
we might consider moving the OSN code into a stand-alone repository, because this will allow keeping destructive changes out of the main Fabric code while refactoring parts of the OSN and preparing external APIs to be consumed later by the consensus library, thus not affecting Fabric release cycles. Another part, which wasn't mentioned but is worth noting, is how to make the peer side (if needed) aware of the consensus type, as it affects the way it replicates blocks. |
I don't like this idea for several reasons:
The peer doesn't really care what consensus protocol is running in the ordering service. All it cares is how many signatures it needs to verify. Recall, that you can create a policy that only cares about MSP IDs, and not about the actual certificates of these orderers (this is different from what was done in SmartBFT, and more flexible). It remains to choose whether this policy is created dynamically or needs to be altered each time, but in general this is not something that should depend on the opaque consensus metadata, as we have "orderer organizations" divided by MSP IDs already. |
While I agree this is a very valid concern, do you really envision a lot of backporting here?
If I understand your proposal correctly, there will be small (or maybe not that small?) changes in other places to make them consumable by an externalized consensus plugin. Of course, there is a need to take a deeper look, but I do not think that orderer/common/multichannel is the only place in the OSN that will be affected.
Well, this might be a separate discussion, but IMO compartmentalization of Fabric modules, rather than having a monolith, is worth giving some thought.
I got your proposal, but I was concerned about having in-progress pieces of code that are not tied to any release within the codebase, despite the fact that they might not be reachable from the mains. Anyhow, do not get me wrong, I do think that your suggestion is a good compromise to eventually allow a BFT-based ordering service, and in fact anyone would be able to craft an OSN with the consensus of their choice w/o affecting the Fabric codebase itself. |