-
Notifications
You must be signed in to change notification settings - Fork 700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make SyncingStrategy
abstract and allow developers to customize it
#5333
Comments
@dmitry-markin @skunert any opinion here? |
Thanks for bringing this up! Actually, I was also thinking about composing syncing strategies instead of hardcoding transitions in One thing to keep in mind is that some strategies should be able to run in parallel. This is the case for gap sync we have now (it was planned we extract it from Another thing that should be dealt with along the way is that currently the strategies use custom actions to communicate with protocols. This should be replaced with a possibility for a strategy to communicate with "its" protocols using opaque requests/responses and subscribing to opaque notifications. Your proposed solution seems sound to me. As for Overall, this is definitely the thing that should be implemented. |
Okay, so we are thinking in about the same direction then, which is good. The first issue I see for composition is that everything is tied together through layers instead of being somewhat independent. Specifically different strategies already have their own implementations, but the actions they produce are proxied through What I think would solve that is to have each strategy their own Essentially individual strategies should become more or less opaque to the caller, caller should understand their internal operation or forward their requests/responses back and forth. This is essentially how our custom strategies in Subspace protocol work, they own handles to their dependencies and interact with networking or other things as they see fit without Substrate having any idea about it. We'd be able to wrap those in structs and implement strategy trait in the future. Does it make sense? |
Let me think if we can make strategies completely opaque. May be we will need to manage strategies using the same protocol in some smart way. For example, if different strategies request the same blocks via block request protocol, the peer will be banned. We also tried to issue max 1 in-flight request at a time to every peer at some point, but I haven't found this being enforced now. Probably we lifted this constraint, and don't need to take it into account anymore. Do you have any thoughts what else (except |
Methods like Not yet sure if there needs to be some more direct communication between lower and higher level strategies, but at least on the top level I don't think too many are necessary. Well, I guess it might be more like spawning a worker for strategy and having a handle to interact with it, kind of like |
I'm also interested in this issue regarding the potential implementation of a PoW chain using Substrate. Since there's no finalized state in a PoW context, the existing fast sync mechanism may not work effectively. However, if an abstraction of |
I have tried different approaches and found something that is somewhat decent for the first step. After #5410 I have another step before the actual abstraction. New trait is introduced and initially implemented for both There is still plenty of things to untangle, but I need to get reviews and submit things in sequence, so will appreciate reasonably quick turnaround with PRs if possible. |
This PR untangles syncing metrics and makes them reactive, the way metrics are supposed to be in general. Syncing metrics were bundled in a way that caused coupling across multiple layers: justifications metrics were defined and managed by `ChainSync`, but only updated periodically on tick in `SyncingEngine`, while actual values were queried from `ExtraRequests`. This convoluted architecture was hard to follow when I was looking into #5333. Now metrics that correspond to each component are owned by that component and updated as changes are made instead of on tick every 1100ms. This does add some annoying boilerplate that is a bit harder to maintain, but it separates metrics more nicely and if someone queries them more frequently will give arbitrary resolution. Since metrics updates are just atomic operations I do not expect any performance impact of these changes. Will add prdoc if changes look good otherwise. P.S. I noticed that importing requests (and corresponding metrics) were not cleared ever since corresponding code was introduced in dc41558#r145518721 and I left it as is to not change the behavior, but it might be something worth fixing. cc @dmitry-markin --------- Co-authored-by: Dmitry Markin <dmitry@markin.tech>
As I was looking at the coupling between `SyncingEngine`, `SyncingStrategy` and individual strategies I noticed a few things that were unused, redundant or awkward. The awkward change comes from paritytech/substrate#13700 where `num_connected_peers` property was added to `SyncStatus` struct just so it can be rendered in the informer. While convenient, the property didn't really belong there and was annoyingly set to `0` in some strategies and to `num_peers` in others. I have replaced that with a property on `SyncingService` that already stored necessary information internally. Also `ExtendedPeerInfo` didn't have a working `Clone` implementation due to lack of perfect derive in Rust and while I ended up not using it in the refactoring, I included fixed implementation for it in this PR anyway. While these changes are not strictly necessary for #5333, they do reduce coupling of syncing engine with syncing strategy, which I thought is a good thing. Reviewing individual commits will be the easiest as usual. --------- Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Another PR with initial |
Have another set of changes that allows to use custom syncing strategies on top of #5469 that I can't open as a PR before #5469 is in. With that The next step will be to make |
I think I finally found a good pattern to remove block/state/warp requests/responses handling from |
This feature is helpful for us with custom sync protocol that is similar to Warp sync except we do not ever sync the gap and don't want it to exist in the first place (see #5333 and its references for motivation). Otherwise we had to resort to this: autonomys@d537512 --------- Co-authored-by: Davide Galassi <davxy@datawok.net>
This is a step towards #5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With #5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump.
This is a step towards paritytech#5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With paritytech#5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump.
This is a step towards paritytech#5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With paritytech#5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump.
This is a step towards paritytech#5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With paritytech#5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump.
This is a step towards paritytech#5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With paritytech#5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump.
This is a step towards paritytech#5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With paritytech#5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump.
This is a step towards paritytech#5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With paritytech#5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump.
This is a step towards paritytech#5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With paritytech#5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump.
# Description Follow-up to #5469 and mostly covering #5333. The primary change here is that syncing strategy is no longer created inside of syncing engine, instead syncing strategy is an argument of syncing engine, more specifically it is an argument to `build_network` that most downstream users will use. This also extracts addition of request-response protocols outside of network construction, making sure they are physically not present when they don't need to be (imagine syncing strategy that uses none of Substrate's protocols in its implementation for example). This technically allows to completely replace syncing strategy with whatever strategy chain might need. There will be at least one follow-up PR that will simplify `SyncingStrategy` trait and other public interfaces to remove mentions of block/state/warp sync requests, replacing them with generic APIs, such that strategies where warp sync is not applicable don't have to provide dummy method implementations, etc. ## Integration Downstream projects will have to write a bit of boilerplate calling `build_polkadot_syncing_strategy` function to create previously default syncing strategy. ## Review Notes Please review PR through individual commits rather than the final diff, it will be easier that way. The changes are mostly just moving code around one step at a time. # Checklist * [x] My PR includes a detailed description as outlined in the "Description" and its two subsections above. * [x] My PR follows the [labeling requirements]( https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process ) of this project (at minimum one label for `T` required) * External contributors: ask maintainers to put the right label on your PR. * [x] I have made corresponding changes to the documentation (if applicable)
This feature is helpful for us with custom sync protocol that is similar to Warp sync except we do not ever sync the gap and don't want it to exist in the first place (see paritytech#5333 and its references for motivation). Otherwise we had to resort to this: d537512 --------- Co-authored-by: Davide Galassi <davxy@datawok.net> (cherry picked from commit 030cb4a)
This is a step towards paritytech#5333 The commits with code moving (but no other changes) and actual changes are separated for easier review. Essentially this results in `SyncingStrategy` trait replacing struct (which is renamed to `PolkadotSyncingStrategy`, open for better name suggestions). Technically it is already possible to replace `PolkadotSyncingStrategy<B, Client>` with `Box<dyn SyncingStrategy<B>` in syncing engine, but I decided to postpone such change until we actually have an ability to customize it. It should also be possible to swap `PolkadotSyncingStrategy` with just `ChainSync` that only supports regular full sync from genesis (it also implements `SyncingStrategy` trait). While extracted trait still has a lot of non-generic stuff in it like exposed knowledge of warp sync and `StrategyKey` with hardcoded set of options, I believe this is a step in the right direction and will address those in upcoming PRs. With paritytech#5431 that landed earlier warp sync configuration is more straightforward, but there are still numerous things interleaved that will take some time to abstract away nicely and expose in network config for developers. For now this is an internal change even though data structures are technically public and require major version bump. (cherry picked from commit 1f1f20a)
# Description Follow-up to paritytech#5469 and mostly covering paritytech#5333. The primary change here is that syncing strategy is no longer created inside of syncing engine, instead syncing strategy is an argument of syncing engine, more specifically it is an argument to `build_network` that most downstream users will use. This also extracts addition of request-response protocols outside of network construction, making sure they are physically not present when they don't need to be (imagine syncing strategy that uses none of Substrate's protocols in its implementation for example). This technically allows to completely replace syncing strategy with whatever strategy chain might need. There will be at least one follow-up PR that will simplify `SyncingStrategy` trait and other public interfaces to remove mentions of block/state/warp sync requests, replacing them with generic APIs, such that strategies where warp sync is not applicable don't have to provide dummy method implementations, etc. ## Integration Downstream projects will have to write a bit of boilerplate calling `build_polkadot_syncing_strategy` function to create previously default syncing strategy. ## Review Notes Please review PR through individual commits rather than the final diff, it will be easier that way. The changes are mostly just moving code around one step at a time. # Checklist * [x] My PR includes a detailed description as outlined in the "Description" and its two subsections above. * [x] My PR follows the [labeling requirements]( https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process ) of this project (at minimum one label for `T` required) * External contributors: ask maintainers to put the right label on your PR. * [x] I have made corresponding changes to the documentation (if applicable) (cherry picked from commit 43cd6fd)
# Description This is a continuation of #5666 that finally fixes #5333. This should allow developers to create custom syncing strategies or even the whole syncing engine if they so desire. It also moved syncing engine creation and addition of corresponding protocol outside `build_network_advanced` method, which is something Bastian expressed as desired in #5 (comment) Here I replaced strategy-specific types and methods in `SyncingStrategy` trait with generic ones. Specifically `SyncingAction` is now used by all strategies instead of strategy-specific types with conversions. `StrategyKey` was an enum with a fixed set of options and now replaced with an opaque type that strategies create privately and send to upper layers as an opaque type. Requests and responses are now handled in a generic way regardless of the strategy, which reduced and simplified strategy API. `PolkadotSyncingStrategy` now lives in its dedicated module (had to edit .gitignore for this) like other strategies. `build_network_advanced` takes generic `SyncingService` as an argument alongside with a few other low-level types (that can probably be extracted in the future as well) without any notion of specifics of the way syncing is actually done. All the protocol and tasks are created outside and not a part of the network anymore. It still adds a bunch of protocols like for light client and some others that should eventually be restructured making `build_network_advanced` just building generic network and not application-specific protocols handling. ## Integration Just like #5666 introduced `build_polkadot_syncing_strategy`, this PR introduces `build_default_block_downloader`, but for convenience and to avoid typical boilerplate a simpler high-level function `build_default_syncing_engine` is added that will take care of creating typical block downloader, syncing strategy and syncing engine, which is what most users will be using going forward. `build_network` towards the end of the PR was renamed to `build_network_advanced` and `build_network`'s API was reverted to pre-#5666, so most users will not see much of a difference during upgrade unless they opt-in to use new API. ## Review Notes For `StrategyKey` I was thinking about using something like private type and then storing `TypeId` inside instead of a static string in it, let me know if that would preferred. The biggest change happened to requests that different strategies make and how their responses are handled. The most annoying thing here is that block response decoding, in contrast to all other responses, is dependent on request. This meant request had to be sent throughout the system. While originally `Response` was `Vec<u8>`, I didn't want to re-encode/decode request and response just to fit into that API, so I ended up with `Box<dyn Any + Send>`. This allows responses to be truly generic and each strategy will know how to downcast it back to the concrete type when handling the response. Import queue refactoring was needed to move `SyncingEngine` construction out of `build_network` that awkwardly implemented for `SyncingService`, but due to `&mut self` wasn't usable on `Arc<SyncingService>` for no good reason. `Arc<SyncingService>` itself is of course useless, but refactoring to replace it with just `SyncingService` was unfortunately rejected in #5454 As usual I recommend to review this PR as a series of commits instead of as the final diff, it'll make more sense that way. # Checklist * [x] My PR includes a detailed description as outlined in the "Description" and its two subsections above. * [x] My PR follows the [labeling requirements]( https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process ) of this project (at minimum one label for `T` required) * External contributors: ask maintainers to put the right label on your PR. * [x] I have made corresponding changes to the documentation (if applicable)
Is there an existing issue?
Experiencing problems? Have you tried our Stack Exchange first?
Motivation
Right now
SyncingStrategy
is created in the constructor ofSyncingEngine
and the whole thing assumes there are just 3 syncing strategies that transition like this:polkadot-sdk/substrate/client/network/sync/src/strategy.rs
Lines 520 to 522 in 6619277
This is a problem for chains that both don't have/use Warp sync and/or have custom sync strategies (like Subspace/Autonomys does, we have two custom sync strategies already).
Assumption of existence of certain strategies makes it impossible to implement custom ones without forking Substrate and injecting custom logic in strategic places, which is hard to maintain and fragile long-term. It also frequently results in unexpected side-effects like #4607, #4407 and more.
Specific examples of strategies we have already implemented with hacks in Subspace/Autonomys:
Request
Make it possible to define custom sync strategies and transition sequence without forking
Solution
Probably the first step would be to extract a trait out of existing
SyncingStrategy
, then probably extract different strategies out ofSyncingStrategy
and make them composable, finally expose a way to configure custom syncing strategy instead of what is used by default.Not sure what to do with
build_network
, it contains significant amount of boilerplate and while could be copy-pasted and modified in chain-specific codebase, it feels like there should be a better solution.Open to suggestions on how to approach this so we can make progress and reduce the diff with upstream in our fork.
Examples of hacks that we had to introduce downstream to implement custom sync protocols:
ChainSync
requesting the same block multiple times #1915: autonomys@4670a2cThese are all fragile workarounds that we'd really like to get rid of sooner rather than later.
cc @shamil-gadelshin
Are you willing to help with this request?
Yes!
The text was updated successfully, but these errors were encountered: