Enforce migrations to be standalone scripts #296

Open
kianenigma opened this issue Nov 18, 2021 · 11 comments
Labels
I6-meta A specific issue for grouping tasks or bugs of a specific category. T1-FRAME This PR/Issue is related to core FRAME, the framework.

Comments

@kianenigma
Contributor

kianenigma commented Nov 18, 2021

Follow up on the discussion here: paritytech/substrate#10233 (comment). Currently, we keep some migration code in the codebase, and even the code we do keep is pretty much useless. Given this, our support for historical migrations is basically non-existent.

I propose we enforce all migrations to be self-contained henceforth. This means that they should not depend on anything from the pallet.

If they need a struct, they must duplicate that version of it in their own local scope. If they need a particular storage item, they can mimic it using generate_storage_alias.

Next to this, we are moving toward banning on_runtime_upgrade inside the pallet (paritytech/substrate#8713), which is aligned with this issue as well. To begin with, migrations should be controlled explicitly by the top-level runtime.

Once we have all of this, I believe we can make migrations fully automated. Each migration will have a u32 id. Each pallet will have an on-chain u32 identifier, akin to the current storage version. Upon each runtime upgrade, any migrations that have been added since will be executed. These migrations need to be added to some static registry.

Regardless of whether we realize that last step, in order for the migration code that we keep in Substrate to have any utility, we had better start by making it independent of the pallet. Some examples of this already exist (pallet-elections-phragmen).

@KiChjang
Contributor

Would this make migrations look more or less akin to database migrations for a server backend? I think that should ideally be what we're aiming for here.

@kianenigma
Contributor Author

Would this make migrations look more or less akin to database migrations for a server backend? I think that should ideally be what we're aiming for here.

totally.

@ruseinov
Contributor

I really like the idea of migrations not depending on the pallet code, however I'd like to chip in with my 2 cents on the subject, before we go down this particular rabbit hole.

First of all, this will increase the bloat, because we'll need to redefine the structs used for the storage. In the case of one or two structs that's kind of okay; when this becomes more complex, with several levels of nesting, it's a different game.

Secondly, it's going to become harder for newcomers to write migrations efficiently, especially when we start talking about more complex scenarios, which I'm going to provide as examples below.

Last, but not least, it's kind of nice to have compile-time guarantees when writing migrations that involve more or less complicated structs: when we reuse the storage structs from the pallets, we know for sure that the struct matches the storage. It's not that big of a deal, as each migration needs to be tested anyway, but it still adds to the overall migration testing/implementation burden.

Some examples of migrations that no longer work because the code has changed:

  1. Staking, with some comments within the code here
pub mod v8 {
	use super::*;
	use crate::{Config, Nominators, Pallet, Weight};
	use frame_election_provider_support::SortedListProvider;
	use frame_support::traits::Get;

	#[cfg(feature = "try-runtime")]
	pub fn pre_migrate<T: Config>() -> Result<(), &'static str> {
		frame_support::ensure!(
			StorageVersion::<T>::get() == ObsoleteReleases::V7_0_0,
			"must upgrade linearly"
		);

		crate::log!(info, "👜 staking bags-list migration passes PRE migrate checks ✅",);
		Ok(())
	}

	/// Migration to sorted `VoterList`.
	pub fn migrate<T: Config>() -> Weight {
		if StorageVersion::<T>::get() == ObsoleteReleases::V7_0_0 {
			crate::log!(info, "migrating staking to ObsoleteReleases::V8_0_0");
			// We will no longer have the Bags-List pallet instance here, so we'd have to
			// reimplement this functionality, which relies on several methods and storage
			// structs that are called under the hood.
			let migrated = T::VoterList::unsafe_regenerate(
				// replicate the Nominators storage struct
				Nominators::<T>::iter().map(|(id, _)| id),
				Pallet::<T>::weight_of_fn(),
			);
			// still need to call this here or in post_migrate
			debug_assert_eq!(T::VoterList::try_state(), Ok(()));

			StorageVersion::<T>::put(ObsoleteReleases::V8_0_0);
			crate::log!(
				info,
				"👜 completed staking migration to ObsoleteReleases::V8_0_0 with {} voters migrated",
				migrated,
			);

			T::BlockWeights::get().max_block
		} else {
			T::DbWeight::get().reads(1)
		}
	}

	#[cfg(feature = "try-runtime")]
	pub fn post_migrate<T: Config>() -> Result<(), &'static str> {
		T::VoterList::try_state().map_err(|_| "VoterList is not in a sane state.")?;
		crate::log!(info, "👜 staking bags-list migration passes POST migrate checks ✅",);
		Ok(())
	}
}
  2. Staking
pub mod v9 {
	use super::*;
	#[cfg(feature = "try-runtime")]
	use frame_support::codec::{Decode, Encode};
	#[cfg(feature = "try-runtime")]
	use sp_std::vec::Vec;

	/// Migration implementation that injects all validators into sorted list.
	///
	/// This is only useful for chains that started their `VoterList` just based on nominators.
	pub struct InjectValidatorsIntoVoterList<T>(sp_std::marker::PhantomData<T>);
	impl<T: Config> OnRuntimeUpgrade for InjectValidatorsIntoVoterList<T> {
		fn on_runtime_upgrade() -> Weight {
			if StorageVersion::<T>::get() == ObsoleteReleases::V8_0_0 {
				// need to replicate the bags-list storage and some of the
				// SortedListProvider methods
				let prev_count = T::VoterList::count();
				let weight_of_cached = Pallet::<T>::weight_of_fn();
				// replicate the Validators storage struct
				for (v, _) in Validators::<T>::iter() {
					let weight = weight_of_cached(&v);
					// a bunch more VoterList behaviour that needs to be taken into account
					let _ = T::VoterList::on_insert(v.clone(), weight).map_err(|err| {
						log!(warn, "failed to insert {:?} into VoterList: {:?}", v, err)
					});
				}

				log!(
					info,
					"injected a total of {} new voters, prev count: {} next count: {}, updating to version 9",
					Validators::<T>::count(),
					prev_count,
					T::VoterList::count(),
				);

				StorageVersion::<T>::put(ObsoleteReleases::V9_0_0);
				T::BlockWeights::get().max_block
			} else {
				log!(
					warn,
					"InjectValidatorsIntoVoterList being executed on the wrong storage \
				version, expected ObsoleteReleases::V8_0_0"
				);
				T::DbWeight::get().reads(1)
			}
		}

		#[cfg(feature = "try-runtime")]
		fn pre_upgrade() -> Result<Vec<u8>, &'static str> {
			frame_support::ensure!(
				StorageVersion::<T>::get() == ObsoleteReleases::V8_0_0,
				"must upgrade linearly"
			);

			let prev_count = T::VoterList::count();
			Ok(prev_count.encode())
		}

		#[cfg(feature = "try-runtime")]
		fn post_upgrade(prev_count: Vec<u8>) -> Result<(), &'static str> {
			// try_state call
			let prev_count: u32 = Decode::decode(&mut prev_count.as_slice()).expect(
				"the state parameter should be something that was generated by pre_upgrade",
			);
			let post_count = T::VoterList::count();
			let validators = Validators::<T>::count();
			assert!(post_count == prev_count + validators);

			frame_support::ensure!(
				StorageVersion::<T>::get() == ObsoleteReleases::V9_0_0,
				"must upgrade "
			);
			Ok(())
		}
	}
}

In the case of replicating, say, the BagsList pallet's storage and methods, we'd have to replicate several pretty intricate methods along with their corresponding structs:

  1. https://github.com/paritytech/substrate/blob/dc22e48e1da70b84fb7b8a1ffc2076cf1b6e1d99/frame/bags-list/src/list/mod.rs#L398
  2. https://github.com/paritytech/substrate/blob/d829b919f16f9ec9d9a15ae190bd37a5c37ec8ed/frame/bags-list/src/lib.rs#L283

And this is just the tip of the iceberg.

We could also take a look at a simple migration, like that one, which is already quite bloated and would become twice the size.

Long story short, I'm not sure that the added complexity and time spent implementing migrations is worth being pallet-agnostic.

Why? Breaking older migration code is a minor concern; it can be commented out and/or kept as a code example for posterity.

Running prior migrations is not a problem either: just use release branches / WebAssembly code from older versions. It's a bit of a pain, but I don't see this happening often either.

Happy to be proven wrong.

@bkchr
Member

bkchr commented Jan 10, 2023

Running prior migrations is not a problem either: just use release branches / WebAssembly code from older versions. It's a bit of a pain, but I don't see this happening often either.

You are forgetting that Polkadot isn't the only user of Substrate. For staking this may currently be true, but most of the other pallets are used by a lot of parachains etc. Your complaints are nevertheless correct, but we should work on this. We need to identify common patterns and then provide abstractions/macros/tools to help with that.

@ggwpez
Member

ggwpez commented Jan 10, 2023

paritytech/substrate#13107 tries to abstract the redundant version tracking and logging.

In case of replicating, say, BagsList pallet storage and methods we'd have to replicate several pretty intricate methods along with their corresponding structs:

Uhf… yeah, that is annoying. Having mutation functions on stored structs makes this much more complicated. I don't see how we could circumvent that besides not using that pattern in the first place, which is sub-optimal, since it's in itself a good pattern.

@ruseinov
Contributor

Running prior migrations is not a problem either: just use release branches / WebAssembly code from older versions. It's a bit of a pain, but I don't see this happening often either.

You are forgetting that Polkadot isn't the only user of Substrate. For staking this may currently be true, but most of the other pallets are used by a lot of parachains etc. Your complaints are nevertheless correct, but we should work on this. We need to identify common patterns and then provide abstractions/macros/tools to help with that.

For now, if anybody wants to migrate from very old storage, they can use old release branches to gradually run whatever migrations they need; git blame will help them find the right release. I know it's not ideal, but it's definitely doable.
At the moment I can't see how macros or other tools could help us deal with, for example, the BagsList case.

@ruseinov
Contributor

paritytech/substrate#13107 tries to abstract the redundant version tracking and logging.

In case of replicating, say, BagsList pallet storage and methods we'd have to replicate several pretty intricate methods along with their corresponding structs:

Uhf… yeah, that is annoying. Having mutation functions on stored structs makes this much more complicated. I don't see how we could circumvent that besides not using that pattern in the first place, which is sub-optimal, since it's in itself a good pattern.

If I'm being honest, I don't see why the actual library code should suffer for migration convenience. If anything, it should be the other way around. Migrations are simply one-off helpers that serve their purpose once.

It's a bit annoying having to potentially use several different release branches to gradually upgrade, but it can be done and it's not that hard. Maybe we should provide some sort of convenience tooling for that instead?

@ruseinov
Contributor

Now that I've re-read the original issue, it's a bit clearer to me what the purpose of this is: migration automation.

Still, I can't see how this could be easily achieved: decoupling migrations from pallets is tricky enough, but we also still need to specify the correct storage version of each pallet should anyone start from the current codebase. And if this is wrong and somehow slips into master, it'll potentially be a huge pain, as things would fail while trying to roll those migrations out automatically.

It seems like manual control over migrations, similar to what we have now (having to add them to individual runtimes), might be a better option. It's a bit more work, but it gives more fine-grained control over what's happening, since we need to compile a new runtime anyway in order to introduce any new code.

I've had similar experiences with SQL migrations on various projects. If a migration is complex enough, it might be a multi-step one that needs to be run and monitored manually. Even if it's not that complex, it's almost always dangerous to let it run automatically or via CI. In our case the automation would come from simply compiling a new runtime with the new migrations in place.
One other concern: what if we introduce 10 migrations to 10 different pallets, and all of them run back-to-back? That could be a problem.

It might be good to consider all of the above concerns to see how we could mitigate some, if not all, of them while still providing a reasonable level of automation.

@kianenigma
Contributor Author

kianenigma commented Jan 15, 2023

I think we should not settle this yet and, as Basti said, try to find patterns that are common and help migration scripts live longer. One of them is this:

https://github.com/paritytech/substrate/blob/0246883c404d498090f33e795feb8075fa8d3b6b/frame/elections-phragmen/src/migrations/v3.rs#L41-L48

Which means:

  1. don't rely on the definition of storage structs from your pallet
  2. don't rely on any type coming in from trait Config of your pallet

Using this, the PR which sparked this discussion can be fixed fairly easily (paritytech/substrate@2c6275e), as the only breaking change is the introduction of ReadOnlySortedListProvider.

That being said, this is still not perfect, because if, e.g., the definition of SortedListProvider changes and one of the functions used in the migrations is removed, then we would again be in trouble. But that, too, is solvable.

TL;DR: I don't have a plan here, but I am also not at the point where I can say screw the old migrations and let's move on. If Substrate wants to be the ultimate, user-friendly framework that it aspires to be, it must solve this issue, and I am down to put more effort into it.


@ruseinov, as for your comments in #296, these are more about automation, which is the end goal but not the immediate issue. We first have to make migration scripts reusable, then think about automation, so I will not comment further on that for the time being.

It's a bit annoying having to potentially use several different release branches to gradually upgrade, but it can be done and it's not that hard. Maybe we should provide some sort of convenience tooling for that instead?

That's not a bad idea nonetheless. But again, if you look at the fixes needed in paritytech/substrate@2c6275e, in most cases it is actually not that much overhead to keep the migrations sane.


The idea that I am now eager to implement and show to all of you as PoC is this: I will migrate all of the migrations of the nomination pools pallet to be 100% independent of the runtime in some PR, and we can then see what the outcome looks like. If someone else has the time, please go ahead.

@athei
Member

athei commented Jan 17, 2023

Let us take a step back and think about what would be the most convenient and easiest way to deliver migrations to our users:

I think it is the old way of bundling them with the pallet. The pallet knows its current version and the version in storage, and runs all necessary migrations. The runtime just calls into all pallets for migrations on upgrade. Everything just works.

But there are reasons why we no longer do it this way. Let me see if I got all of them:

  • Migrations consume space in the runtime's code. We don't want them in the pallets, and hence in the production runtime.
  • Migrations are more opaque to runtime authors when handled by the pallets. This is convenient, but it might lead to a situation where they compound to something heavier than a single block.

I think we should find ways to address both issues instead of dumping this hugely complicated topic onto our users.
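The compounding-weight concern in the second bullet could at least be guarded against mechanically: sum the benchmarked weights of all pending migrations and refuse to run them in one block if they exceed the limit. A std-only sketch, with all numbers and names invented:

```rust
type Weight = u64;

// Invented block limit for the sketch; the runtime would take this
// from its configured BlockWeights.
const MAX_BLOCK_WEIGHT: Weight = 100;

/// Accept the pending migrations only if their combined weight fits in
/// one block; otherwise ask the author to split them across upgrades.
fn schedule(pending: &[Weight]) -> Result<Weight, &'static str> {
    let total: Weight = pending.iter().sum();
    if total > MAX_BLOCK_WEIGHT {
        Err("pending migrations exceed one block; split them across upgrades")
    } else {
        Ok(total)
    }
}

fn main() {
    // Two light migrations fit together...
    assert_eq!(schedule(&[30, 40]), Ok(70));
    // ...but two heavy ones back-to-back are rejected.
    assert!(schedule(&[60, 60]).is_err());
    println!("weight checks passed");
}
```

This only catches the problem at upgrade time, of course; a multi-block migration scheme would be the fuller answer.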

@ruseinov
Contributor

The idea that I am now eager to implement and show to all of you as PoC is this: I will migrate all of the migrations of the nomination pools pallet to be 100% independent of the runtime in some PR, and we can then see what the outcome looks like. If someone else has the time, please go ahead.

This is not documented in the issue, but I think @kianenigma is no longer that big a fan of this idea, as discussed during the FRAME steering meeting. Or was it the Staking meeting? I don't remember. Anyhow, since we don't have a unified way/language (SQL-like) to deal with this, and a lot of the migrations really need to mimic the business logic of the pallets whose storage they touch, it seems like the best approach is to do what we've done before.

  • Migrations consume space in the runtime's code. We don't want them in the pallets, and hence in the production runtime.

It seems like a good idea to clean up old migrations and let users deal with this by gradually upgrading, if they need to. But yeah, the UX of this is not amazing.

@juangirini juangirini transferred this issue from paritytech/substrate Aug 24, 2023
@the-right-joyce added the I6-meta (a specific issue for grouping tasks or bugs of a specific category) and T1-FRAME (this PR/issue is related to core FRAME, the framework) labels and removed the J1-meta label Aug 25, 2023