Add op-deployer design #71

Merged · 3 commits · Sep 16, 2024
163 changes: 163 additions & 0 deletions ecosystem/op-deployer.md
@@ -0,0 +1,163 @@
# Purpose

Introduce `op-deployer`, an end-to-end chain deployment tool, and get buy-in on its architecture and implementation.

## Goals

- Provide a canonical way to deploy chains.
- Enable deploying multiple L2s to a single L1 as part of a single logical Superchain.
**Contributor:** What do you think about "Deploying multiple superchains to a single L1" as another goal? For example, the devnet and testnet both have superchains on L1 Sepolia, so this should help with spinning up devnets. It also helps validate that the architecture is sufficiently modular.

**Contributor Author:** The tooling should support that use case. I imagine you'd do that by defining another deployment intent and running that through the tooling. Is there a use case for defining multiple superchains and multiple L2s in the same deployment intent?

**Contributor:** I'm not clear on what exactly a deployment intent is and what execution of it does. Can it be scoped to a single step of the pipeline? What part of the intent file maps to a pipeline step, or is it one intents file per step?

Either way, yes, a separate invocation with a different deployment intent to support this seems OK to me.

**Contributor Author:** A deployment intent is a declarative way of describing how your deployment will look. The tooling then looks at the intent and decides what changes it needs to make on L1/L2 to make the real deployment match the intent. It's like Terraform, but for blockchains. I can clarify this more in the doc.

To your questions specifically:

> Can it be scoped to a single step of the pipeline?

No, the deployer would decide which steps it needs to run based on the changes it sees to the deployment intent.

> What part of the intent file maps to a pipeline step, or is it one intents file per step?

There is one intent per file, defined in the `[intent]` stanza.

**Contributor:** Will op-deployer support incremental deployments (i.e., only deploy the minimal set of components that were modified as a result of an intent mutation)?

**Contributor Author:** Correct, it will.

- Make it easier for teams like Protocol and Proofs to maintain their deployment tooling.
**@jelias2** (Sep 9, 2024): Does op-deployer aim to deploy production-level environments, or just development tooling?

**Contributor Author:** Initially development tooling, but over time I'd like it to be used for prod environments as well.


## Non-Goals

- Replacing `op-e2e`. This will be discussed later, once `op-deployer` is live.

# Problem Statement + Context

As we've
discussed [previously](https://docs.google.com/document/d/13f8UoO9j05PJdvAZDWuVJWD1UAIEAfb8VbphBnPE3cA/edit#heading=h.jxalqw91xhd0),
deploying and upgrading OP Chains remains a complicated and frustrating process. Users are directed to a tutorial
with [18 manual steps](https://docs.optimism.io/builders/chain-operators/tutorials/create-l2-rollup) to follow in order
to create their own testnet. Within Labs, we use a combination of different tools depending on where the chain is to be
hosted:
**Contributor:** I believe the superchain-ops team uses some variant of the deploy scripts, with some instrumentation for Safe contracts / deploy-tx reliability. I'm not entirely familiar with it, but I think it should be in this list.

**Contributor Author:** OK, I'll track this down.


- **opc** deploys new chains to our cloud infrastructure.
- The **compose devnet** uses a bunch of custom Python tooling to deploy a chain locally using `docker-compose`.
- The **Kurtosis devnet** uses a combination of custom deployer scripts and the fault proof's `getting-started.sh` to
deploy a chain to a new enclave.
Comment on lines +24 to +26

**Contributor:** Can we add links to these tools for people (like me) who aren't familiar with all of them?

**Contributor Author:** Sure! Will update, though opc is in a private repo of ours, so it won't be very useful for non-Labs folks.
- The Interop team now has [their own](https://github.com/ethereum-optimism/optimism/pull/11590) devnet setup to deploy
a set of interoperable chains locally.

Notably, with the exception of the Interop devnet, none of the solutions above enable deploying multiple L2s to a single
L1 as part of a single logical Superchain. This is a huge gap in functionality that will only grow more pronounced as
time goes on.

To resolve this problem, we need a set of well-integrated, modular tools that define the canonical way of deploying a
chain. Users can then leverage these tools to build their own deployment tooling on top.

Much of what's described below is already implemented as part of `opc`. By extracting these concepts out of `opc`
and into the monorepo, we can make them more accessible to other teams and users.

# Proposed Solution

The high-level architecture is described
in [Modular Deployments](https://docs.google.com/document/d/13f8UoO9j05PJdvAZDWuVJWD1UAIEAfb8VbphBnPE3cA/edit#heading=h.jxalqw91xhd0),
so read that document first if you haven't already. This design doc will focus on the implementation details of the
architecture it describes.

## Deployment Pipeline

The core of the modular deployment tooling is the concept of a _deployment pipeline_. The pipeline consists of stages
that each perform a single piece of the deployment. The stages each take an input file and enrich it with the data
pertaining to that particular stage of the deployment. This allows the output of upstream stages to be cached.
Comment on lines +51 to +52

**Contributor:** Note that after deploying an L2 to an L1 by updating the L1 genesis.json, we end up modifying the L1 genesis block hash, which affects the L2 genesis of any previously deployed L2. So we have to interleave L2 deployments: first do all the L1 parts, then "freeze" the L1 genesis, and then do all the L2 genesis parts.

So I think we need the concept of a "finalized" config, and a config that can still mutate in later pipeline steps.

**Contributor Author:** What do you mean that the L1 genesis.json changes? Are you referring to the case where you deploy a bunch of chains in the L1's genesis?

**Contributor:** Yes. If we want to pre-deploy multiple L2s to the same L1, merging them into a unified genesis state, then that needs to wrap up before we can create any of the L2 rollup configs, due to the dependency on the L1 genesis block hash. So it's important to separate the "L2 deploy" into a "deploy L2 contracts to L1" step and a "create L2 genesis" step, where the genesis steps for each L2 can run after the L1 state is final.


The diagram below outlines what these stages are:

![](./op-deployer/pipeline.png)
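
To make this stage contract concrete, here is a minimal sketch of what a stage could look like in Go. The names (`Stage`, `State`, `Done`, `Run`) and the example fields are illustrative assumptions, not the actual `op-deployer` API:

```go
package pipeline

import "context"

// State carries the accumulating deployment outputs between stages.
// Each stage appends its own results and never mutates upstream data,
// which is what makes upstream outputs safely cacheable.
type State struct {
	L1ContractAddresses map[string]string // populated by the L1 deploy stage
	L2GenesisBlobs      map[uint64][]byte // populated per chain ID by the genesis stage
}

// Stage performs one piece of the deployment, enriching the state
// with the data pertaining to that piece.
type Stage interface {
	// Name identifies the stage, e.g. for logging and resuming.
	Name() string
	// Done reports whether this stage's outputs are already present in
	// the state, allowing the pipeline to skip (cache) completed work.
	Done(st *State) bool
	// Run executes the stage and appends its outputs to the state.
	Run(ctx context.Context, st *State) error
}
```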

## Deployment Intents
**Contributor:** Some questions:

  1. Will a deployment intent have a standard interface of some kind?
  2. Will a deployment intent be executable from op-e2e?
  3. Will a deployment intent be coupled to a strict pre-state? In particular, is an intent to deploy X to a local L1 genesis (aka statedump) different from an intent to produce a list of template deployment txs (broadcast output)?
  4. Will every intent run with the forge tooling? Or can there be intents that are not forge?
  5. How does an "interrupted deployment" get recovered? One of the pain points product hit is that they run forge, then have to CTRL+C because of gas-spike issues, and then have to start over from the beginning.
  6. Will we integrate the source-code verification step (API calls to Etherscan/Blockscout)? Ideally the deployment tools would produce a dump, such that verification can happen later, in case a new explorer comes online.
  7. Does this tool include L2Genesis?
  8. Will intents have defaults? Maybe defaults sourced from the superchain-registry?
  9. How does a pipeline execute when it involves intermittent pauses, e.g. for deployment-tx signing? Does it persist and continue from where it left off after a signing phase, maybe?
  10. Can we add a `--noninteractive` kind of flag that auto-signs with some hotkey, for testing purposes?

Personally I'd also suggest running op-deployer intents fully in Go, or at least having some flexibility to do so. Otherwise each intent will rely on sub-process calls to forge, intermediate config files and output files, and many extra consistency checks. With forge scripts in Go, it can be packaged up and do exactly what the intent is programmed to, without interacting with anything outside of the process other than an artifacts tarball. I think this makes it much more reproducible in different environments. The drawback, however, is that we don't use standard forge. Maybe with the right abstraction it's interchangeable, and forge is version-checked, inputs are all in unique temporary directories, etc.

**Contributor Author:** Answers below:

  1. Yes - my strawman interface is the one in the `[intent]` stanza in the design doc.
  2. Yes - since each deployment stage is a Go function, it'll be easy to call from op-e2e.
  3. The idea is that it won't matter where you deploy the chain to. In practice we'll have to implement different "backends" in the deployment tool to support both statedump and deployed L1s.
  4. No, intents can run without Forge. I'd actually like to move away from shelling out to Forge whenever possible.
  5. Stages commit to the intents TOML to store their state periodically. This allows the pipeline to recover halfway through.
  6. Not right now, but we can add tooling to do this in the future, or produce a dump like you said.
  7. I'd like it to leverage the Go forge-scripts tooling you built to obviate the need for the L2Genesis Forge script.
  8. Yes, intents will have defaults. The default config is the standard config. This can be defined in the registry.
  9. It will persist its state inside the intents TOML to allow for this kind of behavior.
  10. Sure!

In general, I agree that we want to run the intents fully in Go. Some stuff requires shelling out to Forge right now, but in the future I'd like to refactor everything we can away from that.


We'll create a new tool called `op-deployer` to orchestrate deployments. Users won't need to interact with the pipeline
directly. Instead, they'll define a _deployment intent_ which will describe the properties of the chains they want to
deploy. `op-deployer` will then diff the intent against the current state of the deployment, and run the pipeline to
ensure that the real deployment matches the intent.
Comment on lines +62 to +63

**Contributor:**

> diff the intent against the current state of the deployment

This part is throwing me off a bit. When in the pipeline does a diff occur, and what exactly are we diffing against?

**Contributor Author:** It's diffing its internal state against what's actually deployed. For example, the L1 deployer step would check whether the contracts are actually deployed to L1 before executing. If they aren't, it'll perform the deployment.
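
Building on the pipeline sketch above, the driver loop that reconciles the intent against what's actually deployed might look roughly like this; `saveState` is a hypothetical stand-in for the state persistence described later in the doc, not a real `op-deployer` function:

```go
package pipeline

import (
	"context"
	"fmt"
)

// Apply walks the pipeline in order, running only the stages whose
// outputs are missing. This is what makes deployments incremental and
// recoverable after an interruption: stages that already ran (and
// persisted their outputs) are skipped on the next invocation.
func Apply(ctx context.Context, stages []Stage, st *State) error {
	for _, stage := range stages {
		if stage.Done(st) {
			continue // cached: the real deployment already matches the intent here
		}
		if err := stage.Run(ctx, st); err != nil {
			return fmt.Errorf("stage %s: %w", stage.Name(), err)
		}
		// Persist state after every stage so a later invocation can
		// resume from this point (e.g. after CTRL+C during a gas spike).
		if err := saveState(st); err != nil {
			return err
		}
	}
	return nil
}

// saveState is a placeholder for the pluggable state store described
// later (by default, a commit back to the intent's TOML file).
func saveState(st *State) error { return nil }
```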


An example deployment intent is below. Deployment intents are encoded as TOML to match what the OP Stack Manager
expects:

```toml
[intent]
l1ChainID = 11_155_111
useFaultProofs = true
fundDevAccounts = true
# Define the version of the smart contracts to deploy
contractsVersion = "1.2.0"
# Override specific variables in the deploy config globally
overrides = { }

# Define the chains to deploy, by chain ID
[intent.chains.901]
# Override variables in the deploy config for this specific chain
overrides = { }

[intent.chains.902]
# ... etc.
```

Comment on lines +65 to +69

**@jelias2** (Sep 9, 2024): Within the intent struct, should the tooling aim to have overlap with the superchain registry config structure, or be able to ingest that repo? Aligning formats between these two tools could enhance usability and impact by allowing any operator to easily deploy another chain.

**Contributor Author:** The deployer can output a deploy config, so it's compatible with the registry. Over time I expect us to refactor away from the deploy config, however, so at that time we'll need to refactor the SR as well.

**Contributor** (on `l1ChainID`): Don't want to get too into the weeds of input-file structure here, but we may want to include an RPC URL too, since sometimes chain ID 11_155_111 might be L1 Sepolia, but other times it might be a local fork.

**Contributor Author:** I think we should separate the RPC URL, since that's more the environment that the deployer uses than an inherent property of the deployment itself.
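
For illustration, the example intent might decode into Go types along these lines. This is a sketch only: the struct tags assume a TOML decoder such as BurntSushi/toml, and the field set is taken from the example above rather than a final schema:

```go
package deployer

// Intent mirrors the [intent] stanza of the TOML file above.
type Intent struct {
	L1ChainID        uint64 `toml:"l1ChainID"`
	UseFaultProofs   bool   `toml:"useFaultProofs"`
	FundDevAccounts  bool   `toml:"fundDevAccounts"`
	ContractsVersion string `toml:"contractsVersion"`
	// Global deploy-config overrides, applied to every chain.
	Overrides map[string]any `toml:"overrides"`
	// Chains to deploy, keyed by chain ID (e.g. "901", "902").
	Chains map[string]ChainIntent `toml:"chains"`
}

// ChainIntent holds per-chain settings.
type ChainIntent struct {
	// Per-chain deploy-config overrides, taking precedence over the
	// global ones.
	Overrides map[string]any `toml:"overrides"`
}
```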

Under the hood, `op-deployer` will run through the deployment pipeline. The output of each intermediate stage will
be stored by default within the intent's TOML file; however, the implementation of the "state store" will be
pluggable in order to support use cases like `opc`, which stores the deployment state in a database.
Comment on lines +87 to +88

**Contributor:** Please don't make configs self-mutating. Separate inputs and outputs make caching, maintainability, etc. much easier. One of my biggest regrets about the current DeployConfig is that the deployment addresses get looped back into the config struct. Cyclic dependencies are hell.

**Contributor Author:** Totally agree. There will be no cyclic dependencies. The idea is that the config only gets appended to - each stanza of the state is its own immutable thing, separate from all the others. The only reason they're being put in the same file is so that we have a place to store that state offline between subsequent runs of the tool.

**Contributor Author:** If it's helpful I can have the state and the intent be separate files.

**Contributor:** If the config is incremental, like a list of things we only append to, I'm fine with it. But sparse changes to an existing config quickly become a hard-to-track state machine, which adds more complexity.

Example output from a stage is below:

```toml
[intent]
# Intent data is same as above and elided

[state.chains.901]
# Ownership addresses derived from a mnemonic
proxyAdminOwner = "0xe59a881b2626f948f56f509f180c32428585629a"
finalSystemOwner = "0xb9cdf788704088a4c0191d045c151fcbe2db14a4"
baseFeeVaultRecipient = "0xbc4a9110dad00b4d9fb61743598848ddda6eeb03"
l1FeeVaultRecipient = "0x6b0c2542fa2cadced5c7f64ef6fb9ebbce7630ff"
sequencerFeeVaultRecipient = "0xb4e5b724bbc54c95d292613d956871281120ead6"

# Address of OPSM so that other addresses can be retrieved from on-chain data
opStackManagerAddress = "0x79c6c6b1844e3db7c30107f189cfb095bd2c4b5d"

# Genesis data
genesis = "base64://abcd..."
genesisHash = "0x1234..."
genesisTime = 1234567890

[state.chains.902]
# ... etc.
```
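
Since the state store is pluggable, its surface could be as small as the following hypothetical interface (continuing the earlier pipeline sketch): the default implementation appends `[state.*]` stanzas to the intent's TOML file, while `opc` would back the same interface with a database:

```go
package pipeline

import "context"

// StateStore abstracts where pipeline state lives between runs.
type StateStore interface {
	// Load returns the last persisted state, or an empty state on the
	// first run.
	Load(ctx context.Context) (*State, error)
	// Save persists the state after a stage completes. Implementations
	// should be append-only: prior stanzas are treated as immutable.
	Save(ctx context.Context, st *State) error
}
```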

Only the minimum necessary state is stored. For example, the rollup config can be derived from deploy config
variables and can therefore be generated on the fly when it is needed. `op-deployer` will expose utility subcommands
to output any generated data, such as the rollup config, to the console.

**Contributor:** Nit: it may be nice to have a command that outputs all the configs and deploy artifacts exhaustively, so the complete state can be readily examined / debugged as needed.

`op-deployer` itself will be a single Go binary that can be run from the command line. Some stages - like those that
interact with Forge tooling to deploy contracts - will either shell out to other tools or orchestrate Docker
containers that run the necessary tools. This way both local development and pre-packaged use cases are supported.
Given the overall migration towards Go tooling over Forge for genesis creation and deployments, we expect that the
number of tools requiring Docker/shell integrations will decrease over time.

Each stage will be represented as a standalone Go function for easy integration with additional tooling.
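
As a hypothetical illustration of that integration point, a test harness such as `op-e2e` could drive individual stages directly; the import path and the stage functions below are assumed names for this sketch, not the real API:

```go
package e2etest

import (
	"context"
	"testing"

	// Hypothetical import path; assumes the State type and stage
	// functions from the pipeline sketches above.
	"github.com/ethereum-optimism/optimism/op-deployer/pipeline"
)

// TestDeployViaStages sketches how external Go tooling might call
// individual pipeline stages directly, skipping the op-deployer CLI.
func TestDeployViaStages(t *testing.T) {
	ctx := context.Background()
	st := &pipeline.State{}

	// Each stage is just a function: run the L1 contract deployment,
	// then derive an L2 genesis from the resulting state.
	if err := pipeline.DeployL1Contracts(ctx, st); err != nil {
		t.Fatalf("deploy L1 contracts: %v", err)
	}
	if err := pipeline.GenerateL2Genesis(ctx, st, 901); err != nil {
		t.Fatalf("generate L2 genesis for chain 901: %v", err)
	}
}
```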

## Deploy Config Cleanup

The deployment config file is a hodgepodge of almost 100 variables which all need to be set in order to deploy a chain.
Some of these variables are only used in specific contexts, but all of them need to be set in order for some of our
tools to work. Additionally, the deploy config expects to deploy a single L2, which makes it impossible to deploy
multiple chains.

To fix this, we will:

- Remove the L1 genesis variables from the deploy config. Deploying a new L1 should be a separate process from
deploying an L2.
**Contributor:** @protolambda already split the deploy config into themes, but did it in a backwards-compatible way so that there are no breaking changes to the JSON. Would it be useful to have a migration tool, i.e. input a legacy deploy config and output a new split-up deploy config?

**Contributor Author:** I can certainly do this, but IMO we should just migrate everything all at once to the new format.

**Contributor:** I did, but realistically the deploy config still has tech debt, in the naming of attributes and in other ways. I would center op-deployer around OPSM configs.

**Contributor:**

> Remove the L1 genesis variables from the deploy config

Are deploy configs still a thing as part of op-deployer, or are we replacing the deploy config with the intent input files?

**Contributor Author:** They'll be a thing for the time being. In the future I'd like to modularize things so that we can move away from a single huge monolithic config.

**Contributor Author:** To clarify: the intents file + its state will replace the day-to-day usage of the deployment config. Some tools (like the deploy scripts) will continue to use the deploy config until we refactor to something else.

**Contributor:** If the deploy configs (or other key chain artifacts) are modified, we'll want to be careful about how that may impact / break the Superchain Registry. I recommend we assess this impact early / upfront.

- Remove legacy config variables that are no longer used, like `l1UseClique` or `deploymentWaitConfirmations`.
- Split the deploy config into multiple stanzas, so that irrelevant config variables can be ignored. For example,
the DA configuration currently must be specified for the config to validate, even when it's not used by a given
chain (see the sketch at the end of this section).
- Remove booleans that are used to enable/disable features. This data belongs inside of the deployment intent, not
the config - the config describes the chain, not the deployment process.
**Contributor:** Ah, this is interesting - note that this will require a decent refactor to the way the smart contracts work. It sounds like we will need to modularize some functionality so that it can be composed together and have different entrypoints, rather than one entrypoint with a config with branching logic.

**Contributor Author:**

> modularize some functionality so that it can be composed together and have different entrypoints rather than 1 entrypoint with a config with branching logic

Yes, 100%. I'm happy to help with this as part of this project.

**Contributor:** I like the idea of making the config / intent modular, rather than putting feature booleans in a huge config.


Some of this is already started by the OP Stack Manager work. The `op-deployer` project will complete it.
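
As a sketch of what the stanza split could look like on the Go side (hypothetical type and field names; pointer fields make unused stanzas optional rather than mandatory):

```go
package deployconfig

// DeployConfig split into themed stanzas. Only the sections a chain
// actually uses need to be present; nil sections are skipped during
// validation instead of failing it.
type DeployConfig struct {
	L2Genesis   L2GenesisConfig    `json:"l2Genesis"`
	FaultProofs *FaultProofsConfig `json:"faultProofs,omitempty"` // only when fault proofs are in use
	AltDA       *AltDAConfig       `json:"altDA,omitempty"`       // only for alt-DA chains
}

type L2GenesisConfig struct {
	L2ChainID   uint64 `json:"l2ChainID"`
	L2BlockTime uint64 `json:"l2BlockTime"`
}

type FaultProofsConfig struct {
	FaultGameMaxDepth uint64 `json:"faultGameMaxDepth"`
}

type AltDAConfig struct {
	DAChallengeWindow uint64 `json:"daChallengeWindow"`
}

// Validate checks only the stanzas that are present, so a chain that
// doesn't use alt-DA never has to supply (or validate) DA variables.
func (c *DeployConfig) Validate() error {
	if c.FaultProofs != nil {
		// validate fault proof variables...
	}
	if c.AltDA != nil {
		// validate DA variables...
	}
	return nil
}
```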

# Alternatives Considered

## Imperative deployment tools

An earlier version of this design considered building a set of imperative deployment tools that could be
orchestrated together like a Linux pipe. This was rejected in favor of the declarative approach described above.
Users can still build custom tooling by calling out to the individual stages of the pipeline, but the primary
interface should be the deployment intent since it lets us abstract away the upgrade complexity.

# Risks & Uncertainties

- `op-deployer` can be used to version the smart contracts as well. We'll need to define how we want `op-deployer`
to work with the smart contract versioning system.
- We'll need to make sure that all deployment tooling runs through `op-deployer`. This will require a migration, as
well as buy-in from several different teams, in order to avoid creating another fragmented tool. For example,
`op-deployer` would replace the `getting-started` script in the fault proofs repo, as well as much of the tooling
the Kurtosis devnet uses to deploy new chains.
Binary file added ecosystem/op-deployer/pipeline.png