Description
Background
As a prerequisite for being able to officially support Bitcoin mainnet for taproot assets, we need to carefully think about how we approach the question of backup and recovery of tapd data, since not only assets might be at stake but also the BTC of the anchoring transaction output (you can't spend the BTC that carries assets without being able to reconstruct the full asset tree).
This issue serves as a collection/brainstorm issue around everything related to data safety, backups and recovery procedures.
Documentation
Similar to the lnd Operational Safety Guidelines document, we'll want a doc that describes the different data sources, what they are used for and how to best prevent loss thereof.
The document should (at least) describe the following key items:
- What is the relationship between asset public/private keys (e.g. script_keys) and lnd's wallet/seed?
- What data is required in order to recover both the assets and the BTC of a taproot asset output?
- What data is stored where (tapd's database, lnd's wallet database, lnd's channel database)?
- Where does tapd store its files and which files need to be backed up regularly?
- How can the tapd database be set up in a production ready manner?
- Recoverability when using a public universe vs. using a private one? (See further below).
How to prevent database loss
As long as the tapd database is fully intact and the seed for the lnd wallet is known, all funds are SAFU.
So having a replicated (or at least regularly backed up) state of the DB should be the highest priority.
We should test and then document the following ways of setting up a database cluster or streaming replication:
- Using a Postgres database cluster as a database backend: This is already possible and is the recommended way of running tapd in a production environment. We'll want to document some setup recommendations and best practices around this though.
- Add support for low-level SQLite replication, perhaps using something like https://github.com/benbjohnson/litestream.
How to recover from full database loss
Even though keeping the tapd database intact should always be the highest priority, the reality is that users often don't realize that uninstalling and re-installing an app on platforms like Umbrel causes all data to be deleted. So because we want to ship tapd as part of Lightning Terminal, which is available on such platforms, we need to have a strategy for basic recovery of assets and BTC for the case when the full tapd database is lost.
Possible approaches:
- Keep a single file (similar to the SCB file used in lnd for static channel information) around that is updated on every mint, send and receive and keeps track of the latest on-chain output and proof chain, as well as the universe information. The file would basically contain all the information needed to recover the asset and BTC funds, but not the transaction history. Mobile and other platform apps would then only need to make sure to create an off-device backup of that file whenever it changes.
- When using public multiverses, the information available in lnd could be enough to query those multiverses for the information required to recover access to asset and BTC funds. This requires the lnd wallet database to be fully intact though, as some of this information is added to the wallet DB by tapd and is not recoverable through a simple lnd wallet restore from seed.
  - Query the lnd wallet for unspent p2tr outputs that aren't BIP-0086, then look up the multiverse for assets related to those outpoints (this will only work if the asset anchoring transaction has a change output that goes back to the lnd wallet, because the actual asset anchoring output will not be recognized as "belonging" to the lnd wallet). This will work for asset mints and asset change outputs.
  - Query the lnd wallet for any specifically registered tapscript addresses, then look those outpoints up in the multiverse to recover asset proofs. This will work for assets received through taproot asset addresses (non-interactive receives). However, the tapscript addresses aren't directly derived from the seed, so if the lnd wallet was recovered from seed, this won't be possible.
- With v2 addresses a user can fetch encrypted messages from the universe/authmailbox server if the wallet has the script key of the address. The authmailbox server shouldn't delete messages for unspent outputs, so a recovery should always be possible.
New universe RPCs required for multiverse proof lookup
To allow some of the multiverse lookups described above, we might need additional indexes into the universe/multiverse tree structure:
- Today we have assetID => outpoint || scriptKey
- Might also need outpoint => assetID || scriptKey and scriptKey => assetID || outpoint
These new lookup methods might make it easy to enumerate assets in transfers observed by third parties and might therefore not be optimal for privacy. We should attempt to implement all recovery procedures without relying on those new lookup methods.
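To make the lookup directions concrete, here is a minimal in-memory sketch. The Leaf and Index types and all of their fields are hypothetical and only illustrate the three key directions; they say nothing about how the universe/multiverse tree is actually stored or queried.

```go
package main

// Leaf identifies one asset output known to a universe. All names here are
// hypothetical and values are simplified to strings.
type Leaf struct {
	AssetID   string // hex-encoded asset ID
	Outpoint  string // "txid:index" of the anchoring output
	ScriptKey string // hex-encoded script key
}

// Index keeps the existing forward lookup (by asset ID) plus the two
// additional reverse lookups discussed above.
type Index struct {
	byAssetID   map[string][]Leaf
	byOutpoint  map[string][]Leaf
	byScriptKey map[string][]Leaf
}

// NewIndex returns an empty index.
func NewIndex() *Index {
	return &Index{
		byAssetID:   make(map[string][]Leaf),
		byOutpoint:  make(map[string][]Leaf),
		byScriptKey: make(map[string][]Leaf),
	}
}

// Insert registers a leaf under all three keys.
func (i *Index) Insert(l Leaf) {
	i.byAssetID[l.AssetID] = append(i.byAssetID[l.AssetID], l)
	i.byOutpoint[l.Outpoint] = append(i.byOutpoint[l.Outpoint], l)
	i.byScriptKey[l.ScriptKey] = append(i.byScriptKey[l.ScriptKey], l)
}

// ByAssetID is the lookup direction that exists today.
func (i *Index) ByAssetID(id string) []Leaf { return i.byAssetID[id] }

// ByOutpoint answers "which assets are anchored in this on-chain output?".
func (i *Index) ByOutpoint(op string) []Leaf { return i.byOutpoint[op] }

// ByScriptKey answers "which asset outputs use this script key?".
func (i *Index) ByScriptKey(key string) []Leaf { return i.byScriptKey[key] }
```

Note that ByOutpoint returns every asset anchored in an output to anyone who knows the outpoint, which is exactly the enumeration concern raised above.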
Structure of on-chain asset recovery file (Chain Asset Recovery File, CARF?)
We periodically (see triggers below) create a flat file containing:
- All TAP addresses known to the daemon, including full internal key and script key information (descriptor+locator) and proof courier address.
- All asset outpoints (on-chain outpoint + asset ID + group key + script key) of all currently unspent asset outputs.
  - We should also attempt to store what universe URL we used to sync each asset, which we currently don't store.
  - Alternatively we can just save all currently configured federation servers to the recovery file.
- All genesis, meta and group key reveal information for our owned assets.
  - We might need to exclude the actual meta data as that can be up to 1 MiB and make the file very big very quickly. We should be able to sync that data again when recovering, so it shouldn't be lost for good if we exclude it from the recovery file.
- All internal and script key information (descriptor+locator) to make sure we cover any keys derived for asset channel operations.
The file should be encoded as TLV and encrypted the same way the lnd SCB file is (using the special lnd key path used specifically for the SCB encryption).
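As an illustration of the type-length-value idea only, the following sketch encodes and decodes a flat list of records. The Record type and the record type numbers are hypothetical, since no record layout has been specified yet. It also uses Go's uvarint for the type and length fields, which is an assumption made to stay within the standard library; lnd's TLV streams use the BigSize encoding instead. Encryption is left out entirely.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
)

// Record is one type-length-value entry of the recovery file. The type
// numbers used by callers are hypothetical.
type Record struct {
	Type  uint64
	Value []byte
}

// EncodeRecords serializes records as type, length, value. Type and length
// are written as Go uvarints (NOT the BigSize encoding used by lnd).
func EncodeRecords(records []Record) []byte {
	var buf bytes.Buffer
	var scratch [binary.MaxVarintLen64]byte
	for _, r := range records {
		n := binary.PutUvarint(scratch[:], r.Type)
		buf.Write(scratch[:n])
		n = binary.PutUvarint(scratch[:], uint64(len(r.Value)))
		buf.Write(scratch[:n])
		buf.Write(r.Value)
	}
	return buf.Bytes()
}

// DecodeRecords parses a stream produced by EncodeRecords and rejects
// truncated input.
func DecodeRecords(data []byte) ([]Record, error) {
	r := bytes.NewReader(data)
	var out []Record
	for r.Len() > 0 {
		typ, err := binary.ReadUvarint(r)
		if err != nil {
			return nil, err
		}
		length, err := binary.ReadUvarint(r)
		if err != nil {
			return nil, err
		}
		if length > uint64(r.Len()) {
			return nil, errors.New("record length exceeds remaining data")
		}
		val := make([]byte, length)
		if _, err := io.ReadFull(r, val); err != nil {
			return nil, err
		}
		out = append(out, Record{Type: typ, Value: val})
	}
	return out, nil
}
```

A decoder built this way can skip record types it does not know, which would allow new sections (for example per-asset universe URLs) to be added to the file later without breaking older readers.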
The documentation should be updated to describe how to create a filesystem-based (inotifywait) trigger that backs up the file every time it changes. This gist can serve as an example: https://gist.github.com/alexbosworth/2c5e185aedbdac45a03655b709e255a3
Triggers for updating the on-chain asset recovery file
We update (using the same atomic create-new-file-then-swap-in-place mechanism used by lnd for the SCB file) the recovery file whenever:
- We start up the daemon
- We create a new on-chain address
- We import a new asset from a received proof
  - Whenever ImportProof of the asset store is called, which has the following origin paths (examples, potentially incomplete):
    - ImportProof RPC (deprecated and dev-only)
    - RegisterTransfer RPC for external (vPSBT) flows
    - ChainPorter.storeProofs for change outputs of asset sends or any action performed by AuxSweeper
    - BatchCaretaker.storeMintingProofs for newly minted assets
    - Custodian.receiveProof for new incoming transfers
    - AuxSweeper.importCommitTx for force closed asset channels
    - AuxCloser.FinalizeClose for coop closed asset channels
Those should be the main events at which a new owned asset enters our database.
Steps to complete:
- Create a backup subsystem that subscribes to the above notifications and updates a file on disk whenever a new event comes in
- Allow the full file system path of the above mentioned file to be configured as a config/CLI flag (so it could be on a different file system, like a mounted network file system)
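One possible shape for such a subsystem is a loop that consumes trigger events from a channel and rewrites the file for each one. Everything in this sketch is an assumption: the EventKind values, the snapshot and write callbacks, and the function name RunBackupLoop are placeholders, and a real implementation would likely coalesce bursts of events and handle shutdown signals.

```go
package main

// EventKind enumerates hypothetical triggers for a recovery file update.
type EventKind int

const (
	EventStartup EventKind = iota
	EventNewAddress
	EventProofImported
)

// RunBackupLoop rewrites the recovery file once per received event and
// returns when the events channel is closed or a callback fails.
//
// snapshot is a hypothetical callback that serializes the current state;
// write is a hypothetical callback that persists it, for example with an
// atomic write to the configured backup path.
func RunBackupLoop(events <-chan EventKind,
	snapshot func() ([]byte, error), write func([]byte) error) error {

	for range events {
		data, err := snapshot()
		if err != nil {
			return err
		}
		if err := write(data); err != nil {
			return err
		}
	}

	return nil
}
```

Passing the write step in as a callback keeps the configurable file path out of the loop, so the same loop could also feed a streaming RPC.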
Draft of recovery procedure
From the most recent available recovery file the user can attempt to restore the assets available at that point.
This should be used on a fresh/empty tapd node. But because everything in the database should be implemented using upserts, it should theoretically also work on an existing node.
During implementation we should probably consider users testing this on their existing node with data already present in their database. Nothing catastrophic should happen in that case.
Potentially we could even use the recovery file as a lightweight tool to move/migrate assets from one tapd instance to another (ONLY if they were connected to the same lnd instance of course, since the file WILL NOT CONTAIN ANY KEY MATERIAL, only key descriptor+locator information).
Sketch of recovery procedure:
- User calls new RecoverChainAssets RPC
- Encrypted recovery file is decrypted using the lnd wallet's special SCB key. If decryption fails, the file was created using another lnd backing node and the process must be aborted.
- Upsert all genesis/meta/group key information from the file into the database.
- Upsert all internal and script key information from the file into the database.
- Upsert all TAP addresses (including script and internal keys) into the database.
  - For all V0/V1 TAP addresses:
    - Derive on-chain Taproot output key, add to lnd wallet, request chain rescan (the rescan is the main part that doesn't happen after importing an address normally, because when we create an address for the first time, we don't expect there to already be outputs for it)
    - For all unspent outputs found by lnd after the rescan, query all available universe servers for available proofs (skip hashmail couriers, as those likely won't be available anymore), import found proofs (this is mostly to cover any assets sent to existing addresses in the time between the moment the backup was created and the moment the recovery was issued, all other assets should be covered in the list of asset outpoints processed in the step below).
  - For all V2 TAP addresses (this process should already be kicked off by the Custodian after importing a new address):
    - Connect/subscribe to the authmailbox using the script key of the TAP address and the authmailbox address specified by the TAP address' proof courier address.
    - For each message received, import all proofs.
- For each asset outpoint (on-chain outpoint + asset ID + group key + script key) in the recovery file, attempt to fetch and import the full proof provenance using all available universe servers.
Steps to complete:
- Create an RPC that takes a backup file and inserts all the information in it, so that the addresses/assets/transfers are fully restored in an empty (optionally also non-empty) node according to the procedure sketched above
- (optional) Create a new RPC that on demand returns the current content of the backup file as a binary blob
- (optional) Create a new streaming RPC that emits an event whenever the backup notification service signals that the backup file was updated
Recovery of assets in asset Lightning Channels
As with normal (BTC-only) Lightning Channels, emergency recovery in case of a data loss depends on the peer force closing a channel (and watchtowers providing the incentive to not publish an old state).
The Static Channel Backup (SCB) file created by lnd contains the information required to contact a peer through the LN p2p network and request a force close of a channel.
The SCB file currently contains all the necessary information (combined with tracking progress on-chain) to then sweep the BTC funds from a remotely force closed channel.
To be able to sweep the assets contained in an asset channel, a recovering node also needs to be able to find the exact asset outpoints (on-chain outpoint + asset ID + group key + script key) of their balance in order to query a universe for the full proof.
There are two ways (maybe more?) in which this can be achieved:
- Whenever the p2p connection to a peer is established (channel_reest message), we expect them to include the current asset output distribution to be added to the custom TLV part of that message. See [feature]: backup and recovery #426 (comment).
- Upgrade the asset channel code to use the authmailbox feature to send the list of outputs to the mailbox server (similar to how v2 addresses do that for grouped address receives), using the channel's to_remote internal key (toRemoteTree.InternalKey) as the encryption/receiver key. Then the lnd SCB file simply needs to contain a custom blob with the authmailbox proof courier address and the recovering node can pull the information from there, authenticating itself with the receiver key.