Closed
Description
Part of #4143.
Currently, we store blocks by CID in the datastore. However, a single block can have multiple CIDs:
- CIDv0/CIDv1 -- blocks created as CIDv0 can also be referenced with CIDv1 CIDs
- CIDv1-*/CIDv1-raw -- we can treat any block as a "raw" block
- CIDv1-cbor/CIDv1-dag-cbor -- theoretically, we have non-"dag" versions of our codecs.
Plan:
TODO: put this in the right spot: ipfs/fs-repo-migrations#95
- Create a repo migration to switch from CIDs to multihashes. Luckily, every CIDv0 (used by most blocks) is already a bare multihash, so those blocks don't need to move.
  - Migration 11-12: migrate CIDv1s to raw multihashes (ipfs/fs-repo-migrations#95)
  - This migration should iterate over all blocks in the datastore:
    - If the block is stored under a CIDv1, add it again under its raw multihash (if it's not already there).
    - Delete the CIDv1 block.
  - If the datastore offers transactions, we should ideally perform each add/delete pair in a transaction.
  - We should probably use multiple threads for better performance.
  - We should do this with synchronous writes enabled. Alternatively, we could do this in two passes (add all, then delete all), but that would double the amount of disk space used.
  - Optional: iterate over all provider records and migrate them as well. This is not critical, as they will expire anyway.
- Convert CIDs to/from multihashes in the blockstore layer. At this point, the external blockstore interfaces will still use CIDs.
- Fix GC (in go-ipfs) to work when the datastore no longer stores codec information.
- Remove the previous CIDv0/CIDv1 blockstore hacks.
- Eventually, switch the blockstore API to use multihashes directly instead of CIDs.
  - 2021-12-14: this will get handled as part of blockstore: switch from Cid to Multihash boxo#361
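The migration and blockstore-layer steps in the plan can be sketched roughly as follows. This is a stdlib-only Go illustration, not the fs-repo-migrations implementation: the datastore is a hypothetical in-memory map keyed by binary CIDs/multihashes (a real repo goes through go-datastore with encoded string keys), and `memStore`, `migrateBlocks`, `get`, and the worker count are assumptions. Each key is moved independently (add under the multihash if absent, then delete the CIDv1 key), which is what makes a worker pool safe; holding one lock per add/delete pair stands in for a per-block transaction.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sync"
)

// memStore is a hypothetical stand-in for the repo datastore:
// binary keys (CIDs or multihashes) mapped to block bytes.
type memStore struct {
	mu sync.Mutex
	m  map[string][]byte
}

// multihashKey returns the multihash inside a binary CIDv1 key.
// It reports false for CIDv0 keys, which are already bare multihashes.
func multihashKey(key string) (string, bool) {
	b := []byte(key)
	version, n := binary.Uvarint(b)
	if n <= 0 || version != 1 {
		return "", false
	}
	_, m := binary.Uvarint(b[n:])
	if m <= 0 {
		return "", false
	}
	return string(b[n+m:]), true
}

// migrateBlocks re-keys every CIDv1 block under its multihash using a pool
// of workers. Each key is handled on its own: add under the multihash if it
// is not already there, then delete the CIDv1 key.
func migrateBlocks(s *memStore, workers int) {
	s.mu.Lock()
	keys := make([]string, 0, len(s.m))
	for k := range s.m {
		keys = append(keys, k)
	}
	s.mu.Unlock()

	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for k := range jobs {
				mh, ok := multihashKey(k)
				if !ok {
					continue // CIDv0 / multihash key: nothing to move
				}
				// Holding the lock for the add+delete pair stands in for a
				// per-block transaction in a real datastore.
				s.mu.Lock()
				if _, exists := s.m[mh]; !exists {
					s.m[mh] = s.m[k]
				}
				delete(s.m, k)
				s.mu.Unlock()
			}
		}()
	}
	for _, k := range keys {
		jobs <- k
	}
	close(jobs)
	wg.Wait()
}

// get sketches the blockstore-layer conversion: callers still pass a CID,
// but the lookup uses its multihash, so the codec no longer matters.
func (s *memStore) get(cid string) ([]byte, bool) {
	key := cid
	if mh, ok := multihashKey(cid); ok {
		key = mh
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[key]
	return v, ok
}

func main() {
	data := []byte("hello ipfs")
	sum := sha256.Sum256(data)
	mh := append([]byte{0x12, 0x20}, sum[:]...) // sha2-256 multihash
	// v1 builds a binary CIDv1 for a single-byte codec code.
	v1 := func(codec byte) string { return string(append([]byte{0x01, codec}, mh...)) }

	// The same block stored twice, under CIDv1-dag-pb and CIDv1-raw keys.
	s := &memStore{m: map[string][]byte{v1(0x70): data, v1(0x55): data}}
	migrateBlocks(s, 4)

	_, viaRaw := s.get(v1(0x55))
	_, viaV0 := s.get(string(mh))
	fmt.Println(len(s.m), viaRaw, viaV0) // 1 true true
}
```

The duplicate CIDv1-dag-pb and CIDv1-raw entries collapse into one multihash-keyed entry, and lookups by either CID (or by the CIDv0, which is the multihash itself) still succeed.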
2021-12-02: context on why we're doing this:
- Helps us to move towards CIDv1 by default
- Deduplicates data that’s stored multiple times (e.g. as CIDv0, CIDv1-DAG-PB, CIDv1-Raw)
- Removes the technical debt and tribal knowledge around the different interfaces used by go-ipfs and anything newer (lotus, venus, estuary, …)
- Unlocks bigger refactors around blockstores going forward
- Enables any pinning services backed by go-ipfs to serve content for unknown IPLD codecs
  - e.g. all those CAR files filled with Bitcoin blockchain data could be stored by services that don’t have the Bitcoin codec
- Allows for greater experimentation by groups making their own codecs, even before their codec lands in go-ipfs by default