Migration of Blockstore to use multihash instead of CID as key #2415
Description
This is part of the #1440 endeavor.
Motivation
Currently, the blockstore (a key-value store) uses the CID as the key for a block's data. Because a CIDv1 can be encoded with different bases, the same data can end up stored several times under CIDs that differ only in their base encoding. The main motivation for tackling this problem now is the shift from CIDv0 (i.e. base58) to CIDv1 (base32 by default, although, as mentioned, any other encoding is possible).
Solution
Use the CID's multihash as the key in the blockstore.
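To illustrate the idea, here is a minimal sketch that uses only Node's built-in crypto module. It hand-builds the binary layouts (a CIDv0 is a bare sha2-256 multihash; a CIDv1 is a version byte, a codec byte and the multihash) instead of using the real CID packages, and all helper names are hypothetical. It is not how the blockstore is implemented, only a demonstration that different CIDs for the same bytes share one multihash.

```javascript
const crypto = require('crypto')

// sha2-256 multihash: <0x12 hash code><0x20 digest length><32-byte digest>
function sha256Multihash (data) {
  const digest = crypto.createHash('sha256').update(data).digest()
  return Buffer.concat([Buffer.from([0x12, 0x20]), digest])
}

// Binary CIDv0 is just the bare sha2-256 multihash (the codec is implicitly dag-pb).
function cidV0Bytes (multihash) {
  return multihash
}

// Binary CIDv1 is <version 0x01><codec><multihash>.
// Assumption: version and codec each fit in a single varint byte.
function cidV1Bytes (codec, multihash) {
  return Buffer.concat([Buffer.from([0x01, codec]), multihash])
}

// Hypothetical helper: recover the multihash from either binary CID form.
function multihashOf (cidBytes) {
  const isV0 = cidBytes.length === 34 && cidBytes[0] === 0x12 && cidBytes[1] === 0x20
  return isV0 ? cidBytes : cidBytes.subarray(2)
}

const data = Buffer.from('hello blockstore')
const mh = sha256Multihash(data)

// Three different CIDs that all point at the same bytes.
const cids = [
  cidV0Bytes(mh),
  cidV1Bytes(0x70, mh), // 0x70 = dag-pb
  cidV1Bytes(0x55, mh) // 0x55 = raw
]

const byCid = new Map()
const byMultihash = new Map()
for (const cid of cids) {
  byCid.set(cid.toString('hex'), data)
  byMultihash.set(multihashOf(cid).toString('hex'), data)
}

console.log(byCid.size, byMultihash.size) // 3 1
```

Keyed by CID, the same bytes are stored three times. Keyed by multihash, they are stored once.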
Parts affected
This change will ripple through several commands and packages. Here is a list of what I have discovered during analysis so far. The parts mainly affected are those that use query on the repo's blockstore.
- Garbage Collection - lists the CIDs of all stored blocks, compares them with the pinned CIDs and removes those that are not pinned.
  - This will have to be changed to compare multihashes instead of CIDs, e.g. take only the pinset, extract the multihashes from it and run GC based on those.
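The GC change could be sketched roughly as follows. The shapes here are assumptions for illustration only: a CID is a plain object with a multihash field holding a hex string, the blockstore is a Map keyed by that multihash, and pinnedCids is assumed to already contain every block reachable from the pins. The keys such as 1220aa are shortened placeholders, not valid multihashes.

```javascript
// Hypothetical shapes: a CID is { version, codec, multihash } with the multihash
// as a hex string, and the blockstore is a Map keyed by that multihash.
function collectGarbage (blockstore, pinnedCids) {
  // Reduce the pinset to the set of multihashes that must be kept.
  const keep = new Set(pinnedCids.map(cid => cid.multihash))
  const removed = []

  // Copy the keys first so that deleting does not interfere with iteration.
  for (const key of Array.from(blockstore.keys())) {
    if (!keep.has(key)) {
      blockstore.delete(key)
      removed.push(key)
    }
  }
  return removed
}

const store = new Map([
  ['1220aa', Buffer.from('pinned')],
  ['1220bb', Buffer.from('unpinned')]
])

// The same block pinned under a CIDv0 and a CIDv1: both reduce to one multihash.
const pins = [
  { version: 0, codec: 'dag-pb', multihash: '1220aa' },
  { version: 1, codec: 'dag-pb', multihash: '1220aa' }
]

const removedKeys = collectGarbage(store, pins)
console.log(removedKeys) // [ '1220bb' ]
```

Because the comparison happens on multihashes, it no longer matters which CID version or base a block was pinned under.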
Problems and possible solutions
- ipfs refs local - returns a list of CIDs of locally stored objects.
  - Constructing new CIDs - We could return new CIDs, encoded as base32, wrapped around the stored multihashes. But if somebody stored a block under a different encoding, they won't find it in this listing.
  - Retain original CID - We could wrap the data in an object that keeps the original CID as metadata, something like:
    { cid: key, data: buff }
    and store that in the datastore. Or we could keep a separate Map on the side to track this, yet there would be a possibility of "conflicts" (e.g. several CIDs having the same multihash). How should that be handled?
- Class Block (in js-ipfs-block) has a cid property. Should it be changed to multihash? This is used heavily in many packages though, so I guess not, but then it won't always be possible to create a Block with a CID (e.g. see the previous problem).
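One possible way to handle the "conflicts" in the side-Map option is to keep a Set of CIDs per multihash rather than a single CID. This is only a sketch under assumptions: the CidIndex class and its methods are hypothetical, CIDs are treated as opaque strings, and the keys and CID values below are placeholders.

```javascript
// Hypothetical side index: multihash (hex string) -> Set of the CID strings
// that blocks with this multihash were stored under.
class CidIndex {
  constructor () {
    this.byMultihash = new Map()
  }

  // Record that a block with this multihash was put under the given CID.
  add (multihash, cid) {
    if (!this.byMultihash.has(multihash)) {
      this.byMultihash.set(multihash, new Set())
    }
    this.byMultihash.get(multihash).add(cid)
  }

  // Forget every CID for a multihash, for example when GC removes the block.
  remove (multihash) {
    this.byMultihash.delete(multihash)
  }

  // What a refs-local style listing could return: every recorded CID.
  listCids () {
    const out = []
    for (const cids of this.byMultihash.values()) {
      out.push(...cids)
    }
    return out
  }
}

const index = new CidIndex()
index.add('1220aa', 'cid-v0-placeholder')
index.add('1220aa', 'cid-v1-placeholder')
index.add('1220aa', 'cid-v1-placeholder') // same CID again: the Set ignores it
index.add('1220bb', 'other-cid-placeholder')

console.log(index.listCids())
```

With this shape, several CIDs sharing one multihash are all kept and listed rather than overwriting each other. The cost is an extra structure that has to stay in sync with the blockstore.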
Questions
As discussed in the weekly call, @Stebalien mentioned that "provider records need to use raw multihashes". Is this related to Bitswap and ipfs dht findprovs? @Stebalien? If so, does it mean that Bitswap should be changed to use/negotiate/exchange multihashes instead of CIDs? How far should this ripple? Content routing? I am not very familiar with this part of the codebase, so I will need some guidance on this. I am also not sure whether this needs to happen right away. I feel this is related, but not required for what we are doing here right now.
@alanshaw please also provide your input.