Context: https://github.com/ipfs/ipfs/issues/337
Currently, both js-ipfs and go-ipfs index blocks in the datastore by CID. Unfortunately, a single block can have multiple CIDs due to (a) different CID versions and (b) different multicodecs (e.g., dag-cbor vs. cbor vs. raw).
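To make the problem concrete, here is a minimal sketch of how one block ends up with several binary CIDs. It uses only the standard library rather than a real CID implementation; the helper names (`multihash_sha256`, `cid_v0`, `cid_v1`, `all_cids`) are hypothetical, and it assumes codec codes below 0x80 so that each varint fits in one byte.

```python
import hashlib

# Multicodec codes (all below 0x80, so each fits in a one-byte varint).
SHA2_256 = 0x12
RAW = 0x55
DAG_PB = 0x70
DAG_CBOR = 0x71


def multihash_sha256(data: bytes) -> bytes:
    """Return <hash code><digest length><digest> for a sha2-256 digest."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256, len(digest)]) + digest


def cid_v0(multihash: bytes) -> bytes:
    """CIDv0 is the bare sha2-256 multihash; the codec is implicitly dag-pb."""
    return multihash


def cid_v1(codec: int, multihash: bytes) -> bytes:
    """Binary CIDv1 is <version><codec><multihash>."""
    if codec >= 0x80:
        raise ValueError("this sketch only handles one-byte varint codecs")
    return bytes([0x01, codec]) + multihash


def all_cids(data: bytes) -> dict:
    """Build several valid CIDs that all point at the same bytes."""
    mh = multihash_sha256(data)
    return {
        "v0": cid_v0(mh),
        "v1/dag-pb": cid_v1(DAG_PB, mh),
        "v1/dag-cbor": cid_v1(DAG_CBOR, mh),
        "v1/raw": cid_v1(RAW, mh),
    }
```

All four values returned by `all_cids` are distinct datastore keys today, yet every one of them ends with the same multihash.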
The primary concern is (a). As we switch to returning CIDv1 (for multibase support), we still want to be able to look up blocks that were fetched/added as CIDv0.
Currently, when looking up a CID, both go-ipfs and js-ipfs first try the CID under its original version, then under the other version. However, this costs us two look-ups whenever the first attempt misses.
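The fallback described above can be sketched roughly as follows. This is not the go-ipfs or js-ipfs code; `CountingStore`, `other_version`, and `get_block` are hypothetical names, and the datastore is modelled as a dict that counts reads so the extra look-up is visible.

```python
from typing import Dict, Optional

V0_PREFIX = bytes([0x12, 0x20])         # sha2-256 code, 32-byte digest length
V1_DAG_PB_PREFIX = bytes([0x01, 0x70])  # CID version 1, dag-pb codec


class CountingStore:
    """Stand-in for a datastore keyed by binary CID; counts every read."""

    def __init__(self) -> None:
        self.data: Dict[bytes, bytes] = {}
        self.reads = 0

    def get(self, key: bytes) -> Optional[bytes]:
        self.reads += 1
        return self.data.get(key)


def other_version(cid: bytes) -> Optional[bytes]:
    """Convert CIDv0 <-> CIDv1 (dag-pb, sha2-256); None if no twin exists."""
    if len(cid) == 34 and cid.startswith(V0_PREFIX):
        return V1_DAG_PB_PREFIX + cid
    if len(cid) == 36 and cid.startswith(V1_DAG_PB_PREFIX + V0_PREFIX):
        return cid[2:]
    return None


def get_block(store: CountingStore, cid: bytes) -> Optional[bytes]:
    """Try the CID as given, then fall back to the other CID version."""
    block = store.get(cid)
    if block is not None:
        return block
    alternate = other_version(cid)
    if alternate is None:
        return None
    return store.get(alternate)
```

A block stored under its CIDv0 and requested by the equivalent CIDv1 costs two reads here, and so does every genuine miss for a dag-pb CID.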
Primary use-case:
Ensure that CIDv1 and CIDv0 can be used interchangeably, especially in the gateway.
Proposals:
1. Index blocks by multihash only.
2. Index blocks by CIDv1.
3. Double indirection. Map CIDs to a locally chosen hash function to blocks.
4. Store extra metadata along with the block (known codecs, refcounts, etc.).
   - Unclear how to do this efficiently with a datastore (would likely require a bunch of separate writes/keys).
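As a rough illustration of how the first three proposals differ, the sketch below derives the datastore key each one would use. All names are hypothetical, binary CIDs are assumed to use one-byte codec varints, and BLAKE2b stands in for the "locally chosen hash function" purely as an example.

```python
import hashlib
from typing import Dict, Optional, Tuple

SHA2_256 = 0x12
DAG_PB = 0x70


def split_cid(cid: bytes) -> Tuple[int, bytes]:
    """Return (codec, multihash) for a binary CIDv0 or CIDv1."""
    if len(cid) == 34 and cid[:2] == bytes([SHA2_256, 0x20]):
        return DAG_PB, cid        # CIDv0: bare multihash, implicit dag-pb
    if cid[0] == 0x01:
        return cid[1], cid[2:]    # CIDv1 with a one-byte codec varint
    raise ValueError("unsupported CID")


def key_multihash(cid: bytes) -> bytes:
    """Proposal 1: key by multihash only; version and codec are dropped."""
    return split_cid(cid)[1]


def key_cidv1(cid: bytes) -> bytes:
    """Proposal 2: key by CIDv1; CIDv0 is upgraded, the codec is kept."""
    codec, mh = split_cid(cid)
    return bytes([0x01, codec]) + mh


class DoubleIndirectionStore:
    """Proposal 3: CID -> local hash -> block (one extra look-up per read)."""

    def __init__(self) -> None:
        self.index: Dict[bytes, bytes] = {}   # CIDv1 -> local hash
        self.blocks: Dict[bytes, bytes] = {}  # local hash -> block bytes

    def put(self, cid: bytes, block: bytes) -> None:
        local = hashlib.blake2b(block, digest_size=32).digest()
        self.index[key_cidv1(cid)] = local
        self.blocks[local] = block

    def get(self, cid: bytes) -> Optional[bytes]:
        local = self.index.get(key_cidv1(cid))
        return None if local is None else self.blocks.get(local)
```

Under these assumptions, `key_multihash` collapses CIDs that differ in version or codec, `key_cidv1` collapses only version differences, and `DoubleIndirectionStore` stores the bytes once even when the CIDs use different hash functions, at the price of an index entry per CID and a second look-up.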
Desired properties:
1. No duplicate blocks when...
   a. CID versions differ: 1, 2 & 3.
   b. CID codecs differ: 1 & 3.
   c. Hash functions differ: 3.
2. No discarded structural information (keep codecs): 2 & 3.
3. Viable migration path: 1, 2 & 3.
4. Fast migration path: 1 & 3.
5. Zero overhead:
   a. Time: 1, 2, & 4.
   b. Space: 1 & 2.
The current consensus is option 1 (multihash) (ipfs/kubo#6815, ipfs/js-ipfs#2415). The proposal here is to consider option 2 (and at least look at the others).
However, option 1 doesn't give us property 2 as it discards the codec on write. @jbenet has objected strongly to this as we will be discarding structural information about the data. This information will still be stored in the pin set but we'll be losing this information for unpinned data.
We have also run into this same issue when trying to write a reverse datastore migration, migrating back from multihashes to CIDs: we need to somehow recover the codecs. The reverse migration in Option 2 would simply be: do nothing.
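The reverse-migration problem can be shown with a toy model. This is only a sketch: the datastore is a dict keyed by binary CIDv1 with one-byte codecs, and the function names are hypothetical rather than taken from the actual migration tooling.

```python
from typing import Dict


def migrate_to_multihash(by_cid: Dict[bytes, bytes]) -> Dict[bytes, bytes]:
    """Forward migration: drop <version><codec>, keep only the multihash."""
    return {cid[2:]: block for cid, block in by_cid.items()}


def migrate_back(by_multihash: Dict[bytes, bytes],
                 assumed_codec: int) -> Dict[bytes, bytes]:
    """Reverse migration: the codec is gone from the keys, so one must be guessed."""
    return {
        bytes([0x01, assumed_codec]) + mh: block
        for mh, block in by_multihash.items()
    }
```

Migrating a store that holds both a dag-pb (0x70) and a raw (0x55) block forward and then back with `assumed_codec=0x70` returns every block's bytes, but the raw block comes back under a dag-pb CID. Avoiding that would require recovering the original codec from somewhere else.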
We need to consider the pros/cons of switching to option 1 before proceeding.
Side note: why should we even care about (b), multiple codecs for the same block?
IPLD objects as "files"
- I might want to add one or more CBOR-IPLD objects to an IPFS directory for legacy applications, and address them in IPLD.
- I might want to address a yaml/json object in an IPFS directory as an IPLD object.
Dumb block transport
I might want to take an unbalanced DAG (e.g., a blockchain) or a DAG with unusual codecs (git, eth, etc.) and sync it with another node. It would be nice if I could take this DAG, treat all the nodes as raw nodes, then build a well-balanced "overlay-dag" using simple, well-supported codecs.
This might, for example, be useful for storing DAGs with custom codecs in pinning services.