This repository was archived by the owner on Feb 12, 2024. It is now read-only.

Migration of Blockstore to use multihash instead of CID as key #2415

Closed
@AuHau


This is part of the #1440 endeavor.

Motivation

Currently, the blockstore (a key-value store) uses the CID as the key for a block's data. Because a CIDv1 can be encoded with different bases, the same data can end up stored several times under CIDs that differ only in their base encoding. The main motivation for tackling this problem now is the shift from CIDv0 (e.g. base58) to CIDv1 (e.g. base32 by default, though as mentioned any other encoding is also possible).

Solution

Use the CID's multihash as the key in the blockstore.
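
As a rough illustration of the idea, here is a minimal sketch using a toy model rather than the real js-ipfs blockstore. The CID shape (`version`, `codec`, `multihash`), the placeholder multihash strings, and the helpers `keyByCid`, `keyByMultihash` and `putAll` are all assumptions made up for this example.

```javascript
// Toy model: a CID reduced to the fields that matter here.
// The multihash values are placeholder strings, not real digests.
const cidV0 = { version: 0, codec: 'dag-pb', multihash: 'mh-aaaa' }
const cidV1 = { version: 1, codec: 'dag-pb', multihash: 'mh-aaaa' }

// Hypothetical key functions: the current scheme keys on the whole CID,
// the proposed scheme keys only on the multihash.
const keyByCid = (cid) => `${cid.version}:${cid.codec}:${cid.multihash}`
const keyByMultihash = (cid) => cid.multihash

// Store every [cid, data] pair in a fresh Map using the given key function.
function putAll (keyFn, entries) {
  const store = new Map()
  for (const [cid, data] of entries) {
    store.set(keyFn(cid), data)
  }
  return store
}

// The same data written under two CIDs that share one multihash.
const entries = [[cidV0, 'hello'], [cidV1, 'hello']]
const byCid = putAll(keyByCid, entries)
const byMultihash = putAll(keyByMultihash, entries)

console.log(byCid.size, byMultihash.size)
```

In this toy model the CID-keyed store holds the data twice, while the multihash-keyed store holds it once.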

Parts affected

This change will ripple through several commands and packages. Here is a list of things I have discovered during analysis. The parts mainly affected will be the code that queries the repo's blockstore.

  1. Garbage Collection - lists the CIDs of all stored blocks, compares them with the pinned CIDs and removes those that are not pinned.
    • This will have to be changed to compare multihashes instead of CIDs, e.g. take only the pinset, extract the multihashes from it and do GC based on those.
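
The multihash-based GC described above might look roughly like the following sketch. It assumes a toy blockstore (a `Map` keyed by placeholder multihash strings) and a pin set that has already been expanded to every block that must be kept; `collectGarbage` is a hypothetical helper, not an existing js-ipfs function.

```javascript
// Remove every block whose multihash is not referenced by a pinned CID.
// `blocks` is a Map keyed by multihash; `pinnedCids` is an array of
// toy CID objects of the form { version, codec, multihash }.
function collectGarbage (blocks, pinnedCids) {
  // Compare multihashes, not CIDs: CID version and codec are ignored.
  const pinned = new Set(pinnedCids.map((cid) => cid.multihash))
  const removed = []
  // Copy the keys first so the Map is not mutated while being iterated.
  for (const key of [...blocks.keys()]) {
    if (!pinned.has(key)) {
      blocks.delete(key)
      removed.push(key)
    }
  }
  return removed
}

const blocks = new Map([
  ['mh-a', 'block A'],
  ['mh-b', 'block B'],
  ['mh-c', 'block C']
])
const pins = [
  { version: 0, codec: 'dag-pb', multihash: 'mh-a' },
  { version: 1, codec: 'raw', multihash: 'mh-c' }
]
const removed = collectGarbage(blocks, pins)
console.log(removed)
```

Only the unpinned multihash is removed here, regardless of which CID version or codec the pins use.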

Problems and possible solutions

  1. ipfs refs local - returns a list of CIDs of locally stored objects

    • Constructing new CIDs - We could return new CIDs, encoded in base32, wrapped around the stored multihashes. But if somebody stored a block under a different encoding, they won't find it in this listing.
    • Retain original CID - We could wrap the data in an object that keeps the original CID as metadata, something like { cid: key, data: buff }, and store that in the datastore. Or we could keep a separate Map on the side to track this, yet there would be a possibility of "conflicts" (e.g. several CIDs having the same multihash). How should that be handled?
  2. Class Block (in js-ipfs-block) has a cid property; should it be changed to multihash? It is used heavily in many packages though, so I guess not, but then it won't always be possible to create a Block with a CID (e.g. see the previous problem).
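
To make the "retain original CID" option more concrete, here is a minimal sketch of one possible way to deal with conflicts: keep every CID seen for a multihash next to the data. This is only an illustration of the trade-off, not a decided design. `MultihashStore`, `formatCid` and the toy CID shape are hypothetical and do not exist in js-ipfs.

```javascript
// Hypothetical string form of a toy CID, used only for listing.
const formatCid = (cid) => `v${cid.version}-${cid.codec}-${cid.multihash}`

class MultihashStore {
  constructor () {
    // multihash -> { cids: string[], data }
    this.entries = new Map()
  }

  put (cid, data) {
    // Reuse the existing entry so the data is stored once per multihash.
    const entry = this.entries.get(cid.multihash) || { cids: [], data }
    const cidString = formatCid(cid)
    // "Conflict": a second CID with the same multihash is recorded as metadata.
    if (!entry.cids.includes(cidString)) {
      entry.cids.push(cidString)
    }
    this.entries.set(cid.multihash, entry)
  }

  // Rough analogue of `ipfs refs local`: list every CID that was stored.
  refsLocal () {
    return [...this.entries.values()].flatMap((entry) => entry.cids)
  }
}

const store = new MultihashStore()
store.put({ version: 0, codec: 'dag-pb', multihash: 'mh-aaaa' }, 'hello')
store.put({ version: 1, codec: 'dag-pb', multihash: 'mh-aaaa' }, 'hello')
const refs = store.refsLocal()
console.log(store.entries.size, refs)
```

The cost of this approach is extra metadata per block and a listing that can contain several CIDs for the same data.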

Questions

As discussed in the weekly call, @Stebalien mentioned that "provider records need to use raw multihashes". Is this related to Bitswap and ipfs dht findprovs, @Stebalien? If so, does it mean that Bitswap should be changed to use/negotiate/exchange multihashes instead of CIDs? How far should this ripple? Content routing? I am not very familiar with this part of the codebase, so I will need some guidance on this. I am also not sure whether this needs to happen right away. I feel like this is related to, but not required for, what we are doing here right now.

@alanshaw please also provide your input.
