Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blockstore: switch from Cid to Multihash #361

Open
kevina opened this issue Jul 29, 2018 · 13 comments
Open

blockstore: switch from Cid to Multihash #361

kevina opened this issue Jul 29, 2018 · 13 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@kevina
Copy link
Contributor

kevina commented Jul 29, 2018

We should change the Blockstore interface once it is updated to store blocks my Multihash and not the full Cid. One way to go is to remove the notation of Cids from a block, however the Cid is very useful in that context and doing so will likely break things. Instead I propose we change the types of two method (1) AllKeysChan will return multhashes instead of Cid, this is basically required because we no longer have the necessary information to reconstruct the Cid, (2) DeleteBlock will take a multihash, this is not necessary required but it is more honest, because if we delete a block from a Cid we are not just deleting that Cid, but all Cids with that multihash.

The proposed interface is thus:
Note: See #361 for the updated interface

type Blockstore interface {
	DeleteBlock(mh.Multihash) error
	Has(*cid.Cid) (bool, error)
	Get(*cid.Cid) (blocks.Block, error)

	// Put puts a given block to the underlying datastore
	Put(blocks.Block) error

	// PutMany puts a slice of blocks at the same time using batching
	// capabilities of the underlying datastore whenever possible.
	PutMany([]blocks.Block) error

	// AllKeysChan returns a channel from which
	// the CIDs in the Blockstore can be read. It should respect
	// the given context, closing the channel if it becomes Done.
	AllKeysChan(ctx context.Context) (<-chan mh.Multihash, error)

	// HashOnRead specifies if every read block should be
	// rehashed to make sure it matches its CID.
	HashOnRead(enabled bool)
}

Note that since a Cid is part of a block the Get needs the full Cid to reconstruct the block. I have miked feeling about the Has method. I was thinking that a "smart" blockstore could use the extra information, but I can't think of a use-case (note that the idstore works at the multihash level so it doesn't need to full Cid). One argument is it should take the full Cid so it has the same parameter as the Get method.

One question: Why is DeleteBlock named that way? Is there a reason it is not just Delete? If this is purely for historical reasons, I propose we rename this to just Delete since we are changing the API anyway.

As it may sometime be useful to retrieve (or even add) a block at the multihash level, I also propose we introduce a new Codec to mean "unknown" to the Cid standard, for situations when all we have is the multihash but we need a full Cid. Trying to do anything with the data of a block with an "unknown" codec would result in an error. The "raw" codec could be used but it has a slightly different meaning, in particular it means that the data does not have any structure, rather than the structure being unknown. One use for this codec is for the output of ipfs refs local. We could just display the multihash, but that can easily be confused with CidV0. If instead we convert them to CidV1 with a "unknown" codec there will be no confusion and this will also allow us to display in different bases.

Thoughts?

@Stebalien
Copy link
Member

Why is DeleteBlock named that way? Is there a reason it is not just Delete? If this is purely for historical reasons, I propose we rename this to just Delete since we are changing the API anyway.

Doesn't appear to be any good reason in the git log.

As it may sometime be useful to retrieve (or even add) a block at the multihash level, I also propose we introduce a new Codec to mean "unknown" to the Cid standard, for situations when all we have is the multihash but we need a full Cid.

I really would just use the raw multicodec. From a type-theory standpoint, it's effectively the "top" type. Regardless of what we do, we'll break userspace here.

@kevina
Copy link
Contributor Author

kevina commented Aug 6, 2018

I really would just use the raw multicodec.

My changing the Cid to a Multicodec in a block will will be losing information. This will require a lot of careful thought so everything still works. I am against doing it at this stage.

@kevina
Copy link
Contributor Author

kevina commented Aug 8, 2018

After thinking about it a while here is what I think the interface should look like now:

type Blockstore interface {
	Delete(mh.Multihash) error
	Has(mh.Multihash) (bool, error)
	Get(*cid.Cid) (blocks.Block, error)

	// GetSize returns the Multihash's mapped BlockSize
	GetSize(mh.Multihash) (int, error)

	// Put puts a given block to the underlying datastore
	Put(blocks.Block) error

	// PutMany puts a slice of blocks at the same time using batching
	// capabilities of the underlying datastore whenever possible.
	PutMany([]blocks.Block) error

	// AllKeysChan returns a channel from which
	// the CIDs in the Blockstore can be read. It should respect
	// the given context, closing the channel if it becomes Done.
	AllKeysChan(ctx context.Context) (<-chan mh.Multihash, error)

	// HashOnRead specifies if every read block should be
	// rehashed to make sure it matches its CID.
	HashOnRead(enabled bool)
}

I am not necessary against change Get() to accept a Multihash, but that will involve changing the
Cid's in Blocks to Multihashs. This seams like more of a breaking change that can come later but I am not against attempting to push it though.

We will have a presentation problem when presenting raw multihashes to the user as they stand now they are indistinguishable from CIDv0. I see two solutions to this problem.

(1) Introduce a new Codec to mean "unknown" to the Cid standard as outlined in the description. When we need to display a raw Multihash it will be converted to a Cid with the "unknown" condec.

(2) Since we decided that CidV0 will not get a multibase prefix. Define a multihash with a multibase prefix as a raw multihash and never a CidV0. All multihashes will then be displayed with a multibase prefix when converted to a string.

Personally I like (1) better as it also solves the problem of how to handle the case when we have a raw multihash but need a full CID. We can eliminate the need as much as possible, but I still think it will come up from time to time based on the pervasiveness of CIDs.

@kevina
Copy link
Contributor Author

kevina commented Aug 20, 2018

@whyrusleeping @Stebalien the go-blockservice README says (https://github.com/ipfs/go-blockservice):

The interfaces here really would like to be merged with the blockstore interfaces. The 'dagservice' constructor currently takes a blockservice, but it would be really nice if it could just take a blockstore, and have this package implement a blockstore.
What is the background on that and will the proposed interface above interfere with that goal?

Note the if I understand correctly we want to continue to make network requests using CIDs, thus the blockservice will continue to need to work with CIDs and can not switch to using Multihashes. Thus, my proposed changes to the blockstore might interfear with the interface merger. Or am I missing something?

@Stebalien
Copy link
Member

I really would just use the raw multicodec.

My changing the Cid to a Multicodec (multihahs?) in a block will will be losing information. This will require a lot of careful thought so everything still works. I am against doing it at this stage.

My point was, instead of introducing an unknown multicodec, just use raw. The issue here is the ipfs refs command (and friends). Those need to return CIDs, unfortunately.

Note the if I understand correctly we want to continue to make network requests using CIDs, thus the blockservice will continue to need to work with CIDs and can not switch to using Multihashes. Thus, my proposed changes to the blockstore might interfear with the interface merger. Or am I missing something?

That's something @whyrusleeping would like but can wait. Basically, he just wants to get rid of the blockservice.

@kevina
Copy link
Contributor Author

kevina commented Aug 28, 2018

My point was, instead of introducing an unknown multicodec, just use raw. The issue here is the ipfs refs command (and friends). Those need to return CIDs, unfortunately.

Yes we can do that but as I stated above I don't see that as correct. By using a new Cid type we are making it clear that we don't know the Cid type. The only command this will effect is ipfs refs local. As long as we are following a dag will should have the full Cid.

That's something @whyrusleeping would like but can wait. Basically, he just wants to get rid of the blockservice.

Okay then. I won't consider it a blocker.

@Stebalien
Copy link
Member

Yes we can do that but as I stated above I don't see that as correct.

It is correct. Every piece of data is a valid raw block (it's the Any, interface{}, Object, what have you, type).

@kevina
Copy link
Contributor Author

kevina commented Aug 28, 2018

It is correct. Every piece of data is a valid raw block (it's the Any, interface{}, Object, what have you, type).

Maybe it is correct, but I still think Raw and Unknown convey different pieces of information. Thus, I still think an Unknown codec will be useful but it is not important for this P.R. and is not a blocker on anything so we can revisit the issue later.

@Stebalien
Copy link
Member

Ideally, we'll eventually deprecate the entire API in favor of something that works with raw multihashes.

@Stebalien
Copy link
Member

Note: The multicodec on a CID is not a type. It tells you how to interpret a binary blob as structured data. From the perspective of IPLD, "unknown" is pretty much useless.

You are right, we could have a sentinel "invalid" type but I'd rather just produce useful CIDs.

@kevina
Copy link
Contributor Author

kevina commented Aug 28, 2018

Note: The multicodec on a CID is not a type. It tells you how to interpret a binary blob as structured data. From the perspective of IPLD, "unknown" is pretty much useless.

That is exactly the idea. It is suppose to be useless as we don't know the original codec as it was created from a raw multihash. Doing anything with it other than passing it around will result in an error. It is meant only for output of command like ipfs refs local.

@lidel
Copy link
Member

lidel commented Dec 9, 2021

I believe this can be closed: ipfs/go-ipfs-blockstore#38 shipped and depending on context, we switched to multihash (datastore keys) or CIDv1 with raw codec (refs local output)

@aschmahmann
Copy link
Contributor

Reopening for now for tracking, but we'll likely want to create a new issue. Related to #362

@aschmahmann aschmahmann reopened this Dec 14, 2021
@lidel lidel changed the title RFC: Updates to the blockstore interface blockstore: switch from Cid to Multihash Jun 19, 2023
@ipfs ipfs deleted a comment from welcome bot Jun 19, 2023
@lidel lidel transferred this issue from ipfs/go-ipfs-blockstore Jun 19, 2023
@lidel lidel added the need/triage Needs initial labeling and prioritization label Jun 19, 2023
@lidel lidel moved this to 🥞 Todo in IPFS Shipyard Team Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/triage Needs initial labeling and prioritization
Projects
No open projects
Status: 🥞 Todo
Archived in project
Development

No branches or pull requests

4 participants