-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
blockstore: switch from Cid to Multihash #361
Comments
Doesn't appear to be any good reason in the git log.
I really would just use the raw multicodec. From a type-theory standpoint, it's effectively the "top" type. Regardless of what we do, we'll break userspace here. |
My changing the Cid to a Multicodec in a block will will be losing information. This will require a lot of careful thought so everything still works. I am against doing it at this stage. |
After thinking about it a while here is what I think the interface should look like now: type Blockstore interface {
Delete(mh.Multihash) error
Has(mh.Multihash) (bool, error)
Get(*cid.Cid) (blocks.Block, error)
// GetSize returns the Multihash's mapped BlockSize
GetSize(mh.Multihash) (int, error)
// Put puts a given block to the underlying datastore
Put(blocks.Block) error
// PutMany puts a slice of blocks at the same time using batching
// capabilities of the underlying datastore whenever possible.
PutMany([]blocks.Block) error
// AllKeysChan returns a channel from which
// the CIDs in the Blockstore can be read. It should respect
// the given context, closing the channel if it becomes Done.
AllKeysChan(ctx context.Context) (<-chan mh.Multihash, error)
// HashOnRead specifies if every read block should be
// rehashed to make sure it matches its CID.
HashOnRead(enabled bool)
} I am not necessary against change Get() to accept a Multihash, but that will involve changing the We will have a presentation problem when presenting raw multihashes to the user as they stand now they are indistinguishable from CIDv0. I see two solutions to this problem. (1) Introduce a new Codec to mean "unknown" to the Cid standard as outlined in the description. When we need to display a raw Multihash it will be converted to a Cid with the "unknown" condec. (2) Since we decided that CidV0 will not get a multibase prefix. Define a multihash with a multibase prefix as a raw multihash and never a CidV0. All multihashes will then be displayed with a multibase prefix when converted to a string. Personally I like (1) better as it also solves the problem of how to handle the case when we have a raw multihash but need a full CID. We can eliminate the need as much as possible, but I still think it will come up from time to time based on the pervasiveness of CIDs. |
@whyrusleeping @Stebalien the
Note the if I understand correctly we want to continue to make network requests using CIDs, thus the blockservice will continue to need to work with CIDs and can not switch to using Multihashes. Thus, my proposed changes to the blockstore might interfear with the interface merger. Or am I missing something? |
My point was, instead of introducing an unknown multicodec, just use raw. The issue here is the
That's something @whyrusleeping would like but can wait. Basically, he just wants to get rid of the blockservice. |
Yes we can do that but as I stated above I don't see that as correct. By using a new Cid type we are making it clear that we don't know the Cid type. The only command this will effect is
Okay then. I won't consider it a blocker. |
It is correct. Every piece of data is a valid raw block (it's the |
Maybe it is correct, but I still think |
Ideally, we'll eventually deprecate the entire API in favor of something that works with raw multihashes. |
Note: The multicodec on a CID is not a type. It tells you how to interpret a binary blob as structured data. From the perspective of IPLD, "unknown" is pretty much useless. You are right, we could have a sentinel "invalid" type but I'd rather just produce useful CIDs. |
That is exactly the idea. It is suppose to be useless as we don't know the original codec as it was created from a raw multihash. Doing anything with it other than passing it around will result in an error. It is meant only for output of command like |
I believe this can be closed: ipfs/go-ipfs-blockstore#38 shipped and depending on context, we switched to multihash (datastore keys) or CIDv1 with |
Reopening for now for tracking, but we'll likely want to create a new issue. Related to #362 |
We should change the Blockstore interface once it is updated to store blocks my Multihash and not the full Cid. One way to go is to remove the notation of Cids from a block, however the Cid is very useful in that context and doing so will likely break things. Instead I propose we change the types of two method (1)
AllKeysChan
will return multhashes instead of Cid, this is basically required because we no longer have the necessary information to reconstruct the Cid, (2)DeleteBlock
will take a multihash, this is not necessary required but it is more honest, because if we delete a block from a Cid we are not just deleting that Cid, but all Cids with that multihash.The proposed interface is thus:Note: See #361 for the updated interface
Note that since a Cid is part of a block the
Get
needs the full Cid to reconstruct the block. I have miked feeling about theHas
method. I was thinking that a "smart" blockstore could use the extra information, but I can't think of a use-case (note that theidstore
works at the multihash level so it doesn't need to full Cid). One argument is it should take the full Cid so it has the same parameter as theGet
method.One question: Why is
DeleteBlock
named that way? Is there a reason it is not justDelete
? If this is purely for historical reasons, I propose we rename this to justDelete
since we are changing the API anyway.As it may sometime be useful to retrieve (or even add) a block at the multihash level, I also propose we introduce a new Codec to mean "unknown" to the Cid standard, for situations when all we have is the multihash but we need a full Cid. Trying to do anything with the data of a block with an "unknown" codec would result in an error. The "raw" codec could be used but it has a slightly different meaning, in particular it means that the data does not have any structure, rather than the structure being unknown. One use for this codec is for the output of
ipfs refs local
. We could just display the multihash, but that can easily be confused with CidV0. If instead we convert them to CidV1 with a "unknown" codec there will be no confusion and this will also allow us to display in different bases.Thoughts?
The text was updated successfully, but these errors were encountered: