-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Use new blockstore that stores Multihashes and not CIDs. #5510
Conversation
License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
dagutils/utils.go
Outdated
@@ -70,7 +70,7 @@ func addLink(ctx context.Context, ds ipld.DAGService, root *dag.ProtoNode, child | |||
return nil, err | |||
} | |||
|
|||
_ = ds.Remove(ctx, root.Cid()) | |||
//_ = ds.Remove(ctx, root.Cid()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this call to remove.
dagutils/utils.go
Outdated
@@ -124,7 +124,7 @@ func (e *Editor) insertNodeAtPath(ctx context.Context, root *dag.ProtoNode, path | |||
return nil, err | |||
} | |||
|
|||
_ = e.tmp.Remove(ctx, root.Cid()) | |||
//_ = e.tmp.Remove(ctx, root.Cid()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working on a temporary in memory dag service. Maybe we should (1) implement a direct in memory dag service that does that even use a blockservice or (2) just use a different data structure, like maybe something as simple as a go map?
How would publishing providers records work with this? |
Honestly, I think we can just keep it and say "be careful". We're not really going to be making this worse because we can already (accidentally) reuse nodes from other DAGs. Basically, there's no way to know that
This will, unfortunately, break bitswap compatibility with existing nodes. This is also a problem for providing. We'll have to pull in @diasdavid and @whyrusleeping because I know they've done quite a bit of thinking on this but here are some ideas: DAG-based-bitswap We can use a DAGService instead of a Blockservice to drive bitswap. We'll do this with graphsync anyways. a. Unfortunately, for full compatibility, we may still have to issue bitswap queries and provider records for both CID versions (which is going to suck and may yield duplicate blocks). However, we can (a) make all nodes accept any CID version and (b) eventually stop sending CIDv0 queries. IMO, this is the cleanest solution from a "where we're going" standpoint. b. We could also push this issue up the stack and say that the user (programmer) must make both requests if they're unsure (e.g., they've received a base32 CID from the API). This may be the best solution from a performance standpoint but it will annoying to use... c. For slightly worse compatibility, we could forcibly downgrade all CIDv1 DagPB CIDs to CIDv0 when we hit the provider system/bitswap. However, that's going to suck for IA (which uses CIDv1 DagPB CIDs, IIRC). It's also completely backwards because we're trying to upgrade to CIDv1, not downgrade. Just use multihashes Allow raw multihashes and eventually switch over to them entirely. Basically:
This means CIDv0 CIDs will still work. However, this is kind of crappy from a compatibility perspective and looses information relative to the dag-based-bitswap solution. Upgrade the network This is basically the solution you've presented plus an upgrade path:
However, this would be a major break. New Bitswap Version We can, in fact, introduce a new bitswap version. This solution can be combined with one of the other solutions. For example, it can be combined with DAG-based-bitswap, variant a: Per-peer, bitswap decides to either send (a) both CIDs or (b) a single CIDv1 CID (or a raw multihash, or a raw CID, etc).
So, multihashes + multibase is probably the most correct solution. Unfortunately, they're ambiguous with CIDs so it's probably confusing from a user perspective. As you say, we could also use hex-based multihashes (0x...). However, that would be the only place in all of IPFS where we'd be using them so I'm a bit reluctant to go with that. Basically, I'd rather not introduce a new concept to users. Honestly, I think raw nodes is probably the right balance. We could also have a special codec for "unknown" but I think we should talk to users to see what they think before we add a new codec. My primary reservation is that I'd rather not (a) have a codec that can't be decoded or (b) have a codec duplicating raw. But we'll have to discuss this with users. |
@Stebalien when you get a minute, can you outline the results of our discussion on lunch today (Thursday the 11th of October) to make sure we are all on the same page. |
I think it better to introduce a new concept rather then mislead users. By using the raw codec we might give the user the impression that nothing but raw leaves are stored in the blockstore.
I am going to have to disagree with you on this one. For (a) that is kind of the point. It is a red flag that something is special about this CID. For (b) I do not consider it a duplicate at all, it conveys a different meaning. |
I am not exactly conformable with this option, but you are not going to get much of an argument from me. |
Conclusions from the lab-week discussion:
|
@Stebalien thanks. I work to push those changes though and let you know if I run into any problems. |
License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
Superseded by #6816 |
Depends on ipfs/go-ipfs-blockstore#13.
Will not even compile yet.
Pressing issues that need to be addressed:
The dagservice has a
Delete
method. This is problematic. For example if you insert a raw and a Protobuf version of the same Node then delete the one of those nodes via the dagservice both will be deleted. We should probably delete these methods. The only place theDelete
method seams to be used is indagutils/utils.go
. I have marked two of the three instances with comments. Also see RFC: Remove Delete methods from DagService go-ipld-format#43Due to the choice to keep Bitswap using Cid's there is one place where we have multihashes but need Cid's. For now I converted them to Raw CIDs for lack of anything better to do. See
NewBlockstoreProvider
inexchange/reprovide/providers.go
(https://github.com/ipfs/go-ipfs/pull/5510/files#diff-a23df8bd7581d29e09f90ef684b34631L16). Is this function even used, that is do we ever publish the complete lists of everything in the blockstore, and if so will this create a problem.As an aside, in these cases I believe we should introduce a new Unknown or Invalid codec type to mark the fact that we really don't know the original Codec as it was created via a multihash. Raw may be technically correct but is less informative. This was discussed in blockstore: switch from Cid to Multihash boxo#361 (with the last comment being blockstore: switch from Cid to Multihash boxo#361)
Raw multihashes will now be exposed to the users in several places. Two of those places are
ipfs refs local
andipfs repo gc
. How should this be handled. Here are some options (a) Use the default string method for multihashes which is a hex string (with no multibase prefix) (b) Tack on a multibase prefix and then display it like a CID (c) Make it a Cid using the Raw codec and display that (d) Make it a Cid using a new codec type to specify an Unknown or Invalid codec as discussed in (2). Right now am using (a) as it makes it very obvious that what is being returned is not a Cid as cids are hardly ever encoded in hex. (b) and (c) don't make it obvious what is going on. The special codec type might be helpful in other situations but it is less obvious than (a) on visual inspection.CC @Stebalien @whyrusleeping