-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add builder for unixfs dags #12
Conversation
data/builder/file.go
Outdated
// * we assume we are using CIDv1, which has implied that the leaf | ||
// data nodes are stored as raw bytes. | ||
// ref: https://github.com/ipfs/go-mfs/blob/1b1fd06cff048caabeddb02d4dbf22d2274c7971/file.go#L50 | ||
func BuildUnixFSFile(r io.Reader, chunker string, ls *ipld.LinkSystem) (ipld.Link, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there's also a way we could make a NodeAssembler
type somewhere that has a constructor taking a chunker string, ls *ipld.LinkSystem
, and then we can have its AssignBytes
method call this?
It wouldn't be memory-efficient of course, but the fact that it works at all would be nice.
More importantly, it would get us within a centimeter of adding an AssignFromReader
method to that type, or something of that nature, which we could later let people feature-detect for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to do it at that level, or would we prefer the assembler a level up where it could handle either file or directory assembly (switching on being either given bytes or a map listing of nodes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Still kinda wanna do something about this but don't have an exact fullbaked idea of how I'd want it to look, either, so it's definitely not something to hold up the rest of this PR landing on.)
👍 for getting even more of the dots to be connected! |
} | ||
|
||
// a shard entry is either another shard, or a direct link. | ||
type entry struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reading comprehension check for me: did this not end up using any of the other types or code for dealing with sharding, which we already had for the read direction?
(Not necessarily offering judgement, just surprised if so, enough that I'm wanting to doublecheck my reading.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. it can potentially overlap the shard type here with the _UnixFSHAMTShard
in hamt
, but
this would end up as the builder type rather than that type itself, and would need to do the full serialization here, since the structure in hamt expects a fully serialized pbnode substrate to exist that it's providing a view over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly concerned there could be bugs in the HAMT code -- see note about log2
I think the most important thing I'd like to know is how are we proving this is generating equivalent outputs. Is this manual? What testing have done so far? Can we automate somehow?
data/builder/dirshard.go
Outdated
|
||
func (s *shard) add(lnk hamtLink) error { | ||
// get the bucket for lnk | ||
bucket, err := lnk.hash.Slice(s.depth*s.size, s.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure you're missing a log2 somewhere here. There's a lot of calculation of log2 in the HAMT sharding code, and it's cause even thought we want 256 width, that only means 8-bits per item.
there are tests in the go-unixfs implementation -- how hard would it be to substantially copy them to very you have the same result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I'm totally wrong -- that's part of why I'm wondering if we have a way to equivalence test this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a couple tests. you were right on this one 😨
|
||
// BuildUnixFSFile creates a dag of ipld Nodes representing file data. | ||
// This recreates the functionality previously found in | ||
// github.com/ipfs/go-unixfs/importer/balanced, but tailored to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guece we can live without trickle dag for the moment... I wonder if we even ever use this.
// go-unixfsnode & ipld-prime data layout of nodes. | ||
// We make some assumptions in building files with this builder to reduce | ||
// complexity, namely: | ||
// * we assume we are using CIDv1, which has implied that the leaf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love this part
This now depends on |
i've done some basic sanity checks. i want to wire this up in the downstream ipld/go-car#246 and put some car-equivalence fixtures there in a setting where there's easier access to interact with the data format here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no complaints here :)
(I'd disclaim I'm not an expert unixfsv1 reviewer, either. But it seems to be going in the right direction, at least, afaict.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering whether the Must()
calls could realistically lead to panics. If the field should always be set, why is it optional to begin with?
|
||
// rough estimates on expected sizes | ||
var roughLinkBlockSize = 1 << 13 // 8KB | ||
var roughLinkSize = 34 + 8 + 5 // sha256 multihash + size + no name + protobuf framing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make these constants? otherwise I'm wondering whether you modify them anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are copied directly from how they're defined in the comment above this block.
bitfield "github.com/ipfs/go-bitfield" | ||
"github.com/ipfs/go-unixfsnode/data" | ||
"github.com/ipfs/go-unixfsnode/hamt" | ||
dagpb "github.com/ipld/go-codec-dagpb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's all starting to come together with past work! :)
No description provided.