Description
Dedicated unixfs implementations are often much more performant and efficient.
See for example two of mine:
- https://github.com/Jorropo/linux2ipfs (blazingly fast file adder / the fastest `ipfs add` implementation that exists)
- https://github.com/Jorropo/go-featheripfs (high-performance, very high-efficiency ordered car -> `io.Reader` file with incremental verification and data streaming)
I think it is more sustainable to maintain a handful of dedicated implementations that individually do one job well (something like feather and something that would maintain state to support unordered dags are really different).
Unified middle layer
The way I have been writing my implementations is by juggling `unixfs.proto` and `dag-pb.proto` bytes by hand everywhere.
This is kinda meh because of all the little details that have to be checked.
I really think we need an efficient middle layer that takes in `blocks.Block`s, parses them, does sanity checks, and returns a representation that looks like internal unixfs (so one node per block, and full support for all the shenanigans you would want to do, but with error checking):
```go
package unixfs

import (
	"github.com/ipfs/go-cid"

	blocks "github.com/ipfs/go-block-format"
)

type Type uint8

const (
	_ Type = iota
	Directory
	File
	Metadata
	Symlink
	HAMTShard
)

type Node struct {
	Type     Type
	Data     []byte // for files
	Entries  []Entry
	DagSize  uint64
	HashType uint64 // for HAMT dirs
	Fanout   uint64 // for HAMT dirs
}

type Entry struct {
	Link
	Name string // for directories
}

type Link struct {
	Type     Type // indicative, not actually enforced
	Cid      cid.Cid
	FileSize uint64 // for files
	DagSize  uint64
}

func (*Node) UnmarshalIPFS(blocks.Block) error
func (*Node) MarshalIPFS() (blocks.Block, error)
func (*Node) Encode() (Link, blocks.Block, error)

// File operations
func (*Node) ParseFile(Link, blocks.Block) error
func (*Node) AddFileSegments(...Link) error
func (*Node) FileSize() uint64

// Directory operations
func (*Node) ParseDirectory(Link, blocks.Block) error
func (*Node) AddChildrenNode(...Link) error

// other types ...
```
This would not include helper functions for more advanced stuff like HAMT, chunking, ... We will most likely need those too, but the goal here is to provide a thin wrapper around the protobuf that adds the repetitive input validation.
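To make the intent concrete, here is a minimal sketch of how a consumer might use this wrapper to decode one block and dispatch on its type. Only the `Node`/`Entry`/`Link` shapes and `UnmarshalIPFS` come from the sketch above; the import path and the surrounding function are illustrative assumptions:

```go
package example

import (
	"fmt"

	blocks "github.com/ipfs/go-block-format"

	// hypothetical import path for the proposed middle layer
	"github.com/ipfs/go-libipfs/unixfs"
)

// walk decodes a single block with the proposed API and prints what it links to.
func walk(blk blocks.Block) error {
	var n unixfs.Node
	// UnmarshalIPFS parses dag-pb + unixfs.proto and does the repetitive
	// sanity checks in one place.
	if err := n.UnmarshalIPFS(blk); err != nil {
		return fmt.Errorf("invalid unixfs block %s: %w", blk.Cid(), err)
	}
	switch n.Type {
	case unixfs.File:
		// n.Data holds inline data (if any); n.Entries point to the file segments.
		for _, e := range n.Entries {
			fmt.Println("file segment:", e.Cid, "filesize:", e.FileSize)
		}
	case unixfs.Directory:
		for _, e := range n.Entries {
			fmt.Println("dir entry:", e.Name, "->", e.Cid)
		}
	}
	return nil
}
```

The point is that the caller never touches raw protobuf bytes: every field it sees has already been validated once, in one shared place.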
Impls
List of implementations we need:
- Efficient state-light streaming decoder with incremental verification. (I'll probably use feather after refactoring it to use the unified lower layer and adding the missing features.)
  - The goal is to incrementally verify ordered `.car` files from Saturn on the cheap, resource-wise (and other ordered block sources).
  - Capable of streaming the verification of an ordered stream of blocks (such as a car file read from an `io.Reader`) incrementally, while streaming the result out.
  - The interface must implement `io.Reader` for reading files without any background goroutine (efficiency); a sketch of that shape is given after this list.
  - It must support incrementally verifying requests mapping to the semantics of #176 (feat: refactor gateway api to operate on higher level semantics).
- Decoder with incremental verification and output stream with random-walk BFS input.
  - Streaming data in order can have a high cost, however if we are clever we can use `.WriteAt` to write blocks in any order as long as we receive roots before leaves (or cache leaves, but then incremental verification is not possible); see the second sketch after this list.
- MFS rewrite.
- Some write operation helpers (concatenate, chunking, ...)
- ... ?
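For the first implementation, a minimal sketch of the reader-facing shape such a streaming decoder could expose. Only the `io.Reader` requirement and the no-background-goroutine constraint come from the list above; the `VerifyingFileReader` name and its methods are illustrative assumptions, not an existing API:

```go
package example

import (
	"io"

	"github.com/ipfs/go-cid"
)

// VerifyingFileReader is a hypothetical surface for the streaming decoder:
// it consumes an ordered car stream and yields file bytes only after the
// blocks carrying them have been verified against their parent links,
// doing all work lazily inside Read (no background goroutine).
type VerifyingFileReader interface {
	io.Reader // verified file bytes, produced on each Read call

	// Root reports which unixfs file the stream is being verified against.
	Root() cid.Cid
}

// Extract shows the intended call pattern: because the decoder is a plain
// io.Reader, verified data can be piped out while the car stream is still
// arriving, and verification failures surface as ordinary read errors.
func Extract(r VerifyingFileReader, out io.Writer) (int64, error) {
	return io.Copy(out, r)
}
```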
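For the random-walk BFS variant, a minimal sketch of the `.WriteAt` idea, assuming the decoder has already verified each leaf and learned its byte offset from the parent file nodes; the `segment` type and the channel plumbing are illustrative assumptions:

```go
package example

import "io"

// segment is a verified leaf block's data together with the offset it
// occupies in the final file; the offset is known because the parent nodes
// (and their blocksizes) were received and verified before the leaf.
type segment struct {
	offset int64
	data   []byte
}

// writeOutOfOrder writes each verified segment straight to its final
// position, so leaves can arrive in any order without being buffered in
// memory. An *os.File satisfies io.WriterAt, so the output can simply be a
// file that gets filled in as blocks show up.
func writeOutOfOrder(out io.WriterAt, segs <-chan segment) error {
	for s := range segs {
		if _, err := out.WriteAt(s.data, s.offset); err != nil {
			return err
		}
	}
	return nil
}
```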