Lassie compatible http api for fetching CARs #34

Open
olizilla opened this issue Apr 12, 2023 · 9 comments

olizilla commented Apr 12, 2023

Offer an http api that Lassie / Saturn could use to fetch CARs from us. Tweak our existing CAR responses to match the Lassie spec.

We already support CAR responses; we just need to tweak the existing code to write traversed blocks to the CAR so that they are verifiable, and to send subsets of the total dag when directed to by the ?car-scope param.

### Tasks
- [ ] #32
- [x] Ensure expected block ordering (see: https://github.com/filecoin-project/lassie/blob/main/docs/CAR.md#block-deduplication)
- [ ] #33
- [ ] https://github.com/web3-storage/w3up/issues/786
- [x] Test against lassie
- [ ] Dedupe repeated blocks in CAR response (see: https://github.com/filecoin-project/lassie/blob/main/docs/CAR.md#block-deduplication)
- [ ] Ensure Identity CIDs are not included as blocks (see: https://github.com/filecoin-project/lassie/blob/main/docs/CAR.md#dag-depth)
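
A minimal sketch of how a client might build such a request. The `/ipfs/` path prefix, the `buildCarUrl` helper name, and the use of the `application/vnd.ipld.car` Accept header are assumptions for illustration; the `car-scope` values are the ones listed in the commit notes in this thread.

```javascript
// Hypothetical helper: build a CAR request URL with a ?car-scope param.
// Scope values taken from the commit notes in this thread.
const CAR_SCOPES = ['all', 'file', 'block']

function buildCarUrl (gateway, cidPath, carScope = 'all') {
  if (!CAR_SCOPES.includes(carScope)) {
    throw new Error(`unsupported car-scope: ${carScope}`)
  }
  // cidPath is a root CID optionally followed by a path, e.g. "bafy.../cat.jpg".
  // Assumption: the gateway serves content under /ipfs/.
  const url = new URL(`/ipfs/${cidPath}`, gateway)
  url.searchParams.set('car-scope', carScope)
  return url
}

const carUrl = buildCarUrl(
  'https://freeway.dag.haus',
  'bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y/johnny-5-is-cowboy.jpg',
  'file'
)
console.log(carUrl.href)
// A client would then ask for a CAR response, for example:
// fetch(carUrl, { headers: { Accept: 'application/vnd.ipld.car' } })
```
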
olizilla added a commit to storacha/dagula that referenced this issue Apr 17, 2023
add getPath method as a generator that returns blocks for the targeted dag and all blocks traversed while resolving a cid+path string

supports carScope to specify what blocks to return for the resolved dag
- 'all': return the entire dag starting at path. (default)
- 'block': return the block identified by the path.
- 'file': mimic gateway semantics: return all blocks for a multi-block file, or just enough blocks to enumerate a dir/map but not the dir contents.

see: storacha/freeway#33
see: storacha/freeway#34

TODO:
- [ ] find out how to identify the boundaries of a unixfs hamt. unixfs-exporter seems to define it as "not having an empty or null Link.Name after the first 2 chars are stripped", which seems risky... what happens if the actual dir listing has 2 char long link names? see: https://github.com/ipfs/js-ipfs-unixfs/blob/e853049bd63d6773442e1540ae49b6a443ca8672/packages/ipfs-unixfs-exporter/src/resolvers/unixfs-v1/content/hamt-sharded-directory.ts#L20-L42

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
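
The behaviour described in this commit message can be sketched over a toy in-memory dag. This is not dagula's implementation: the `blocks` map, the string "CIDs", and the `walk` helper are all made up for illustration, and HAMT-sharded directories (where `'file'` would need every shard block) are ignored.

```javascript
// Toy in-memory dag: each "block" lists its named links. CIDs are made-up strings.
const blocks = new Map([
  ['root', { type: 'dir', links: [{ name: 'docs', cid: 'docs' }] }],
  ['docs', { type: 'dir', links: [{ name: 'big.bin', cid: 'big' }, { name: 'a.txt', cid: 'a' }] }],
  ['big', { type: 'file', links: [{ name: '', cid: 'chunk1' }, { name: '', cid: 'chunk2' }] }],
  ['chunk1', { type: 'raw', links: [] }],
  ['chunk2', { type: 'raw', links: [] }],
  ['a', { type: 'raw', links: [] }]
])

// Yields the blocks traversed while resolving the path, then the blocks
// selected by carScope for the resolved dag.
function * getPath (cidPath, carScope = 'all') {
  const [rootCid, ...segments] = cidPath.split('/').filter(Boolean)
  let cid = rootCid
  for (const name of segments) {
    yield cid // path block: lets the receiver verify the traversal from the root
    const link = blocks.get(cid).links.find(l => l.name === name)
    if (!link) throw new Error(`no link named ${name} in ${cid}`)
    cid = link.cid
  }
  yield * walk(cid, carScope)
}

function * walk (cid, carScope) {
  yield cid
  if (carScope === 'block') return
  const block = blocks.get(cid)
  // 'file' on a plain directory: the directory block alone enumerates its entries
  if (carScope === 'file' && block.type === 'dir') return
  for (const link of block.links) yield * walk(link.cid, 'all')
}

console.log([...getPath('root/docs', 'file')])
console.log([...getPath('root/docs/big.bin', 'file')])
```

With this toy data, `'file'` on the `docs` directory yields only the path block and the directory block, while `'file'` on `big.bin` also yields both of its chunks.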
willscott commented Apr 24, 2023

  • Test against lassie
  • Announce HTTP endpoint as an extended family member to the indexer
    • there's a multicodec to advertise for supporting http transfer that we'll want to have as an 'extended family member' of the eipfs provider's advertisement stream.

olizilla added a commit to storacha/dagula that referenced this issue May 1, 2023
add getPath method as a generator that returns blocks for the targeted
dag and all blocks traversed while resolving a cid+path string

supports carScope to specify what blocks to return for the resolved dag
- `'all'`: return the entire dag starting at path. (default)
- `'block'`: return the block identified by the path.
- `'file'`: Mimic gateway semantics: Return All blocks for a multi-block
file or just enough blocks to enumerate a dir/map but not the dir
contents.

see: storacha/freeway#33
see: storacha/freeway#34
see: ipfs/specs#402

TODO:
- [x] find out how to identify the boundaries of a unixfs hamt 

...unixfs-exporter seems to define it as "not having an empty or null
Link.Name after the first 2 chars are stripped", which seems loose...
what happens if the actual dir listing has 2 char long link names? see:
https://github.com/ipfs/js-ipfs-unixfs/blob/e853049bd63d6773442e1540ae49b6a443ca8672/packages/ipfs-unixfs-exporter/src/resolvers/unixfs-v1/content/hamt-sharded-directory.ts#L20-L42

License: MIT

---------

Signed-off-by: Oli Evans <oli@protocol.ai>
Co-authored-by: Alan Shaw <alan.shaw@protocol.ai>
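
On the 2-char question raised in this TODO, a hedged sketch of the heuristic. It assumes the usual UnixFS HAMT layout with a fanout of 256, where every link name in a shard starts with a 2-character hex bucket prefix: sub-shard links carry only the prefix, and entry links carry the prefix followed by the entry name. `classifyHamtLink` is a hypothetical helper, not the exporter's API.

```javascript
// Assumption: fanout 256, so the bucket prefix is 2 hex characters.
const PREFIX_LENGTH = 2

// Hypothetical helper: classify a link found inside a HAMT shard node.
function classifyHamtLink (linkName) {
  if (linkName.length < PREFIX_LENGTH) {
    throw new Error(`link name too short for a HAMT shard: "${linkName}"`)
  }
  const entryName = linkName.slice(PREFIX_LENGTH)
  // Nothing left after the prefix: the link points at a nested shard.
  // Otherwise the remainder is the real directory entry name.
  return entryName === ''
    ? { kind: 'shard' }
    : { kind: 'entry', name: entryName }
}

console.log(classifyHamtLink('3F')) // prefix only
console.log(classifyHamtLink('3Fab')) // prefix + a 2-char entry name "ab"
```

Under that assumption, a directory entry whose own name is 2 characters long is stored with a 4-character link name, so it is not mistaken for a nested shard. The check would only be ambiguous for an entry with an empty name.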
olizilla added a commit to storacha/gateway-lib that referenced this issue May 1, 2023
- update dagula to get `getPath` with carScope support https://github.com/web3-storage/dagula/releases/tag/v6.0.0
- update handleCar to extract ?car-scope query and use `dagula.getPath`

BREAKING CHANGE: CARs returned for cid+path will now be rooted at the root cid rather than the resolved cid for the end of the path, and will include all blocks needed to verify the path was traversed correctly.

see: storacha/freeway#33
see: storacha/freeway#34

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
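
What this breaking change means for a consumer can be sketched with a toy model. The CAR shape used here (`roots` plus a `blocks` map of named links) and the `resolveWithinCar` helper are hypothetical, and a real verifier would also hash each block's bytes against its CID, which is omitted.

```javascript
// Toy model of a fetched CAR: { roots: [cid], blocks: Map<cid, { links: { name: cid } }> }.
// Hypothetical helper: walk the path using only blocks present in the CAR.
function resolveWithinCar (car, cidPath) {
  const [rootCid, ...segments] = cidPath.split('/').filter(Boolean)
  if (car.roots[0] !== rootCid) {
    throw new Error('CAR is not rooted at the requested root cid')
  }
  let cid = rootCid
  for (const name of segments) {
    const block = car.blocks.get(cid)
    if (!block) throw new Error(`missing path block ${cid}`)
    const next = block.links[name]
    if (!next) throw new Error(`no link named ${name} in ${cid}`)
    cid = next
  }
  if (!car.blocks.has(cid)) throw new Error(`missing target block ${cid}`)
  return cid
}

const car = {
  roots: ['root'],
  blocks: new Map([
    ['root', { links: { 'cat.jpg': 'cat' } }],
    ['cat', { links: {} }]
  ])
}
console.log(resolveWithinCar(car, 'root/cat.jpg'))
```

A CAR rooted at the resolved cid (the old behaviour) would fail the first check, because the receiver could not tie the returned blocks back to the root it asked for.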
olizilla added a commit to storacha/gateway-lib that referenced this issue May 2, 2023
- update dagula to get `getPath` with carScope support
https://github.com/web3-storage/dagula/releases/tag/v6.0.0
- update handleCar to extract ?car-scope query and use `dagula.getPath`

BREAKING CHANGE: CARs returned for cid+path will now be rooted at the
root cid rather than the resolved cid for the end of the path, and will
include all blocks needed to verify the path was traversed correctly.

see: storacha/freeway#33
see: storacha/freeway#34

License: MIT

---------

Signed-off-by: Oli Evans <oli@protocol.ai>
olizilla added a commit that referenced this issue May 2, 2023
update `dagula` and `gateway-lib` to add support for car-scope and verifiable paths for cars

see: #34

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
olizilla added a commit that referenced this issue May 2, 2023
update `dagula` and `gateway-lib` to add support for car-scope and
verifiable paths for cars

see: #34

License: MIT

Signed-off-by: Oli Evans <oli@protocol.ai>
olizilla commented May 3, 2023

@willscott is "Announce HTTP endpoint as an extended family member to the indexer" required before we can test against lassie?

@willscott

I don't think so!

Lassie as a CLI App allows retrieval against a manually specified provider endpoint, skipping the indexer lookup.

olizilla commented May 3, 2023

ah, i see, something something lassie fetch --providers...

@willscott

you'll want to be on the http retrieval branch filecoin-project/lassie#204 in order to test

cc @rvagg

olizilla commented May 3, 2023

Success!? ✨ 🎷 🐩

~/Code/filecoin-project/lassie on rvagg/http  
❯ ./lassie fetch --providers /dns4/freeway.dag.haus/tcp/443/https/p2p/bafzbeibhqavlasjc7dvbiopygwncnrtvjd2xmryk5laib7zyjor6kf3avm bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y
Fetching bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y from [{QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC: [/dns4/freeway.dag.haus/tcp/443/https]}]...........
Fetched [bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y] from [QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC]:
	Duration: 809.277584ms
	  Blocks: 11
	   Bytes: 3.2 MiB

❯ ipfs-car --list-full bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y.car
bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y
bafkreidpr7zz5yflyocmglpref5vvx4yglo3zmihh3mpiaezecgrggqwiq bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y/johnny-5-goes-camping.jpg
...etc

Minor: you get an error if you try and provide a more succinct http flavour multiaddr sans p2p like /dns4/freeway.dag.haus/tcp/443/https

~/Code/filecoin-project/lassie on rvagg/http  
❯ ./lassie fetch --providers /dns4/freeway.dag.haus/tcp/443/https bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y
2023-05-03T12:24:48.376+0100	FATAL	lassie	lassie/main.go:57	invalid p2p multiaddr
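
For reference, the HTTP location encoded in these multiaddrs can be pulled out with plain string handling. This is a sketch only: `httpMultiaddrToUrl` is a hypothetical helper, it does not use a multiaddr library, and it only handles the `/dns*|ip4/<host>/tcp/<port>/http(s)` shape seen in this thread, with or without a trailing `/p2p/<peer id>`.

```javascript
// Hypothetical helper: extract the HTTP endpoint from an http-flavoured multiaddr.
function httpMultiaddrToUrl (multiaddr) {
  const [hostProto, host, transport, port, scheme, ...rest] =
    multiaddr.split('/').filter(Boolean)
  if (!['dns', 'dns4', 'dns6', 'ip4'].includes(hostProto)) {
    throw new Error(`unsupported host protocol: ${hostProto}`)
  }
  if (transport !== 'tcp') throw new Error(`unsupported transport: ${transport}`)
  if (scheme !== 'http' && scheme !== 'https') {
    throw new Error(`not an http multiaddr: ${multiaddr}`)
  }
  // The optional /p2p/<peer id> suffix names the peer, not the HTTP location.
  const peerId = rest[0] === 'p2p' ? rest[1] : undefined
  return { url: `${scheme}://${host}:${port}`, peerId }
}

console.log(httpMultiaddrToUrl('/dns4/freeway.dag.haus/tcp/443/https'))
console.log(httpMultiaddrToUrl('/dns4/freeway.dag.haus/tcp/443/https/p2p/bafzbeibhqavlasjc7dvbiopygwncnrtvjd2xmryk5laib7zyjor6kf3avm'))
```

Both forms resolve to the same HTTP endpoint in this sketch; only the second carries a peer id.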

olizilla commented May 3, 2023

cid+path works

./lassie fetch --providers /dns4/freeway.dag.haus/tcp/443/https/p2p/bafzbeibhqavlasjc7dvbiopygwncnrtvjd2xmryk5laib7zyjor6kf3avm bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y/johnny-5-is-cowboy.jpg
Fetching bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y/johnny-5-is-cowboy.jpg from [{QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC: [/dns4/freeway.dag.haus/tcp/443/https]}]..
Fetched [bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y] from [QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC]:
	Duration: 403.931916ms
	  Blocks: 2
	   Bytes: 443 KiB

olizilla commented May 3, 2023

--car-scope file is working: only directory block returned for unixfs dir.

❯ ./lassie fetch --providers /dns4/freeway.dag.haus/tcp/443/https/p2p/bafzbeibhqavlasjc7dvbiopygwncnrtvjd2xmryk5laib7zyjor6kf3avm --car-scope file bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y
Fetching bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y from [{QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC: [/dns4/freeway.dag.haus/tcp/443/https]}].
Fetched [bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y] from [QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC]:
	Duration: 390.517625ms
	  Blocks: 1
	   Bytes: 681 B

❯ ipfs-car --list-cids bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y.car 
bafybeihetumsy24mjzbts7t4vpft2rwput44joxxnzxhfh5woq6z46fe2y	   

olizilla commented May 3, 2023

How to announce what we have over http to the indexers needs some discussion, as we only support dag roots via http today. See storacha/w3up#786
