IPIP-0445: Option to Skip Raw Blocks in Gateway Responses #445

Draft: wants to merge 4 commits into base `main`.
22 changes: 21 additions & 1 deletion src/http-gateways/trustless-gateway.md
@@ -214,7 +214,7 @@ The Body hash MUST match the Multihash from the requested CID.

A CAR stream for the requested
[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
-content type (with optional `order` and `dups` params), path and optional
+content type (with optional `order`, `dups` and `skip-leaves` params), path and optional
`dag-scope` and `entity-bytes` URL parameters.

## CAR version
@@ -301,6 +301,26 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as
the raw data is already present in the parent block that links to the identity
CID.

## CAR `skip-leaves` (content type parameter)

The `skip-leaves` parameter specifies whether blocks with the multicodec `raw`
`0x55` must be sent.

It accepts two values:
- `y`: Blocks with `raw` multicodec MUST NOT be sent.
- `n`, or unspecified: Blocks with `raw` multicodec MUST be sent.

A gateway MUST NOT assume this field is `y` if unspecified.
When not specified, it MUST always be understood as `n`.
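
The defaulting rule above can be illustrated with a minimal sketch, which is not part of the proposed spec text. It assumes the parameter arrives on the `application/vnd.ipld.car` media type of an `Accept` header, as this revision describes. The function name `parse_skip_leaves` is hypothetical, and rejecting values other than `y` or `n` is an assumption, since the text only lists the two accepted values.

```python
CAR_MEDIA_TYPE = "application/vnd.ipld.car"


def parse_skip_leaves(accept: str) -> bool:
    """Return True only when the CAR media type explicitly carries skip-leaves=y."""
    for media_range in accept.split(","):
        parts = [p.strip() for p in media_range.split(";")]
        if parts[0].lower() != CAR_MEDIA_TYPE:
            continue
        params = {}
        for part in parts[1:]:
            name, sep, value = part.partition("=")
            if sep:
                params[name.strip().lower()] = value.strip().strip('"')
        # An unspecified parameter MUST be understood as "n".
        value = params.get("skip-leaves", "n")
        if value not in ("y", "n"):
            # Assumption: anything else is treated as a client error.
            raise ValueError(f"unsupported skip-leaves value: {value!r}")
        return value == "y"
    # No CAR media type requested at all: nothing to skip.
    return False
```

With this shape, a client that has never heard of `skip-leaves` keeps receiving `raw` blocks, because only an explicit `y` changes the result.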

:::note Notes for implementers

A request which is rooted at a `raw` block and has `skip-leaves=y` does not
make sense and SHOULD NOT be sent by clients. It is fair for servers to
return an error in this situation.

:::

Member commented:

My suggestion is to reduce guesswork and probing a client has to do, which means this behavior should not be left for implementers, but a clear MUST / MUST NOT in the spec, including the HTTP error code.

Contributor Author commented:

Something 4XX something.
My lazy side tells me that I don't want to do this, because then I save on code and can return an empty CAR in the boxo gateway instead of implementing one more edge case.
I can do it if it feels important to someone, but given that sending a 200 and returning an empty CAR is a fine outcome, I'm not sure we need to do this.

Member commented:

I'm not OK with empty CARs for the error case, and I don't particularly like that boxo/gateway is already doing this for other error cases and would rather not embed it as a standard practice here. 415 please.

Contributor Author (@Jorropo, Oct 13, 2023) commented:

IMO it's different from the other cases in boxo/gateway. If you request a raw block with skip-leaves, an empty CAR is a correct answer. It is the complete DAG minus all the raw blocks, which happens to be an empty set, but that is the client's problem and there is nothing the gateway can do about it. This is not a transient error either.

Member commented:

I've updated this section in f96a92a and made HTTP 400 (Bad Request) a MUST behavior that we will also enforce via conformance test.

(error is not 415 as we moved skip-raw-blocks from content type to URL query params – see "Alternatives" / "Why not CAR content type parameter ?" section in IPIP).


## CAR format parameters and determinism

The default header and block order in a CAR format is not specified by IPLD specifications.
105 changes: 105 additions & 0 deletions src/ipips/ipip-0445.md
@@ -0,0 +1,105 @@
---
title: "IPIP-0445: trustless gateway skip-leaves option"
date: 2023-10-09
ipip: open
editors:
  - name: Hugo VALTIER
    github: Jorropo
    url: https://jorropo.net/
    affiliation:
      name: Protocol Labs
      url: https://protocol.ai/
relatedIssues:
  - https://github.com/ipfs/specs/issues/444
order: 445
tags: ['ipips']
---

## Summary

Introduce a `skip-leaves` flag for the :cite[trustless-gateway].

## Motivation

Allow clients to read a stream which only contains proofs in a bottom-heavy
graph that uses the `raw` codec for its leaves.

Useful with UnixFS for features like webseeds [#444](https://github.com/ipfs/specs/issues/444).

## Detailed design

The `skip-leaves` CAR Content-Type parameter on :cite[trustless-gateway]
allows clients to download an entity without the blocks that use the
`raw` (`0x55`) multicodec.

- When set to `y`, the parameter instructs the gateway not to transmit
  blocks tagged with the `raw` multicodec.
- If set to `n`, or left unspecified, the gateway MUST transmit `raw`
  multicodec blocks.

Unless the parameter is explicitly set to `y`, the gateway MUST treat the
value of `skip-leaves` as `n`.

## Design rationale

### User Benefit

Implementing the `skip-leaves` parameter offers several benefits to users:

1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received
   files in their deserialized form without necessitating the transmission of
   raw blocks from the gateway.
2. **Incremental Download:** Clients can incrementally download files in
   deserialized form from non-IPFS servers. This allows applications to share
   distribution between IPFS and non-IPFS clients.
3. **Efficient Block Discovery:** With the `skip-leaves` option enabled,
   clients can quickly discover numerous candidate blocks without being
   bottlenecked by the gateway's transmission of raw blocks.
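
The verification benefit can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation. It assumes the client has already decoded the proof blocks into an ordered list of `(binary CID, size)` pairs for the leaves, and that every leaf is a CIDv1 with the `raw` codec and a sha2-256 multihash. The helper names `read_varint`, `leaf_matches` and `verify_file` are hypothetical.

```python
import hashlib

RAW = 0x55       # multicodec code for raw
SHA2_256 = 0x12  # multihash code for sha2-256


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned varint; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def leaf_matches(cid: bytes, chunk: bytes) -> bool:
    """Check that chunk hashes to the multihash inside a binary CIDv1 raw leaf."""
    version, pos = read_varint(cid, 0)
    codec, pos = read_varint(cid, pos)
    hash_code, pos = read_varint(cid, pos)
    length, pos = read_varint(cid, pos)
    if (version, codec, hash_code) != (1, RAW, SHA2_256):
        raise ValueError("this sketch only handles CIDv1 raw sha2-256 leaves")
    return hashlib.sha256(chunk).digest() == cid[pos:pos + length]


def verify_file(leaves: list[tuple[bytes, int]], data: bytes) -> bool:
    """Verify OOB bytes against leaf (CID, size) pairs listed in file order."""
    offset = 0
    for cid, size in leaves:
        if not leaf_matches(cid, data[offset:offset + size]):
            return False
        offset += size
    # Trailing bytes not covered by any leaf also fail verification.
    return offset == len(data)
```

The file bytes themselves never pass through the gateway here: only the proofs do, and each chunk received elsewhere is hashed locally.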

### Compatibility

Setting the default value of the `skip-leaves` parameter to `n` ensures
backward compatibility with existing clients and systems that are unaware
of this new flag.

### Prevention of Amplification Attacks and Efficient Server Operation

By keying on the `raw` (`0x55`) codec, servers can trivially determine whether
to fetch or skip a block without having to learn any new information.
Although this is more limited and cannot handle UnixFS files that use dag-pb
for their leaves, it allows both the client and server to trivially verify
that a block must not be fetched. This prevents amplification issues where a
server could need to fetch several orders of magnitude more data than the
client receives when executing the request.
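
A toy traversal shows why no amplification occurs. It is a sketch under simplifying assumptions: a CID is modelled as a small tuple that exposes its codec, `fetch` and `links` are hypothetical callbacks supplied by the caller, and `dups` handling is ignored. The point is that the skip decision reads only the CID, so a skipped block is never fetched.

```python
from typing import Callable, Iterator, NamedTuple

RAW = 0x55  # multicodec code for raw


class Cid(NamedTuple):
    codec: int     # multicodec code embedded in the CID
    digest: bytes  # stand-in for the multihash


def walk(
    root: Cid,
    fetch: Callable[[Cid], bytes],
    links: Callable[[Cid, bytes], list[Cid]],
    skip_leaves: bool,
) -> Iterator[tuple[Cid, bytes]]:
    """Depth-first walk yielding the blocks a CAR response would carry."""
    stack = [root]
    while stack:
        cid = stack.pop()
        if skip_leaves and cid.codec == RAW:
            continue  # decided from the CID alone; the block is never fetched
        block = fetch(cid)
        yield cid, block
        if cid.codec != RAW:
            # raw blocks are opaque bytes, so only other codecs have links
            stack.extend(reversed(links(cid, block)))
```

Filtering on a property that requires decoding the block, such as a UnixFS type inside dag-pb, would force the server to fetch each block before deciding, which is the cost this design avoids.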

### Why not `dag-scope=skip-leaves` ?

The `dag-scope` parameter determines the overall range of blocks to retrieve,
while `skip-leaves` selectively filters specific blocks within that range.
Combining them under one parameter would restrict their combined utility.

For example:
- A client is streaming a video from a webseed and the user seeks through the
  video. The client would then send `dag-scope=entity&entity-bytes=42:1337`
  with `skip-leaves=y` to download the proofs for the required section of the
  video.
- A client is verifying an OOB transferred directory in deserialized form.
  In that case `dag-scope=all` with `skip-leaves=y` makes sense.
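
The two requests above could be assembled as in the sketch below. It assumes `skip-leaves` is carried as a parameter of the CAR media type in the `Accept` header, as this revision of the text describes. The gateway origin `https://gateway.example` and the helper `car_request` are hypothetical.

```python
from urllib.parse import urlencode

GATEWAY = "https://gateway.example"  # hypothetical gateway origin


def car_request(cid: str, skip_leaves: bool, **query: str) -> tuple[str, dict[str, str]]:
    """Return (URL, headers) for a trustless gateway CAR request."""
    url = f"{GATEWAY}/ipfs/{cid}"
    if query:
        # Keyword names use "_" in Python; the URL parameters use "-".
        pairs = {name.replace("_", "-"): value for name, value in query.items()}
        # Keep ":" unescaped so entity-bytes stays readable.
        url += "?" + urlencode(pairs, safe=":")
    accept = "application/vnd.ipld.car"
    if skip_leaves:
        accept += "; skip-leaves=y"
    return url, {"Accept": accept}
```

For the video case the call would be `car_request(video_cid, True, dag_scope="entity", entity_bytes="42:1337")`, and for the directory case `car_request(directory_cid, True, dag_scope="all")`, where `video_cid` and `directory_cid` are placeholders for the caller's CIDs.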

### Alternatives

An alternative approach would be to request blocks individually.
However, it adds extra round trips and more per-request HTTP overhead
and is thus undesirable.

Member commented:

Suggested change (replace the paragraph above with):

1. An alternative approach would be to request blocks individually.
   However, it adds extra round trips and more per-request HTTP overhead
   and is thus undesirable.
2. An alternative implementation may be to either specify codecs to
   include, or codecs to exclude, e.g. `exclude-codec=0x55` or
   `include-codec=0x70`, to achieve similar results. There may be
   broader utility in such an approach, but the use cases beyond that
   proposed above for `skip-leaves` are not clear.

🤷 this is an option, is it better? not sure, I'm not even sure it'd be very different on the implementation level and may afford some other possibilities. It also clears up the "leaves" terminology, which is very unixfs+CIDv1 specific.

Contributor Author (@Jorropo, Oct 12, 2023) commented:

Interesting idea. I don't think the current API is UnixFS specific. I hope raw blocks are childless bytes in all formats that use them.
I guess you could have an ADL that parses the bytes node out of the file using CBOR, but then you are using IPLD wrong IMO; it should really be a CBOR CID then.

exclude-codec and include-codec are interesting, however they complicate the server implementation.
I have an idea for a 400Gbps gateway implementation with extremely low CPU usage: combine a small index, mapping offsets in the UnixFS file to offsets in a CAR as well as offsets from children to offsets in the CAR, with a CARv1 DFS body, offloading and zero-copy. More user input surface complicates the index in this situation, because I don't just need to map ranges in the UnixFS file to ranges in the CAR, I also need to mark and compute prefix sums for all the codecs in the CAR.
So given this might annoy me one day and I don't need it, I wouldn't do it unless someone else wants this feature right now.

Contributor Author commented:

Also one weird thing: if I have a DAG that is dag-json → dag-cbor → raw and I exclude the dag-cbor codec, is the gateway expected to send me 1 or 2 blocks? If the answer is 2 blocks, then the gateway has to process every skipped block, which means the gateway does work that is not reflected in what the client receives. And if the answer is 1, then it does not follow the principle of least surprise.

Member (@lidel, Oct 12, 2023) commented:

(bit unprocessed thought) If we are framing this around DAG traversal, exclude-codec=0x55 could be a sensible attempt at future-proofing, but we can always add it later.

Rationale:

The raw codec provides an interoperable way to act as a traversal stop: it ensures the child block that was referred to will not be used for any further traversal / deserialization, but returned as an opaque blob. This allows for creating non-UnixFS DAGs that are compatible with the existing IPFS ecosystem.

If we add a generic exclude-codec and its semantic meaning also controls when the traversal stops, then there could be real-world utility in excluding DAG branches based on their root codec, e.g. when they can't be retrieved with the usual transports and require special handling. Enabling partial retrieval of data that is partially encoded in proprietary codecs sounds useful, but I'm not sure how much real-world need there is for this right now. People building modern things seem to be happy with things put on top of UnixFS or DAG-CBOR.

Only thing that comes to mind is Filecoin which iirc can't be pinned or fetched recursively with standard IPFS tools due to the use of unsupported codecs (cc @hsanjuan @aschmahmann if i misremembered this, I think we've hit that problem a while ago).

Is there any other use case that could benefit from open-ended exclude-codec?

Member commented:

Ok, silence suggests we don't have any immediate need for exclude-codec.
Let's keep this IPIP limited to skip-raw-blocks, open-ended filtering can be proposed in future IPIP.

## Security

None.

## Test fixtures

TODO

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).