ipip-445: rename to skip-raw-blocks URL param
+ basic editorials
lidel committed Oct 25, 2023
1 parent c1e121e commit 7ba6e2b
Showing 2 changed files with 132 additions and 59 deletions.
50 changes: 26 additions & 24 deletions src/http-gateways/trustless-gateway.md
@@ -183,6 +183,28 @@ returned:
returned to the client, the HTTP status code has already been sent to the
client.

### :dfn[skip-raw-blocks] (request query parameter)

The optional `skip-raw-blocks` parameter is available only for CAR requests.

It specifies whether blocks with the multicodec `raw` (`0x55`) MUST be present in
the CAR response.

It accepts two values:
- `y`: Blocks with `raw` multicodec MUST NOT be returned.
- `n`, or missing (unspecified): no-op, no special handling of `raw` blocks.

When not specified a gateway implementation MUST assume `n`.

:::note Notes for implementers

A `skip-raw-blocks=y` request for a content path with `raw` root CID does not
make sense and SHOULD NOT be sent by clients.

A Gateway SHOULD return HTTP error 400 Bad Request in response to such a request.

:::
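
The parameter handling described above can be sketched as follows. This is a minimal illustration, not the reference implementation: `parse_skip_raw_blocks`, `check_request` and `BadRequest` are hypothetical names, rejecting values other than `y` or `n` and using the last value when the parameter is repeated are assumptions of this sketch, and the root CID codec is passed in as a plain integer.

```python
from urllib.parse import parse_qs, urlsplit

RAW_CODEC = 0x55  # multicodec code for `raw`


class BadRequest(Exception):
    """Signals that the gateway should answer with HTTP 400 Bad Request."""


def parse_skip_raw_blocks(url: str) -> bool:
    """Return True when the request URL asks for `raw` blocks to be skipped."""
    values = parse_qs(urlsplit(url).query).get("skip-raw-blocks", [])
    if not values:
        return False  # a missing parameter MUST be treated as `n`
    value = values[-1]  # assumption: the last occurrence wins
    if value == "y":
        return True
    if value == "n":
        return False
    # Assumption of this sketch: unknown values are rejected.
    raise BadRequest(f"unsupported skip-raw-blocks value: {value!r}")


def check_request(url: str, root_codec: int) -> bool:
    """Validate the request against the codec of the root CID."""
    skip = parse_skip_raw_blocks(url)
    if skip and root_codec == RAW_CODEC:
        # Skipping raw blocks below a raw root leaves nothing to return.
        raise BadRequest("skip-raw-blocks=y requested for a raw root CID")
    return skip
```

A handler would call `check_request` before starting the CAR stream, since the HTTP status code cannot be changed once streaming has begun.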

# HTTP Response

Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway].
@@ -212,10 +234,10 @@ The Body hash MUST match the Multihash from the requested CID.

# CAR Responses (application/vnd.ipld.car)

A CAR stream for the requested
[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
content type (with optional `order`, `dups` and `skip-leaves` params), path and optional
`dag-scope` and `entity-bytes` URL parameters.
A CAR stream ([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
with optional `order` and `dups` content type parameters) for the requested
content path (and optional `dag-scope`, `entity-bytes` and/or `skip-raw-blocks`
URL parameters).

## CAR version

@@ -301,26 +323,6 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as
the raw data is already present in the parent block that links to the identity
CID.

## CAR `skip-leaves` (content type parameter)

The `skip-leaves` parameter specifies whether blocks with the multicodec `raw`
`0x55` must be sent.

It accepts two values:
- `y`: Blocks with `raw` multicodec MUST NOT be sent.
- `n`, or unspecified: Blocks with `raw` multicodec MUST be sent.

A gateway MUST NOT assume this field is `y` if unspecified.
When not specified it always MUST be understood as `n`.

:::note Notes for implementers

A request which is rooted at a `raw` block and has `skip-leaves=y` does not
make sense and SHOULD NOT be sent by clients, it is fair for servers to
error in this situation.

:::

## CAR format parameters and determinism

The default header and block order in a CAR format is not specified by IPLD specifications.
141 changes: 106 additions & 35 deletions src/ipips/ipip-0445.md
@@ -1,14 +1,20 @@
---
title: "IPIP-0445: trustless gateway skip-leaves option"
title: "IPIP-0445: Option to Skip Raw Blocks in Gateway Responses"
date: 2023-10-09
ipip: open
editors:
- name: Hugo VALTIER
- name: Hugo Valtier
github: Jorropo
url: https://jorropo.net/
affiliation:
name: Protocol Labs
url: https://protocol.ai/
- name: Marcin Rataj
github: lidel
url: https://lidel.org/
affiliation:
name: Protocol Labs
url: https://protocol.ai/
relatedIssues:
- https://github.com/ipfs/specs/issues/444
order: 445
@@ -17,88 +23,153 @@ tags: ['ipips']

## Summary

Introduce `skip-leaves` flag for the :cite[trustless-gateway].
Introduce `skip-raw-blocks` flag for the :cite[trustless-gateway].

## Motivation

Allow clients to read a stream which only contains proofs in a bottom-heavy
graph using the `raw` codec for its leaves.

Usefull with unixfs for features like webseeds [#444](https://github.com/ipfs/specs/issues/444).
Useful with UnixFS for features like webseeds
([ipfs/specs#444](https://github.com/ipfs/specs/issues/444)), where metadata
about a DAG is fetched from a trustless gateway, but the actual raw data can be
fetched from any source that supports either trustless gateway specification,
or plain HTTP Range Requests, allowing for trustless and verifiable data
retrieval from plain HTTP (non-IPFS) data sources.

## Detailed design

The `skip-leaves` CAR Content-Type parameter on :cite[trustless-gateway]
The `skip-raw-blocks` URL query parameter on :cite[trustless-gateway]
allows clients to download an entity except blocks with the multicodec
`raw` (`0x55`).

- When set to `y`, the parameter instructs the gateway not to transmit
blocks tagged with the `raw` multicodec.
- If set to `n`, or left unspecified, the gateway MUST transmit `raw`
multicodec blocks.
blocks referenced with a CID with the `raw` multicodec.
- If set to `n`, or left unspecified, there is no special handling of `raw`
multicodec blocks (the existing default behavior remains the same).

Importantly, unless explicitly specified as `y`, the default operational
mode of the gateway MUST assume the value of `skip-leaves` to be `n`.
mode of the gateway MUST assume the value of `skip-raw-blocks` to be `n`.
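
Deciding whether a block is `raw` only requires reading the codec from its CID. The sketch below illustrates this for base32 CIDv1 strings; it is not taken from any gateway implementation, and `cid_codec`, `make_cid` and `blocks_to_send` are hypothetical helper names. `make_cid` exists only to build demo CIDs with a sha2-256 multihash.

```python
import base64
import hashlib

RAW_CODEC = 0x55  # multicodec code for `raw`
DAG_PB_CODEC = 0x70  # multicodec code for `dag-pb`


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint, returning (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def cid_codec(cid: str) -> int:
    """Return the multicodec of a base32 CIDv1 string (multibase prefix `b`)."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1")
    body = cid[1:].upper()
    binary = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = read_varint(binary, 0)
    if version != 1:
        raise ValueError("not a CIDv1")
    codec, _ = read_varint(binary, pos)
    return codec


def make_cid(codec: int, data: bytes) -> str:
    """Demo helper: CIDv1 with sha2-256; codec must fit in one varint byte."""
    binary = bytes([0x01, codec, 0x12, 0x20]) + hashlib.sha256(data).digest()
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def blocks_to_send(cids: list[str], skip_raw_blocks: bool = False) -> list[str]:
    """Apply the `skip-raw-blocks` filter to an ordered list of block CIDs."""
    if not skip_raw_blocks:
        return list(cids)  # `n` or unspecified: no special handling
    return [cid for cid in cids if cid_codec(cid) != RAW_CODEC]


root = make_cid(DAG_PB_CODEC, b"root node")
leaf = make_cid(RAW_CODEC, b"leaf bytes")
print(blocks_to_send([root, leaf], skip_raw_blocks=True) == [root])
```

Because the codec is part of the CID, the filter never needs the block bytes themselves.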

## Design rationale

### User Benefit

Implementing the `skip-leaves` parameter offers several benefits to users:
Implementing the `skip-raw-blocks` parameter offers several benefits to users:

1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received
files in their deserialized form without necessitating the transmission of
raw blocks from the gateway.

2. **Incremental Download:** Clients can incrementally download files in
deserialized forms from non-IPFS servers. Allowing applications to share
distribution for IPFS and non IPFS clients.
3. **Efficient Block Discovery:** With the `skip-leaves` option enabled,
distribution for IPFS and non-IPFS clients.

3. **Efficient Block Discovery:** With the `skip-raw-blocks` option enabled,
clients can quickly discover numerous candidate blocks without being
bottlenecked by the gateway's transmission of raw blocks.

4. **Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed
over HTTP in deserialized form can now act as sources for specific block
byte ranges, without having to support any IPFS specific APIs. Plain HTTP
Range Requests can be used for fetching remaining raw block data, and the
   metadata read via `skip-raw-blocks=y` is enough for a client to verify that
   the remaining raw block byte ranges fetched from a non-IPFS system match
   the expected CIDs.
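
The verification step in the last item can be sketched as below. This is an illustration under stated assumptions: it only handles base32 CIDv1 with the `raw` codec and a sha2-256 multihash, `verify_raw_block` and `raw_cid` are hypothetical names, and a slice of an in-memory byte string stands in for the body of an HTTP Range response. In a real client the expected CID and the byte offsets would be derived from the parent blocks received via `skip-raw-blocks=y`.

```python
import base64
import hashlib

# CIDv1 prefix for the common case: version 1, codec raw (0x55),
# multihash sha2-256 (0x12) with a 32-byte (0x20) digest.
RAW_SHA256_PREFIX = bytes([0x01, 0x55, 0x12, 0x20])


def verify_raw_block(cid: str, data: bytes) -> bool:
    """Check that `data` hashes to the digest embedded in a raw CID."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDs")
    body = cid[1:].upper()
    binary = base64.b32decode(body + "=" * (-len(body) % 8))
    if len(binary) != 36 or binary[:4] != RAW_SHA256_PREFIX:
        raise ValueError("this sketch only handles raw CIDv1 with sha2-256")
    return hashlib.sha256(data).digest() == binary[4:]


def raw_cid(data: bytes) -> str:
    """Demo helper: build the raw CIDv1 (sha2-256) for a chunk of bytes."""
    binary = RAW_SHA256_PREFIX + hashlib.sha256(data).digest()
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


mirror_file = b"0123456789abcdef"           # file served by a plain HTTP mirror
expected_cid = raw_cid(mirror_file[4:12])   # would come from the CAR metadata
ranged = mirror_file[4:12]                  # stands in for `Range: bytes=4-11`
print(verify_raw_block(expected_cid, ranged))
```

A client would discard the range and try another source when the check fails.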

### Compatibility

Setting the default value of the `skip-leaves` parameter to `n` ensures
Setting the default value of the `skip-raw-blocks` parameter to `n` ensures
backward compatibility with existing clients and systems that are unaware
of this new flag.

### Prevention of Amplification Attacks and Efficient Server Operation
### Alternatives

By utilizing the `raw` (`0x55`) codec servers can trivially determine whether
to fetch or skip a block without having to learn any new information.
Although more limited and not able to handle unixfs file using dag-pb for their
leaves, it allows both the client and server to trivially verify a block
must not be fetched. Preventing issues of Amplification where a server could
need to fetch multiple orders more data than the client when executing the
request.
An alternative approach would be to request blocks individually.
However, it adds extra round trips and more per HTTP request overhead
and thus is undesirable.

### Why not `dag-scope=skip-leaves` ?
#### Why not `dag-scope=skip-raw-blocks` ?

The `dag-scope` parameter determines the overall range of blocks to retrieve,
while `skip-leaves` selectively filters specific blocks within that range.
The existing `dag-scope` parameter determines the overall range of blocks to retrieve,
while `skip-raw-blocks` selectively filters specific blocks across all scopes and ranges.
Combining them under one parameter would restrict their combined utility.

For example:
- A client is streaming a video from a webseed and the user seeked through the
- A client is streaming a video from a webseed and the user seeks through the
video, then the client would send `dag-scope=entity&entity-bytes=42:1337`
with `skip-leaves=y` to download the proofs for the required section of the
video.
- A client is verifying an OOB transfered directory in deserialized form,
then `dag-scope=all` with `skip-leaves=y` makes sense.
with `skip-raw-blocks=y` to download the proofs for the required section of the
video, and then fetches remaining raw data byte ranges from a faster CDN.
- A client is verifying an OOB transferred directory in deserialized form,
then `dag-scope=all` with `skip-raw-blocks=y` makes sense.
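
For illustration, the two requests above could be assembled like this. The helper name `car_request`, the gateway host and the CID placeholder are hypothetical; only the parameter names and the `application/vnd.ipld.car` content type come from the specification.

```python
from urllib.parse import urlencode


def car_request(gateway, cid, *, dag_scope=None, entity_bytes=None,
                skip_raw_blocks=False):
    """Return (url, headers) for a trustless gateway CAR request."""
    params = {}
    if dag_scope:
        params["dag-scope"] = dag_scope
    if entity_bytes:
        params["entity-bytes"] = entity_bytes
    if skip_raw_blocks:
        params["skip-raw-blocks"] = "y"  # omitted entirely when `n`
    url = f"{gateway}/ipfs/{cid}"
    if params:
        # Keep ":" readable in entity-bytes ranges such as 42:1337.
        url += "?" + urlencode(params, safe=":")
    return url, {"Accept": "application/vnd.ipld.car"}


# Placeholder host and CID, for illustration only.
video_url, headers = car_request(
    "https://gateway.example", "bafyexample",
    dag_scope="entity", entity_bytes="42:1337", skip_raw_blocks=True,
)
directory_url, _ = car_request(
    "https://gateway.example", "bafyexample",
    dag_scope="all", skip_raw_blocks=True,
)
print(video_url)
print(directory_url)
```

Keeping `skip-raw-blocks` independent from `dag-scope` is what lets both calls reuse the same flag with different scopes.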

### Alternatives
#### Why not CAR content type parameter ?

CAR content type's
([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car))
optional parameters like `order` and `dups` impact the way data is represented
when returned as a CAR stream, but do not modify the scope of the data itself:
they neither add nor subtract data from the response.

The scope of the data is controlled by URL content path and optional
`dag-scope`, `entity-bytes` URL parameters. This is where `skip-raw-blocks`
belongs.

This is not just a matter of aesthetics: the URL path and query parameters
allow for caching of different subsets of a DAG in a way that is interoperable
with existing HTTP tools and clients, and that minimizes the risk of caching an
incomplete DAG response due to HTTP cache misconfiguration. Because
`skip-raw-blocks` is in the URL query, CAR responses without `raw` blocks will
be cached under a different key than full responses (just like the already
existing `dag-scope` and `entity-bytes`).

#### Why not generic `skip-leaves` that skips all leaves, not just `raw` blocks?

Prevention of amplification attacks and efficient server operation.

By utilizing the `raw` (`0x55`) codec servers can trivially determine whether
to fetch or skip a block without having to fetch it to learn any new
information.

If we framed this feature around skipping all leaf nodes, that would require
the server to fetch the leaves to learn if they have any child nodes. This
would force the server to fetch data that is never returned to the client.

Although `skip-raw-blocks` is more limited and not able to handle UnixFS files
chunked without the `--raw-leaves` option, it allows both the client and server
to trivially verify that a block must not be fetched. This prevents
amplification issues, where a server could need to fetch orders of magnitude
more data than it returns to the client when executing the request.
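
The difference in server work can be seen in a toy traversal. This is a sketch under simplifying assumptions: CIDs are modeled as `(codec, name)` tuples, the block store is an in-memory dict, and `walk` and `fetch` are hypothetical names rather than any gateway's API.

```python
RAW = 0x55     # multicodec code for `raw`
DAG_PB = 0x70  # multicodec code for `dag-pb`

# Toy block store: CID -> list of child CIDs (raw blocks have no links).
ROOT = (DAG_PB, "root")
STORE = {
    ROOT: [(RAW, "chunk-0"), (RAW, "chunk-1"), (RAW, "chunk-2")],
    (RAW, "chunk-0"): [],
    (RAW, "chunk-1"): [],
    (RAW, "chunk-2"): [],
}


def walk(root, fetch, skip_raw_blocks):
    """Depth-first walk returning CIDs in the order blocks would be sent."""
    sent = []
    stack = [root]
    while stack:
        cid = stack.pop()
        if skip_raw_blocks and cid[0] == RAW:
            continue  # decided from the CID alone, the block is never fetched
        links = fetch(cid)
        sent.append(cid)
        stack.extend(reversed(links))
    return sent


def count_fetches(skip_raw_blocks):
    """Run the walk and report how many blocks the server had to fetch."""
    fetched = []

    def fetch(cid):
        fetched.append(cid)
        return STORE[cid]

    sent = walk(ROOT, fetch, skip_raw_blocks)
    return len(fetched), len(sent)


print(count_fetches(skip_raw_blocks=True))
print(count_fetches(skip_raw_blocks=False))
```

With a generic "skip all leaves" rule, the server would have to fetch each child just to learn that it has no links, so the fetch count would stay at four even though only one block is returned.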

An alternative approach would be to request blocks individually.
However it adds extra round trips and more per HTTP request overhead
and thus is undesireable.

## Security

None.
This IPIP does not impact the security model of the trustless gateway.

## Test fixtures

TODO
:::issue

TODO: update below section with CIDs or CARs from conformance tests

Scenarios we should check:
- [ ] reuse an existing UnixFS DAG that has raw leaves, request it with
  `skip-raw-blocks=n`, and confirm the response includes the expected raw
  leaves' CIDs
- [ ] create a new CAR fixture that has only non-raw blocks. Request it with
  `skip-raw-blocks=y`, and confirm the response includes the expected CIDs and
  does not include raw blocks referenced by parents.
  - The important part is creating the CAR fixture by hand and ensuring the raw
    blocks are NEVER announced anywhere: generate a fixture with random data,
    add it to ipfs with the raw-leaves option, then export the DAG without
    `raw` blocks (use go-car's
    [`filter`](https://github.com/ipld/go-car/tree/master/cmd/car#readme) or
    similar).
  - Why? This goes the extra mile, but ensures every conformant gateway
    implementation is not doing the useless work of fetching raw blocks which
    are not required for fulfilling `skip-raw-blocks=y` requests. We did a
    similar thing for `entity-bytes`, and it was the only way we could show
    bugs in Saturn project's cache implementation at the time.

:::

### Copyright
