Skip to content

[v0.3] model bodies after stream<u8, trailers, error-code>, trap less #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 11, 2025

Conversation

rvolosatovs
Copy link
Contributor

@rvolosatovs rvolosatovs commented Mar 13, 2025

  • Make bodies non-optional for request and response constructors
  • Take a future<result<option<trailers>, error-code>> parameter in request/response constructors (refs Defer error-context to after 0.3.0 component-model#474 (comment))
  • Report any transmit errors occurring in the future returned by request/response constructor
  • Allow headers resources acquired by calling request.headers or response.headers to persist after respective into-parts is called, referencing the immutable headers of the original request/response. This implies that the headers acquired this way are not, if fact, children of the original request/response they originated from and the original request/response can be freely dropped and/or moved even before they are dropped.
  • Do not trap if guest attempts to acquire multiple handles to the body stream - handle that using type system
  • Model bodies after future stream<u8, trailers, error-code>
  • Remove into-parts
  • Add request-options#clone

Closes #156
Closes #161
Closes #164
Closes #163
Closes #165

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
Copy link
Member

@lukewagner lukewagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up and thinking about lifetimes. Overall, this makes sense as part of the general goal (that we discussed in the earlier BA C-M impl meeting) of, in cases where a child would otherwise outlive its parent, having lazier tombstone/failure semantics.

Comment on lines 381 to 384
/// The headers returned by this function are considered to be a clone of the headers
/// of the request. Changes to the `headers` returned by this function will not be reflected
/// in the immutable `headers` resources, that could have been acquired by calls to `headers`
/// prior to this function being called.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is what we discussed, and it is the friendliest option but I remembered later that, when we discussed this as a group in the BA C-M impl call, the idea was to instead say: if you call into-parts when there is already an existing handle to headers, that existing headers handle is put into a state where all header operations fail (how precisely, to be defined per-method). This is symmetric to what you're describing with body and it matches more-closely how a built-in WIT child use<headers> return type would work (bindgen wouldn't know to clone). WDYT?

(and similarly in response)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we'd need to wrap fields#get, fields#has and fields#entries results in a result<T, header-error>, as well as introduce a header-error#moved or header-error#corrupted to fully cover this case.

  • /// Get all of the values corresponding to a name. If the name is not present
    /// in this `fields`, an empty list is returned. However, if the name is
    /// present but empty, this is represented by a list with one or more
    /// empty field-values present.
    get: func(name: field-name) -> list<field-value>;
    /// Returns `true` when the name is present in this `fields`. If the name is
    /// syntactically invalid, `false` is returned.
    has: func(name: field-name) -> bool;
  • /// Retrieve the full set of names and values in the Fields. Like the
    /// constructor, the list represents each name-value pair.
    ///
    /// The outer list represents each name-value pair in the Fields. Names
    /// which have multiple values are represented by multiple entries in this
    /// list with the same name.
    ///
    /// The names and values are always returned in the original casing and in
    /// the order in which they will be serialized for transport.
    entries: func() -> list<tuple<field-name,field-value>>;

Otherwise, I'm afraid this behavior would inevitably cause bugs in applications using this API.

I feel like the behavior proposed in this PR is less surprising, provide better developer experience and introduce no overhead. That said I'm happy to add a new header-error and wrap the methods with a result, if you think we should do that. Otherwise, perhaps we should discuss this in next BA C-M impl call?

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@tschneidereit
Copy link
Member

  • Auto-close any existing, active body resources on repeated calls to request.body or response.body

  • Auto-close any existing, active body streams/futures on repeated calls to body.stream, drop of the body resource they originated from or calls to body.finish

I fear that this will result in weird spooky-action-at-a-distance situations, and very hard-to-debug issues in code that doesn't expect its handles to be closed behind its back.

Would an alternative be for all of these calls to return results, with an error being returned if the child resource has already been acquired and not closed? That would be similar to what web streams do, for example: if, say, you call ReadableStream#getReader(), the stream gets locked, and trying to call getReader() again will throw an exception. If you close the reader, you can't use it anymore, but can acquire a new one, because the stream is unlocked.

Note that the only change I'm proposing is to not auto-close existing resources when calling, e.g., request.body(). All operations on existing child-resource handles should still have the tombstone semantics, because I agree that auto-closing child handles on moving/closing the parent is much better than trapping.

@rvolosatovs
Copy link
Contributor Author

  • Auto-close any existing, active body resources on repeated calls to request.body or response.body
  • Auto-close any existing, active body streams/futures on repeated calls to body.stream, drop of the body resource they originated from or calls to body.finish

I fear that this will result in weird spooky-action-at-a-distance situations, and very hard-to-debug issues in code that doesn't expect its handles to be closed behind its back.

Would an alternative be for all of these calls to return results, with an error being returned if the child resource has already been acquired and not closed? That would be similar to what web streams do, for example: if, say, you call ReadableStream#getReader(), the stream gets locked, and trying to call getReader() again will throw an exception. If you close the reader, you can't use it anymore, but can acquire a new one, because the stream is unlocked.

Note that the only change I'm proposing is to not auto-close existing resources when calling, e.g., request.body(). All operations on existing child-resource handles should still have the tombstone semantics, because I agree that auto-closing child handles on moving/closing the parent is much better than trapping.

IIRC, the reason we went with auto-close was the body accessor on request/response, since only one of them should be able to exist at a time. We should be able to make body() return result<option<body>>, where the outer result is err if body is called before previous one has been dropped. The inner option is none if the body has been finish'ed.
We could then "resultify" body.finish and body.stream.
This gets a bit tricky in the host, because we don't have an explicit drop on stream handles, but there are ways to work around that with more involved state tracking.

That actually seems to align with the idea of body resource being replaced by stream<u8, trailers, error-code>. It seems the most natural for body method to return result<option<stream<u8, trailers, error-code>>> in that scenario.

Note that I'm opting for result<option<body>> rather than result<body, body-error>, since getting a "consumed body" is not semantically an error, there's simply no more data left, I believe this would also simplify usage in guest languages that don't have rich enum types like e.g. Rust

@lukewagner WDYT?

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@tschneidereit
Copy link
Member

This seems much better to me, thank you! In particular into-parts is now much less of a footgun I think: I could very easily see developers using body() first, then later deciding they want to get a clone of the headers and using into-parts() for that, simply ignoring the body. This pattern won't work any better with this API, but at least it'll fail loudly and immediately, instead of quietly invalidating the other body handle.

@lukewagner
Copy link
Member

I suppose that's another valid option to consider and I do like the "locality" of the approach. But I worry that, because GC languages usually lack prompt finalization, an "unreachable but not yet finalized" Body object will cause this method to fail and the developer will have no recourse but to somehow find a place where I can call an explicit .dispose() method on it before it became unreachable, which sounds like perhaps a worse DX overall.

Taking a step back: if a single component instance calls request.body twice in a row, it would be fine if the handle returned by the first call is left working and aliasing the same underlying body as the handle returned by the second call; just like we do with headers and just like the browser's JS fetch() API does. The important thing to prevent unintended interference is that, when the parent request/response is transferred to another component or the host, there are no dangling child body handles into it, which is what plain "child" handle semantics gives you. Thus we could relax the rules on the body method to be just like headers and avoid both the action-at-a-distance and need-to-explicit-dispose problems.

The only remaining problem is that into-parts looks, from a WIT perspective, like "transferring ownership out" and so it "spuriously" triggers the child rules. Given that one can always pipe a stream from one request into a new body for a new request or response, it doesn't seem fundamentally necessary to extract the whole body of a request/response (and indeed, with stream<u8, trailers, http-error>, body may go away entirely). Given the special immutability rules for headers though, it is necessary to be able to extract the headers. So what if we nixed into-parts and instead added a take-headers: func() -> headers (and take-response-options too)?

@rvolosatovs
Copy link
Contributor Author

If two body resources referencing the same request/response could exist at the same time, what would second finish do? E.g.:

let Some(b1) = req.body();
let Some(b2) = req.body();
let Ok(fut) = b1.finish().await;
let Ok(None) = fut.await;
let x = b2.finish().await; // is this an error?
// If x is an error, how does it differ from
// the case where a reference to the stream exists?

Another thing to note is that we'd need to make headers method return option<headers> and after take-headers is called, headers would return none. Similarly, take-headers itself would have to return option<headers>

@tschneidereit
Copy link
Member

I think I agree that we can change things to just return handles representing the same underlying resource, yes. Even better would be if the same handle were returned, but I remember that there are issues with implementing that efficiently.

Regarding into-parts: do we really need a replacement for it? We already have fields#clone(), and doing request.headers().clone() doesn't seem all that onerous to me.

@lukewagner
Copy link
Member

If two body resources referencing the same request/response could exist at the same time, what would second finish do?

It'd do the same thing as if you called finish twice on the same handle (b/c it's the same underlying resource; like how headers and mutation works today).

Even better would be if the same handle were returned, but I remember that there are issues with implementing that efficiently.

Sigh; agreed. But yeah, some very thorny (perf and leak) issues if we try to provide handle identity.

Regarding into-parts: do we really need a replacement for it? We already have fields#clone(), and doing request.headers().clone() doesn't seem all that onerous to me.

Good point! That would address Roman's question regarding headers needing to return an option too. Hypothetically take-headers could be cheaper than COW clone on some impls, but I think it's worth starting without it and seeing if anyone complains.

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@rvolosatovs rvolosatovs force-pushed the feat/p3-friendly-api branch from f4f168e to eac8416 Compare March 20, 2025 13:03
@rvolosatovs
Copy link
Contributor Author

  • added request-options#clone
  • dropped into-parts
  • made body() method return option<body> yet again
  • documented the fact that duplicate calls to body.finish/body.stream across instances will cause an error

I think that should address all the feedback, @tschneidereit @lukewagner PTAL

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@rvolosatovs rvolosatovs force-pushed the feat/p3-friendly-api branch 2 times, most recently from 1be0fcd to 07e6cbb Compare March 21, 2025 15:21
@rvolosatovs rvolosatovs changed the title feat(p3): be less trappy feat(p3): be less trappy, model bodies after stream<u8, trailers, error-code>> Mar 21, 2025
@rvolosatovs rvolosatovs changed the title feat(p3): be less trappy, model bodies after stream<u8, trailers, error-code>> feat(p3): model bodies after stream<u8, trailers, error-code>, trap less Mar 21, 2025
@rvolosatovs rvolosatovs changed the title feat(p3): model bodies after stream<u8, trailers, error-code>, trap less [v0.3] model bodies after stream<u8, trailers, error-code>, trap less Mar 21, 2025
Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@rvolosatovs rvolosatovs force-pushed the feat/p3-friendly-api branch from 07e6cbb to 865daa5 Compare March 21, 2025 15:32
@rvolosatovs rvolosatovs requested a review from lukewagner March 21, 2025 15:32
Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this much better! I left a couple of comments inline that seem relevant, though.

headers: headers,
body: option<body>,
contents: stream<u8>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be optional, since not all requests have bodies, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we discussed with @lukewagner and @dicej that body resource would be non-optional (#165), that's something that would have addressed #164.

In that scenario requests not carrying a body (like GET) would be represented by empty streams.

Now that there's no body resource anymore and it's the request/response constructors that return the transmit future, we can reconsider this, indeed, especially since the host would not be able to do anything with the trailers future in that case anyway.

body parameter will have to become an option<tuple<..>> though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation; makes sense to me

/// - a stream returned by a previous call to this method has reported itself as closed
/// Thus there will always be at most one readable stream open for a given body.
/// Each subsequent stream picks up where the last stream left off, up until it is finished.
body: func() -> option<tuple<stream<u8>, future<result<option<trailers>, error-code>>>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unsure whether using option for this makes sense: it seems overloaded for request that just don't have a body, and lead to confusing silent failures.

Given that, ISTM the better, if slightly more onerous, signature would be

body: func() -> option<result<tuple<stream<u8>, future<result<option<trailers>, error-code>>>, body-in-use-error>>;

(Where body-in-use-error would be a new thing, where I don't have strong feelings about what it should look like, exactly.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're making bodies optional again, that seems to follow, yes.
I'd prefer not adding the body-in-use-error just yet though and keep it at result<tuple<stream<u8>, future<result<option<trailers>, error-code>>>> for now, verify the design with an implementation and potentially revisit later (as that seems to be a minor change)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be missing it, but I couldn't find any prohibition against trailers in GET requests, so if there were an option wrapping the stream<u8>, I think it'd need to be inside the tuple. However, I haven't been able to find a semantic distinction between none and "zero-length contents" inside RFC 9110, so I worry that by introducing this distinction, we might be creating something that content may incorrectly depend on. (E.g., let's say a component decides to fail if the option<stream<u8>> is non-none, but now it's rejecting some zero-length stream which maybe it gets passed (in a multi-component composition).).

Considering the perf angle (which could be another reason for the option):

  • As an optimization and convenience, I do think it'd make sense to have the contents parameter to {request,response}.new be option<stream<u8>> as Till suggested, but none could be spec-defined to be normalized to an empty stream internally.
  • I expect guest code simply won't ask for the content stream for GET requests, so source-language stream objects won't be wastefully created. The CABI overhead of creating closed-on-construction streams should also be light, I think.

So perhaps:

body: func() -> result<tuple<stream<u8>, future<result<option<trailers>, error-code>>>>;

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch: I hadn't considered that my proposal would prevent access to trailers as well.

With that, I agree with the proposed interface, with one slight exception: I do think this is a categorically different error from the ones represented by error-code, and that we should make it its own type. Is this perhaps even a pattern that'd make sense to lift up higher and use the same error type for in other places as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As currently proposed, the error type in this case is actually unit, e.g. in Rust it's ().
Here's the host implementation with the full Rust type: https://github.com/bytecodealliance/wasip3-prototyping/blob/ecd781215e650dc5eb4e6ec4b2fd60858c084654/crates/wasi-http/src/p3/host/types.rs#L669-L700

Does this address the concern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda? It's better than using error-code (sorry for missing that it's not), but it does seem very low-effort to define a body-in-use-error type and return that instead. And would make this much more self-documenting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen us use that idiom before in WASI of having singleton error variants; there's usually just a result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you're absolutely right: I had confused myself into thinking that if we move away from error-code, we need a specific replacement for it. But since the result here is only going to be err if the body is already taken out, I agree that that's not needed. Sorry for the noise.

headers: headers,
body: option<body>,
);
contents: stream<u8>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for request

/// - a stream returned by a previous call to this method has reported itself as closed
/// Thus there will always be at most one readable stream open for a given body.
/// Each subsequent stream picks up where the last stream left off, up until it is finished.
body: func() -> option<tuple<stream<u8>, future<result<option<trailers>, error-code>>>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for request

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
Copy link
Member

@lukewagner lukewagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks great with only requested remaining change to put option on the content stream in the new functions. Thanks for all the hard work implementing and thinking through all the important error and composition cases @rvolosatovs!

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@rvolosatovs
Copy link
Contributor Author

@lukewagner I've just pushed ae89575

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me! The one open question (which I unfortunately didn't have earlier, my apologies!) is around the trailers parameter to the request/response constructors—see inline comments.

headers: headers,
body: option<body>,
contents: option<stream<u8>>,
trailers: future<result<option<trailers>, error-code>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought on this: are there downsides to always having to pass a future here? I'm not sure what the overhead (both resource- and boilerplate-wise) of creating an managing a future is. If it's trivial, then this doesn't much matter, but otherwise, should we consider wrapping this into an option<>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the "stream<T, U, V> world", if we go with optional body in constructors, the parameter would be option<stream<u8, option<trailers>, error-code>>, meaning that users could either pass no body stream or pass a body stream, with potentially empty contents, but mandatory final element (option<trailers> or error-code).
From what I understand, there would not be a way to optionally pass a final stream element.

With that in mind, if we want to make the future optional, we'd need to take an option<tuple<stream<u8>, future<result<option<trailers>, error-code>>>>, which would then directly translate to option<stream<u8, option<trailers>, error-code>> in the future. (see also #162 (comment))

FWIW, here's what it takes to create a new future in Rust guest and pass to a constructor: https://github.com/bytecodealliance/wasip3-prototyping/blob/ad4ccf03172b22320c4a22971c51b4f7249a2846/crates/test-programs/src/p3/http.rs#L78-L81
and then send a value: https://github.com/bytecodealliance/wasip3-prototyping/blob/ad4ccf03172b22320c4a22971c51b4f7249a2846/crates/test-programs/src/p3/http.rs#L112
So 2 LoC, plus an import of futures::SinkExt to expose futures::SinkExt::send

I would not expect creating a future handle to be an expensive operation, especially considering the fact that we're most likely dealing with network I/O here.

I'm happy to switch to a tuple here, if that's what everyone prefers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overhead shouldn't be too significant. Also, it encourages the generally good engineering practice of explicitly signalling the final success or failure of the request/response in the result.

/// - a stream returned by a previous call to this method has reported itself as closed
/// Thus there will always be at most one readable stream open for a given body.
/// Each subsequent stream picks up where the last stream left off, up until it is finished.
body: func() -> result<tuple<stream<u8>, future<result<option<trailers>, error-code>>>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the answer to my question above, this might also have to change? Though even if we make the trailers ctor parameter optional, I think we could leave this as-is, by constructing a future internally if none was passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not introduce a distinction here using the same reasoning as laid out by @lukewagner in #162 (comment)

If the future was optional, users of this API would have no way of knowing if request/response receipt was successful if the future was none.

E.g. if a component read one byte from stream<u8>, after which it was closed and future was none - was the underlying I/O successful or was there a failure?

The future resolving to a success means that the incoming request/response has been received it it's entirety. It seems that if it were optional, none would be only possible in very niche use cases, e.g. if the resource was constructed by the guest and then requested by itself via body.

I would rather avoid having to document/specify this behavior making this interface more self-documenting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the explanation and references: that all makes sense to me!

@rvolosatovs
Copy link
Contributor Author

Little update:
wasi:http@0.3.0-draft as proposed in this PR is implemented in bytecodealliance/wasip3-prototyping#58, with a working round-trip demo available at https://github.com/rvolosatovs/p3-http-test

Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned out great!

@lukewagner
Copy link
Member

Agreed; great work @rvolosatovs!

@lukewagner lukewagner merged commit d163277 into WebAssembly:main Apr 11, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants