Description
I've had some thoughts, and here's my braindump. Feedback is welcome.
1. Which use cases should the WASI networking APIs account for?
The official WebAssembly website mentions lots of use cases: https://webassembly.org/docs/use-cases/. I've kept the cases that seem relevant and added some of my own:
- Standalone desktop & server Wasm apps.
- IoT devices running microcontrollers without a full-blown OS.
- Web browsers.
- Container-based serverless providers like AWS, GCP and Azure.
- JavaScript/Wasm-based serverless providers like Fastly's Compute@Edge, Cloudflare Workers and Akamai EdgeWorkers. These look to be more locked down than their container-based counterparts.
- HTTP proxies.
- Plugin/extension-style embeddings into existing applications.
- Audio / video streaming.
- Gaming.
- Peer-to-peer applications (games, collaborative editing, decentralized and centralized).
- Image recognition.
- Platform simulation / emulation (ARC, DOSBox, QEMU, MAME, …).
- Language interpreters and virtual machines.
- POSIX user-space environment, allowing porting of existing POSIX applications.
- Remote desktop.
- VPN.
- Local web server.
- Fat client for enterprise applications (e.g. databases).
- Server-side compute of untrusted code.
- Symmetric computations across multiple nodes.
2. Which protocols should be focused on first?
For reference, I've compiled a non-exhaustive list of commonly used protocols:
UDP, TCP, DNS, HTTP1/2/3, QUIC, TLS, SSH, SFTP, FTP(S), SMTP, POP3, IMAP, WebRTC, WebSockets, gRPC, UNIX sockets.
Notes on HTTP
Outside the browser
Many runtimes (Java, .NET, ...) have built their own HTTP stack on top of the OS-provided socket APIs. Even if WASI were to provide an HTTP interface, it seems unlikely that existing codebases would switch to it anytime soon.
Inside the browser
Browsers already have an HTTP API: fetch(). All WASI can provide is a wrapper for that function.
Also, all of WASI's current proposals are, as far as I can see, blocking synchronous functions, and browsers won't accept that on the main thread. Until there is a general answer for asynchronicity in Wasm/WASI, I propose to leave this use case out of scope for the networking proposal.
My thoughts
TCP and UDP are the lingua franca of the web, with pretty much every other protocol in existence built on top of them. It seems logical to start with these, although I'd be happy to be proven wrong.
Pros:
- Wasm engines only need to expose a minimal surface area.
- Gives users the freedom to use any protocol they want.
Cons:
- Browsers will never support it.
- Every module must bring its own implementation for the higher-level protocols it wishes to use. This increases file sizes, but file size is not a top priority since browser usage is ruled out.
- Very low level. Makes Wasm embedders unable to hand out capabilities based on application protocol-specific traits.
3. What should and shouldn't be allowed?
Prior art regarding blocking unwanted network activity:
- The existing WASI filesystem API. Only file descriptors explicitly passed down from the host can be acted upon.
- Web browsers restrict a script's network access using same/cross origin policies. An "origin" is defined as the combination of protocol+hostname+port. Also, the script only has access to a limited subset of the transmitted HTTP request and response.
- WebExtensions use a manifest.json to declare URL patterns which they're allowed to communicate with. Example: "*://developer.mozilla.org/*".
- Layer 3/4 firewalls generally allow or block traffic based on direction (inbound, outbound), protocol (TCP or UDP), source (address, port), and destination (address, port). This can in theory be applied on any box between the two endpoints of the connection.
- Layer 7 firewalls allow or block traffic based on the data inside the packets, usually HTTP. This allows filtering on things like hostname, method, path, headers, body. This kind of firewall needs access to the cleartext data, so it must be placed before the data is encrypted or after the data is decrypted.
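To make the layer 3/4 item above a bit more concrete, here is a rough Python sketch of how an embedder might evaluate firewall-style rules against a connection attempt. The Rule class, the is_allowed function and the rule format are all made up for illustration; none of this is an existing WASI or engine API. The default-deny behaviour is my own assumption about what embedders would want.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical allow rule, modeled on a layer 3/4 firewall entry."""
    direction: str  # "inbound" or "outbound"
    protocol: str   # "tcp" or "udp"
    network: str    # CIDR block of the remote peer
    port: int       # port the rule applies to

    def matches(self, direction, protocol, address, port):
        return (
            self.direction == direction
            and self.protocol == protocol
            and self.port == port
            # An IPv4 address is never "in" an IPv6 network and vice versa.
            and ipaddress.ip_address(address) in ipaddress.ip_network(self.network)
        )


def is_allowed(rules, direction, protocol, address, port):
    """Default deny (an assumption): traffic passes only if some rule matches."""
    return any(rule.matches(direction, protocol, address, port) for rule in rules)


# Example rule set using documentation/private address ranges.
RULES = [
    Rule("outbound", "tcp", "2001:db8::/32", 443),
    Rule("outbound", "udp", "10.0.0.0/8", 53),
]
```

Note that nothing in such a rule knows about hostnames or application protocols, which is exactly the limitation of filtering at this layer.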
Random questions
- How should capabilities be passed down?
- Based on what conditions will embedders decide to allow or deny networking requests?
- Should modules be allowed to instantiate their own network connections or is this a privilege of the host?
- Should networking capability handles be discoverable by modules? For comparison, preopened directories can currently be discovered by iterating through the file descriptor IDs until an error is reached.
- Sockets connect to IP addresses only. Example pseudo-code: connect(fd, "[2a00:1450:400e:80d::200a]", 443). However, if a Wasm embedder wishes to block network access based on the host, they'll probably want to do so based on the domain name instead of meaningless IP addresses.
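One possible way around the IP-versus-domain problem in the last question is to let the host do the name resolution, so the policy check sees the hostname before any address exists. Below is a minimal Python sketch of that idea. HostPolicy, connect_by_name and the pattern syntax ("*." prefix for subdomains) are hypothetical names and conventions of mine; the resolve and connect callables stand in for whatever the host actually uses.

```python
class HostPolicy:
    """Hypothetical allow-list keyed on (hostname pattern, port)."""

    def __init__(self, allowed):
        self.allowed = [(pattern.lower(), port) for pattern, port in allowed]

    def permits(self, hostname, port):
        # Ignore case and a trailing root dot before comparing.
        name = hostname.lower().rstrip(".")
        return any(
            port == allowed_port and self._matches(pattern, name)
            for pattern, allowed_port in self.allowed
        )

    @staticmethod
    def _matches(pattern, name):
        if pattern.startswith("*."):
            # "*.example.org" matches subdomains only, not "example.org" itself.
            return name.endswith(pattern[1:])
        return name == pattern


def connect_by_name(policy, hostname, port, resolve, connect):
    """Check the policy on the name, then resolve and connect on the host side."""
    if not policy.permits(hostname, port):
        raise PermissionError(f"{hostname}:{port} is not allowed")
    last_error = None
    for address in resolve(hostname):
        try:
            return connect(address, port)
        except OSError as error:
            last_error = error  # try the next resolved address
    raise last_error or OSError(f"no addresses found for {hostname}")
```

The trade-off is that the module never sees (or chooses) the IP address, which pushes DNS into the host and away from a plain Berkeley-sockets model.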
4. What should be the general API design?
Follow the designs of existing APIs or come up with something new?
Is it a goal to have one unifying API abstracting away multiple protocols? This is what "TAPS" mentioned in 315 seems to be doing. If there is a time to steer away from the conventional APIs, this would be it.
Many existing applications are built on the presumption that the OS doesn't provide anything higher-level than Berkeley sockets and therefore already bundle their own implementations for the other protocols one way or another. Whatever the WASI networking APIs end up looking like, is it a goal to provide a compatibility layer similar to wasi-libc automagically remapping open() to openat()?
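For reference, here is a simplified Python model of what that open()-to-openat() remapping boils down to: find the longest preopened directory prefix that contains the path, then return that directory's descriptor plus a relative path. This is my own approximation of the idea, not wasi-libc's actual code; find_relpath and the preopens table are hypothetical names. A networking compatibility layer would need some analogous lookup from "ambient" arguments to a capability handle.

```python
import posixpath


def find_relpath(preopens, path):
    """Map an absolute path to (dir_fd, relative_path) via the longest preopen prefix."""
    path = posixpath.normpath(path)  # also collapses ".." before matching
    best = None
    for prefix, fd in preopens.items():
        prefix = posixpath.normpath(prefix)
        inside = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if inside and (best is None or len(prefix) > len(best[0])):
            best = (prefix, fd)
    if best is None:
        raise PermissionError(f"{path} is outside every preopened directory")
    prefix, fd = best
    return fd, posixpath.relpath(path, prefix)


# Hypothetical preopen table: directory path -> file descriptor number.
PREOPENS = {"/data": 3, "/data/cache": 4}
```

With this table, an open("/data/cache/a.txt") call would turn into an openat(4, "a.txt") call, while a path outside both directories is rejected before it ever reaches the host.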
What should be the interface boundary? "Networking" is a very broad topic. Assuming embedders will only implement interfaces that are relevant to them, where should the dividing line between interfaces be placed? Some fictional examples for illustration:
- wasi_sockets vs wasi_tcp & wasi_udp
- wasi_http vs wasi_http_server & wasi_http_client
Transparent TLS
I've seen this idea surface multiple times in this repository: unify unencrypted and encrypted connections into a single API.
Some considerations:
- Not all communications are encrypted throughout the entire duration of the connection. Examples: SMTP/IMAP/POP3 using STARTTLS, back-end servers forwarding encrypted data but with the PROXY protocol header prefixed.
- Applications need access to the current state of encryption when deciding whether or not to allow authentication commands.
- QUIC has a tight TLS integration.
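To illustrate the first two considerations, here is a small Python sketch of a unified connection handle that starts out unencrypted, can be upgraded mid-connection, and exposes its encryption state so the application can refuse authentication over cleartext. The Connection class and handle_command function are hypothetical, no real I/O or TLS handshake takes place, and credentials are not actually checked. The reply texts follow the usual SMTP conventions; returning 503 for a repeated STARTTLS is my own choice.

```python
class Connection:
    """Hypothetical unified connection handle; no real I/O or TLS is performed."""

    def __init__(self, encrypted=False):
        self._encrypted = encrypted

    @property
    def is_encrypted(self):
        return self._encrypted

    def start_tls(self):
        # A real implementation would run the TLS handshake here.
        self._encrypted = True


def handle_command(conn, command):
    """Tiny SMTP-like dispatcher that gates AUTH on the encryption state."""
    verb = command.split(" ", 1)[0].upper()
    if verb == "STARTTLS":
        if conn.is_encrypted:
            return "503 Bad sequence of commands"
        conn.start_tls()
        return "220 Ready to start TLS"
    if verb == "AUTH":
        if not conn.is_encrypted:
            return "530 Must issue a STARTTLS command first"
        return "235 Authentication successful"  # credentials are not verified here
    return "502 Command not implemented"
```

A fully transparent TLS API that hides this state from the module would make the AUTH check above impossible to write.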
My thoughts
The WASI overview document mentions:
The first version of WASI is relatively simple, small, and POSIX-like in order to make it easy for implementers to prototype it and port existing code to it, making it a good way to start building momentum and allow us to start getting feedback based on experience.
... hinting at a POSIX-sockets API. This might be an obvious answer since they're well understood and an industry standard.
Random questions:
- As mentioned before, there is no generally agreed-upon answer to non-blocking functions yet. For listening sockets, does this mean that we're forced into a thread-per-connection solution?
- How to decide which address(es) to bind on? Just assume 127.0.0.1 and/or 0.0.0.0 are available? Or expose an API to list the available network interfaces?
- Allow multicast UDP?
- There is an open pull request to add Berkeley sockets. It features a modified socket(...) function that includes a "capability" file descriptor. This is not standard and breaks existing software. Should a compatibility workaround be devised?
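For anyone unfamiliar with what the thread-per-connection model from the first question looks like in practice, here is a minimal echo server sketch using Python's standard blocking socket API. It is plain host-side Python, not WASI; the function names are mine, and the fixed max_connections limit exists only so the example terminates.

```python
import socket
import threading


def handle(conn):
    """Echo bytes back until the peer closes; each call blocks its own thread."""
    with conn:
        while True:
            data = conn.recv(4096)  # blocks until data arrives or the peer closes
            if not data:
                break
            conn.sendall(data)


def serve(listener, max_connections):
    """Accept max_connections clients, giving each one a dedicated thread."""
    workers = []
    for _ in range(max_connections):
        conn, _addr = listener.accept()  # blocks until a client connects
        worker = threading.Thread(target=handle, args=(conn,), daemon=True)
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def make_listener(host="127.0.0.1", port=0):
    """Bind a blocking TCP listener; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen()
    return listener
```

Every blocking call ties up a whole thread, so without threads or some form of polling a module could only ever serve one connection at a time.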
5. Gather feedback
It would be wise to reach out to existing Wasm embedders and get their vision on what the networking API should and shouldn't be allowed to do. For example: Fastly is both a founding member of the Bytecode Alliance and an embedder. They'll probably have something to say about what networking functionality they want to allow inside their workers, if any.