Description
WASI currently has two pairs of functions which are similar to each other: sock_send/sock_recv and fd_write/fd_read. This PR describes a plan for merging them, in favor of fd_write/fd_read.
Background
The reason why send and recv are separate in POSIX is that they add flags arguments. POSIX says that send and recv are equivalent to write and read when no flags are set.
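The equivalence can be checked on a POSIX host with a small sketch like the one below. It is only an illustration and not part of the proposal: the helper name flagless_calls_agree is made up here, and it assumes a host that provides socketpair with AF_UNIX stream sockets.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: pushes `msg` through a connected stream socket pair
 * twice, pairing write() with recv(..., 0) and send(..., 0) with read().
 * Returns 1 if both round trips deliver the same bytes, 0 if they differ,
 * and -1 if the arguments or the socket setup are unusable. */
int flagless_calls_agree(const char *msg)
{
    int sv[2];
    char buf[64];
    size_t len = strlen(msg);
    int ok = 1;

    if (len == 0 || len > sizeof buf)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* write() on one end, recv() with flags == 0 on the other. */
    if (write(sv[0], msg, len) != (ssize_t)len)
        ok = 0;
    if (ok && (recv(sv[1], buf, len, 0) != (ssize_t)len ||
               memcmp(buf, msg, len) != 0))
        ok = 0;

    /* send() with flags == 0 on one end, read() on the other. */
    if (ok && send(sv[0], msg, len, 0) != (ssize_t)len)
        ok = 0;
    if (ok && (read(sv[1], buf, len) != (ssize_t)len ||
               memcmp(buf, msg, len) != 0))
        ok = 0;

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Calling flagless_calls_agree("hello") exercises both pairings. The short message keeps each transfer within a single call on typical hosts.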
WASI's sock_send doesn't currently support any flags. WASI's sock_recv supports __WASI_SOCK_RECV_PEEK and __WASI_SOCK_RECV_WAITALL, which correspond to MSG_PEEK and MSG_WAITALL in POSIX. Both of these operations could conceptually work on files; however, typical operating systems only support them on sockets.
On Linux, there is a subtle difference between recv and read: "If a zero-length datagram is pending, read(2) and recv() with a flags argument of zero provide different behavior. In this circumstance, read(2) has no effect (the datagram remains pending), while recv() consumes the pending datagram." It is possible that applications could depend on this subtle difference, but the only reference to it I've been able to find is the git commit which added this line to the man page, which describes a bug where "[...] we would end up in a busy loop when we were using read(2). Changing to recv(2) fixed the issue [...]". The recv behavior is what the code in that bug wanted, and is the more intuitive behavior.
The cause of this subtlety is that read special-cases a 0 return value to mean the end-of-file/stream has been reached. That creates an ambiguity when reading a zero-length datagram.
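The ambiguity can be reproduced with a short sketch along these lines. It assumes a Linux-like host where AF_UNIX datagram sockets accept zero-length datagrams, and the function name is invented for this illustration. It uses recv only, and does not exercise the read(2) behavior quoted above.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends a zero-length datagram followed by a one-byte
 * datagram over a datagram socket pair. Returns 1 if the first recv()
 * reported 0 even though the peer was still open and went on to deliver
 * more data, 0 if the host behaved differently, and -1 on setup failure. */
int zero_length_datagram_looks_like_eof(void)
{
    int sv[2];
    char buf[16];
    ssize_t first, second;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    /* A zero-length datagram is a real message, not end-of-stream. */
    if (send(sv[0], "", 0, 0) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* MSG_DONTWAIT keeps the sketch from blocking if nothing is pending. */
    first = recv(sv[1], buf, sizeof buf, MSG_DONTWAIT);

    /* The peer is still open and can keep sending. */
    if (send(sv[0], "x", 1, 0) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    second = recv(sv[1], buf, sizeof buf, MSG_DONTWAIT);

    close(sv[0]);
    close(sv[1]);
    return first == 0 && second == 1;
}
```

A caller that treats a 0 return as end-of-stream would stop after the first recv and never see the second datagram.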
Proposal
- Remove sock_send and sock_recv.
- Add __wasi_siflags_t and __wasi_riflags_t arguments to fd_write and fd_read, respectively.
- Make fd_read return __WASI_EMSGSIZE when receiving a datagram which is larger than the provided buffer, and remove __WASI_SOCK_RECV_DATA_TRUNCATED, which is what sock_recv used in that case. WASI libc will check for this and continue to implement the POSIX API (MSG_TRUNC).
- Make fd_read return __WASI_EEOS, a new errno code, when the end-of-file/stream is reached. This eliminates the ambiguity of the special case for 0. WASI libc will check for this and continue to implement the POSIX API with 0 being a special case.
- Add the rights __WASI_RIGHT_FD_READ_PEEK and __WASI_RIGHT_FD_READ_WAITALL, which are required to use the __WASI_SOCK_RECV_PEEK and __WASI_SOCK_RECV_WAITALL flags, respectively. These rights would not be granted for file-based file descriptors on OSes that don't support these features on files.
- Remove the fs_filetype field from the fdstat_t struct. This further hides unnecessary differences between sockets and files. fd_fdstat_get is an otherwise ambient authority, meaning anyone can do it on any open file descriptor. The file type is still accessible via fd_filestat_get, but that requires a right (__WASI_RIGHT_FD_FILESTAT_GET).
- That happens to leave us with no easy way to implement isatty, so add a __WASI_RIGHT_FD_ISATTY right, to indicate whether a file descriptor is known to be a terminal. This is a little unusual, as it's not a typical right: it's not associated with an operation. However, this right makes it simple to implement isatty, which is used by libc to do line buffering for stdout when it's on a tty.
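To make the __WASI_EEOS item concrete, here is a minimal sketch of how a libc-style wrapper might fold the new error code back into the POSIX convention. Everything in it is a stand-in invented for illustration: the sketch_* names, the placeholder error values, and the in-memory datagram queue are not the WASI API, and truncation (__WASI_EMSGSIZE) and flags are left out.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical error codes; the numeric values are placeholders. */
typedef enum { SKETCH_ESUCCESS = 0, SKETCH_EEOS = 1 } sketch_errno_t;

/* Hypothetical in-memory peer: a queue of datagrams, then end-of-stream. */
typedef struct {
    const char *const *datagrams;
    size_t count;
    size_t next;
} sketch_peer_t;

/* Stand-in for the proposed fd_read: end-of-stream is an error code, so a
 * successful read of 0 bytes can only mean a zero-length datagram. */
sketch_errno_t sketch_fd_read(sketch_peer_t *p, void *buf, size_t cap,
                              size_t *nread)
{
    const char *d;
    size_t n;

    *nread = 0;
    if (p->next == p->count)
        return SKETCH_EEOS;
    d = p->datagrams[p->next++];
    n = strlen(d);
    if (n > cap)
        n = cap; /* truncation handling is omitted in this sketch */
    memcpy(buf, d, n);
    *nread = n;
    return SKETCH_ESUCCESS;
}

/* POSIX-style wrapper: end-of-stream becomes the special 0 return again. */
ssize_t sketch_posix_read(sketch_peer_t *p, void *buf, size_t cap)
{
    size_t n;
    sketch_errno_t e = sketch_fd_read(p, buf, cap, &n);

    if (e == SKETCH_EEOS)
        return 0;
    return (ssize_t)n;
}

/* Usage: a zero-length datagram, then "hi", then end-of-stream. Returns 1
 * when the WASI-style call tells the first and last reads apart while the
 * POSIX-style wrapper reports 0 for both. */
int sketch_demo(void)
{
    static const char *const queue[] = { "", "hi" };
    char buf[8];
    size_t n;
    sketch_peer_t a = { queue, 2, 0 };
    sketch_peer_t b = { queue, 2, 0 };

    int wasi_ok =
        sketch_fd_read(&a, buf, sizeof buf, &n) == SKETCH_ESUCCESS && n == 0 &&
        sketch_fd_read(&a, buf, sizeof buf, &n) == SKETCH_ESUCCESS && n == 2 &&
        sketch_fd_read(&a, buf, sizeof buf, &n) == SKETCH_EEOS;
    int posix_ok =
        sketch_posix_read(&b, buf, sizeof buf) == 0 &&
        sketch_posix_read(&b, buf, sizeof buf) == 2 &&
        sketch_posix_read(&b, buf, sizeof buf) == 0;

    return wasi_ok && posix_ok;
}
```

In this sketch, code written against the lower-level call can distinguish an empty datagram from end-of-stream, while code using the wrapper keeps the familiar POSIX behavior.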
And some minor tidying:
- Rename sock_shutdown to fd_shutdown, and make it a file descriptor operation that happens to depend on the __WASI_RIGHT_SOCK_SHUTDOWN right, which on typical implementations will only get granted for sockets. This is the last remaining sock_* function.
- Rename __WASI_RIGHT_SOCK_SHUTDOWN to __WASI_RIGHT_FD_SHUTDOWN.
- Rename __WASI_SOCK_RECV_PEEK and __WASI_SOCK_RECV_WAITALL to say FD_READ instead of SOCK_RECV.
Miscellaneous notes
The change to make fd_read return __WASI_EEOS on end-of-file/stream also fixes an oddity in POSIX in which many applications do an extra read call after the EOF is encountered, in order to get a 0 return from read to confirm they've actually reached the end. That said, implementations on POSIX hosts won't be able to report __WASI_EEOS until they get a 0 from read themselves, so in practice there will still be an extra read on such systems.
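The extra call on POSIX hosts can be seen in a sketch like the following. The host_* names and error values are assumptions made for this example, not part of any WASI implementation, and the 0-to-end-of-stream mapping shown here only suits pipes and stream-like descriptors.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical error codes; the numeric values are placeholders. */
typedef enum { HOST_ESUCCESS = 0, HOST_EEOS = 1, HOST_EIO = 2 } host_errno_t;

/* Hypothetical host-side fd_read built on POSIX read(). End-of-stream can
 * only be reported once read() itself has returned 0. This mapping is for
 * pipes and streams; zero-length datagrams would need separate handling. */
host_errno_t host_fd_read(int fd, void *buf, size_t cap, size_t *nread)
{
    ssize_t r;

    *nread = 0;
    do {
        r = read(fd, buf, cap);
    } while (r < 0 && errno == EINTR);

    if (r < 0)
        return HOST_EIO;
    if (r == 0 && cap > 0)
        return HOST_EEOS;
    *nread = (size_t)r;
    return HOST_ESUCCESS;
}

/* Writes `len` bytes into a pipe, closes the write end, and counts how many
 * host_fd_read calls it takes to observe end-of-stream. Returns -1 on error. */
int calls_until_eos(const char *data, size_t len)
{
    int p[2];
    char buf[64];
    size_t n;
    int calls = 0;
    host_errno_t e;

    if (len == 0 || len > sizeof buf || pipe(p) != 0)
        return -1;
    if (write(p[1], data, len) != (ssize_t)len) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);

    do {
        e = host_fd_read(p[0], buf, sizeof buf, &n);
        calls++;
    } while (e == HOST_ESUCCESS);

    close(p[0]);
    return e == HOST_EEOS ? calls : -1;
}
```

With a three-byte payload, the first call returns all the data and a second call is still needed before the end-of-stream code appears.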