IO devices do not convert to binaries on get_until with eof #4992
Description
Describe the bug
The Erlang specification says that, if a get_until
MFA returns a list, the list must be converted to a binary if the IO device is in binary mode. However, this does not happen if get_until
is read until EOF.
To Reproduce
Compile this module:
-module(demo).
-export([until_eof/2, start/0]).
until_eof([],eof) -> {done,eof,[]};
until_eof(ThisFar,eof) -> {done,ThisFar,[]};
until_eof(ThisFar,CharList) -> {more,ThisFar++CharList}.
start() ->
IoServer = group_leader(),
io:setopts(IoServer, [{binary,true}]),
IoServer ! {io_request,
self(),
IoServer,
{get_until, unicode, '', ?MODULE, until_eof, []}},
receive
{io_reply, IoServer, Data} ->
erlang:display(Data),
erlang:halt(0)
end.
And then:
$ erl -noshell -s demo
dsadsadsa
dsadsadsa
"dsadsadsa\ndsadsadsa\n"
Notice the result is not a binary. However, if you change the code to return before eof
(i.e. on the first newline), then a binary is returned. The reason why this happens is because io_lib:get_until/4
, which is used by all (?) devices, converts the result to binaries only if the data is a binary, but in this case, the data is the atom eof
:
https://github.com/erlang/otp/blob/master/lib/stdlib/src/io_lib.erl#L966
Expected behavior
The expected behaviour is for get_until
to return a binary if the device is set to binary and I read until eof
.
One possible solution would be to pass the format=list | binary
as an argument to io_lib:get_until/4
. However, this is a bit trickier to fix because most IO devices have encapsulated the io_lib
functionality in a way that, if we add an extra argument to get_until/4
, we would need to add the same argument to other functions in the io_lib
module (which may or may not be a good idea).
Another possible solution is to pass an empty list or an empty binary to io_lib:get_until/4
and convert it to eof
inside the function. This way we keep the original data format. Once again, other functions in io_lib
would have to be changed too.
A third option is to deprecate get_until
in the protocol. get_until
has two issues:
-
The initial state is always assumed to be an empty list
-
The data is always converted to a list, which can be expensive and unnecessary, and it was probably done for backwards compatibility reasons. I would argue that the conversion of list/binary should be made by the MFA given to
get_until
- which can do it as efficiently as possible (or not at all), although this may require the encoding to be given to theget_until
function
Instead, a new get_until
tuple with one extra element could be introduced where:
- The initial state is given as part of the request tuple
- The data is given as is to the MFA, leaving it up to the MFA to convert it to list or binary
- The return type is also always returned as is
If we pass the list/binary as is to the get_until
MFA, then I would be able to solve the eof
issue myself, because I could look at my buffer and see what is the type of data I am working on. I would also allow users of get_until
to be more efficient, especially as binaries become more common place.
If preferred, the new get_until
could have another name, to avoid confusion. In any case, today's get_until
could be implemented on top of the new one, as the new proposal is basically the old get_until
but with less assumptions.
Affected versions
AFAIK, all OTP versions.
Additional context
The problem I am trying to solve is a relatively simple problem. I want to read all data until eof
. Currently, I am working around this issue by doing a io:getopts/1
on the device to see if it is in binary or list mode.