Skip to content

IO devices do not convert to binaries on get_until with eof #4992

Closed
@josevalim

Description

Describe the bug

The Erlang specification says that, if a get_until MFA returns a list, the list must be converted to a binary if the IO device is in binary mode. However, this does not happen if get_until is read until EOF.

To Reproduce

Compile this module:

-module(demo).
-export([until_eof/2, start/0]).

until_eof([],eof) -> {done,eof,[]};
until_eof(ThisFar,eof) -> {done,ThisFar,[]};
until_eof(ThisFar,CharList) -> {more,ThisFar++CharList}.

start() ->
    IoServer = group_leader(),
    io:setopts(IoServer, [{binary,true}]),

    IoServer ! {io_request,
                self(),
                IoServer,
                {get_until, unicode, '', ?MODULE, until_eof, []}},
    receive
        {io_reply, IoServer, Data} ->
      erlang:display(Data),
      erlang:halt(0)
    end.

And then:

$ erl -noshell -s demo
dsadsadsa
dsadsadsa
"dsadsadsa\ndsadsadsa\n"

Notice the result is not a binary. However, if you change the code to return before eof (i.e. on the first newline), then a binary is returned. The reason why this happens is because io_lib:get_until/4, which is used by all (?) devices, converts the result to binaries only if the data is a binary, but in this case, the data is the atom eof:

https://github.com/erlang/otp/blob/master/lib/stdlib/src/io_lib.erl#L966

Expected behavior

The expected behaviour is for get_until to return a binary if the device is set to binary and I read until eof.

One possible solution would be to pass the format=list | binary as an argument to io_lib:get_until/4. However, this is a bit trickier to fix because most IO devices have encapsulated the io_lib functionality in a way that, if we add an extra argument to get_until/4, we would need to add the same argument to other functions in the io_lib module (which may or may not be a good idea).

Another possible solution is to pass an empty list or an empty binary to io_lib:get_until/4 and convert it to eof inside the function. This way we keep the original data format. Once again, other functions in io_lib would have to be changed too.

A third option is to deprecate get_until in the protocol. get_until has two issues:

  1. The initial state is always assumed to be an empty list

  2. The data is always converted to a list, which can be expensive and unnecessary, and it was probably done for backwards compatibility reasons. I would argue that the conversion of list/binary should be made by the MFA given to get_until - which can do it as efficiently as possible (or not at all), although this may require the encoding to be given to the get_until function

Instead, a new get_until tuple with one extra element could be introduced where:

  1. The initial state is given as part of the request tuple
  2. The data is given as is to the MFA, leaving it up to the MFA to convert it to list or binary
  3. The return type is also always returned as is

If we pass the list/binary as is to the get_until MFA, then I would be able to solve the eof issue myself, because I could look at my buffer and see what is the type of data I am working on. I would also allow users of get_until to be more efficient, especially as binaries become more common place.

If preferred, the new get_until could have another name, to avoid confusion. In any case, today's get_until could be implemented on top of the new one, as the new proposal is basically the old get_until but with less assumptions.

Affected versions

AFAIK, all OTP versions.

Additional context

The problem I am trying to solve is a relatively simple problem. I want to read all data until eof. Currently, I am working around this issue by doing a io:getopts/1 on the device to see if it is in binary or list mode.

Metadata

Assignees

Labels

bugIssue is reported as a bugteam:VMAssigned to OTP team VM

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions