stdlib stubs are unnecessarily strict with file-like objects #4212

Closed
@remram44

Description

Problem

Currently the IO situation is less than ideal. Not only are IO[str]/TextIO and IO[bytes]/BinaryIO a bit confusing (they are interchangeable in most cases), but the use of IO throughout the stdlib stubs is inconsistent, and passing an object that only has a write() method to json.dump() does not type-check.

This is because the IO object, while describing the actual objects returned by open() perfectly, is not suitable to represent the "file-like object" interface. This interface is well known, documented prominently in the standard library's documentation (glossary: "file object" and "file-like object") and a testament to duck-typing; however it's not compatible with how typeshed is currently written (for the most part).
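The runtime side of this duck-typing can be illustrated with a minimal sketch. ChunkCollector is a hypothetical class, not part of any library; it implements nothing but write(). Whether a type checker accepts the call depends on the stubs in use.

```python
import json
from typing import List


class ChunkCollector:
    """Hypothetical file-like object that only implements write()."""

    def __init__(self) -> None:
        self.chunks: List[str] = []

    def write(self, s: str) -> int:
        # Record each chunk that json.dump() hands over.
        self.chunks.append(s)
        return len(s)


collector = ChunkCollector()
# At runtime json.dump() only needs fp.write(), so this call succeeds.
json.dump({"a": 1}, collector)
result = "".join(collector.chunks)
```

The object never inherits from IO or any other base class, which is exactly the case the stubs would need to describe.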

Proposal

I propose to introduce Protocols (not abstract classes) to be used for parameters where a "file object" is expected, allowing one to correctly type their file-like objects without having to inherit from one of the abstract base classes. Furthermore, I think we should have two separate protocols: one for files that can be read from and one for files that can be written to.
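As a rough sketch of what separate read and write protocols could look like in user code, assuming Python 3.8+ for typing.Protocol: the names Readable, Writable, copy_text and UpperSink are hypothetical and only serve the illustration.

```python
import io
from typing import Protocol


class Readable(Protocol):
    def read(self, size: int = -1) -> str: ...


class Writable(Protocol):
    def write(self, s: str) -> int: ...


def copy_text(src: Readable, dst: Writable) -> int:
    """Copy everything from src to dst; return the number of characters read."""
    data = src.read()
    dst.write(data)
    return len(data)


class UpperSink:
    """Hypothetical write-only object; it does not inherit from any IO class."""

    def __init__(self) -> None:
        self.text = ""

    def write(self, s: str) -> int:
        self.text += s.upper()
        return len(s)


sink = UpperSink()
copied = copy_text(io.StringIO("abc"), sink)
```

Both io.StringIO and UpperSink are accepted structurally: the function signature asks only for the one method each side actually uses.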

This work can be done incrementally, and I am willing to spend time doing this if there is no veto to this ticket.

Pros

This would allow a file-like object to be passed to json.dump(), zipfile.ZipFile, and others (as it already can be to csv.writer()).

Using Protocols of this small scale would allow objects that already conform to be used in interfaces expecting a file-like object, without having to implement too many methods (or explicitly inherit from the base class, as is required now). This should lower the effort of bringing libraries to the typing world. Using two separate protocols is similar to how most languages do this, off the top of my head:

It is interesting to note that the protocols I describe already exist in typeshed. Not wanting to put IO where the documentation called for file-like object, protocols have already been introduced:

Introducing those protocols would also allow us to remove some of the IO[str]/TextIO complexity: while TextIO and BinaryIO are still needed for the native file objects (they have additional methods compared to IO), the protocols used for function parameters everywhere could simply be ReadIO[str] and ReadIO[bytes].

Cons

This is a sizeable change, and people are likely to use both the base class and those protocols for some time. However, code using the base class should not break when its objects are passed to functions expecting the protocol.
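A small sketch of that compatibility claim, assuming Python 3.8+: HasWrite is a hypothetical protocol name. Note that a runtime isinstance() check against a runtime_checkable protocol only looks for the presence of the method, not its signature; the real verification is done by the static type checker.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasWrite(Protocol):
    def write(self, s: str) -> int: ...


# Objects from the io module already have write(), so they match structurally.
string_ok = isinstance(io.StringIO(), HasWrite)

# The runtime check ignores signatures, so a bytes stream matches as well;
# only a static checker would flag the str/bytes mismatch.
bytes_ok = isinstance(io.BytesIO(), HasWrite)

# An object without write() does not match.
plain_ok = isinstance(object(), HasWrite)
```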

Another caveat is that this might give a false sense of security: libraries in the wild do their own checks to determine whether an object conforms to the interface. For example, pandas will refuse to write to a file object that does not implement __iter__. Therefore objects conforming to the protocol might still be rejected by (IMHO buggy) libraries, while inheriting from the base class would make their objects look more like file objects (maybe too much, since it gives everything the attributes of both a readable and a writable file!).
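To make the caveat concrete, here is a hypothetical runtime check written in the spirit of what such libraries do. looks_file_like is an assumption for illustration and is not copied from pandas or any other library.

```python
import io


def looks_file_like(obj: object) -> bool:
    """Hypothetical strict check: needs read() or write(), plus __iter__."""
    has_io_method = hasattr(obj, "read") or hasattr(obj, "write")
    return has_io_method and hasattr(obj, "__iter__")


class WriteOnly:
    """Satisfies a write-only protocol, but is not iterable."""

    def write(self, s: str) -> int:
        return len(s)


class WriteAndIter(WriteOnly):
    """Same object with a dummy __iter__ added to satisfy the check."""

    def __iter__(self):
        return iter(())


rejected = looks_file_like(WriteOnly())
accepted = looks_file_like(WriteAndIter())
native = looks_file_like(io.StringIO())
```

A WriteOnly instance would pass static type checking against a write-only protocol and still be rejected by a check like this one at runtime.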

Draft

Unfortunately this is where the typeshed-bikeshed starts, but this is my proposal:

from types import TracebackType
from typing import Iterator, Optional, Protocol, Type, TypeVar

AnyStr = TypeVar('AnyStr', str, bytes)  # same as typing.AnyStr

# Minimal interface for objects that can be written to.
class WriteIO(Protocol[AnyStr]):
    def write(self, s: AnyStr) -> int: ...
    def flush(self) -> None: ...
    def close(self) -> None: ...
    def __enter__(self) -> 'WriteIO[AnyStr]': ...
    def __exit__(self, exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException],
                 exc_tb: Optional[TracebackType]) -> Optional[bool]: ...

# Minimal interface for objects that can be read from.
class ReadIO(Protocol[AnyStr]):
    def read(self, size: Optional[int] = None) -> AnyStr: ...
    def close(self) -> None: ...
    def __enter__(self) -> 'ReadIO[AnyStr]': ...
    def __exit__(self, exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException],
                 exc_tb: Optional[TracebackType]) -> Optional[bool]: ...
    def __iter__(self) -> Iterator[AnyStr]: ...

Additional protocols can be added to provide seek()/tell() (similar to Rust's io::Seek trait).
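One possible shape for such a protocol, as a sketch only: SeekIO and stream_size are hypothetical names, and the method signatures are modelled on the seek()/tell() methods of the io module's stream classes.

```python
import io
from typing import Protocol


class SeekIO(Protocol):
    def seek(self, offset: int, whence: int = 0) -> int: ...
    def tell(self) -> int: ...


def stream_size(f: SeekIO) -> int:
    """Return the total size of a seekable stream, restoring its position."""
    pos = f.tell()
    end = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    f.seek(pos)
    return end


buf = io.BytesIO(b"hello world")
buf.seek(3)
size = stream_size(buf)
pos_after = buf.tell()
```

Because the function needs neither read() nor write(), it can be typed against the seek protocol alone and combined with the other protocols where required.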

Labels: topic: io (I/O related issues)
