Description
Problem
Currently the IO situation is less than ideal. Not only are `IO[str]`/`TextIO` and `IO[bytes]`/`BinaryIO` a bit confusing (they are interchangeable in most cases), but the use of `IO` throughout the stdlib stubs is inconsistent, and things like passing an object with a `write()` method to `json.dump()` do not type-check.
This is because the `IO` class, while describing the actual objects returned by `open()` perfectly, is not suitable to represent the "file-like object" interface. This interface is well known, documented prominently in the standard library's documentation (glossary: "file object" and "file-like object"), and a testament to duck typing; however, for the most part it is not compatible with how typeshed is currently written.
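For illustration, a minimal sketch (the class name `ListWriter` is hypothetical): at runtime `json.dump()` only calls `write()` on its `fp` argument, so an object that inherits from neither `io.IOBase` nor `typing.IO` already works. It is the `IO[str]` annotation in the stub that rejects it.

```python
import json


class ListWriter:
    """Collects written chunks; deliberately not a subclass of io.IOBase or typing.IO."""

    def __init__(self) -> None:
        self.chunks: list[str] = []

    def write(self, s: str) -> int:
        self.chunks.append(s)
        return len(s)


out = ListWriter()
json.dump({"a": 1}, out)  # json.dump only ever calls out.write(...)
print("".join(out.chunks))  # {"a": 1}
```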
Proposal
I propose to introduce `Protocol`s (not abstract classes) for parameters where a "file object" is expected, so that one can correctly type file-like objects without inheriting from one of the abstract base classes. Furthermore, I think we should have two separate protocols: one for files that can be read from and one for files that can be written to.
This work can be done incrementally, and I am willing to spend time doing it if nobody vetoes this ticket.
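A minimal sketch of the two-protocol idea, assuming Python 3.9+ and `typing.Protocol`. The names `SupportsRead`, `SupportsWrite`, `LineCounter` and `emit` are hypothetical, and these protocols are narrower than the draft further down. Marking them `@runtime_checkable` only lets `isinstance()` check that the methods exist; a static type checker does the actual structural matching.

```python
import io
from typing import Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)
T_contra = TypeVar("T_contra", contravariant=True)


@runtime_checkable
class SupportsRead(Protocol[T_co]):
    """Anything with a read() method (hypothetical name)."""

    def read(self, size: int = -1) -> T_co: ...


@runtime_checkable
class SupportsWrite(Protocol[T_contra]):
    """Anything with a write() method (hypothetical name)."""

    def write(self, s: T_contra) -> int: ...


class LineCounter:
    """Duck-typed writer: no base class, only a write() method."""

    def __init__(self) -> None:
        self.lines = 0

    def write(self, s: str) -> int:
        self.lines += s.count("\n")
        return len(s)


def emit(lines: list[str], out: SupportsWrite[str]) -> None:
    # Only write() is required, so io.StringIO and LineCounter both qualify.
    for line in lines:
        out.write(line + "\n")


counter = LineCounter()
emit(["a", "b", "c"], counter)
print(counter.lines)                            # 3
print(isinstance(counter, SupportsWrite))       # True
print(isinstance(counter, SupportsRead))        # False: writable only
print(isinstance(io.StringIO(), SupportsRead))  # True
```

Keeping reading and writing apart means a write-only object is never forced to stub out `read()` just to satisfy a parameter type.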
Pros
This would allow a file-like object to be passed to `json.dump()`, `zipfile.ZipFile`, and others (as can already be done with `csv.writer()`).
Using small `Protocol`s like these would allow objects that already conform to be used in interfaces expecting a file-like object, without having to implement too many methods (or explicitly inherit from the base class, as is required now). This should lower the effort of bringing libraries to the typing world. Using two separate protocols is also how most languages do it; off the top of my head:
- Rust (`io::Read` and `io::Write` traits)
- Java (`InputStream` with `read()`, `OutputStream` with `write()` and `flush()`)
- C++ (`istream`, `ostream`)
It is worth noting that protocols like the ones I describe already exist in typeshed. Where the documentation calls for a file-like object, contributors have avoided `IO` and introduced protocols instead:
- for `shutil.copyfileobj()`: the `_Reader` and `_Writer` protocols
- for `csv.writer()`: the `_csv._Writer` protocol
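At runtime `shutil.copyfileobj()` only calls `read(length)` on the source and `write()` on the destination, which is exactly what the `_Reader`/`_Writer` protocols describe. A small sketch with two hypothetical classes, `ChunkedSource` and `ByteSink`, that inherit from nothing:

```python
import shutil


class ChunkedSource:
    """Hypothetical reader: returns at most `size` bytes per read() call."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self, size: int = -1) -> bytes:
        if size < 0:
            size = len(self._data)
        chunk, self._data = self._data[:size], self._data[size:]
        return chunk  # b"" signals end of input to copyfileobj


class ByteSink:
    """Hypothetical writer that accumulates everything it is given."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def write(self, b: bytes) -> int:
        self.buffer += b
        return len(b)


sink = ByteSink()
shutil.copyfileobj(ChunkedSource(b"abcdefgh"), sink, 3)  # copy in 3-byte chunks
print(bytes(sink.buffer))  # b'abcdefgh'
```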
Introducing those protocols would also let us remove some of the `IO[str]`/`TextIO` complexity: while `TextIO` and `BinaryIO` are still needed for the native file objects (they have additional methods compared to `IO`), function parameters everywhere could simply use `Read[str]` and `Read[bytes]`.
Cons
This is a sizeable change, and people are likely to use both the base class and these protocols for some time. However, code using the base class should not break when passed to functions expecting the protocols, since the base class already provides every method the protocols require.
Another caveat is that this might give a false sense of security: libraries in the wild do their own checks to determine whether an object conforms to the interface, and, for example, `pandas` will not write to a file object that does not implement `__iter__`. Therefore, objects conforming to the protocol might still be rejected by (IMHO buggy) libraries, while inheriting from the base class would make them look more like file objects (maybe too much, since it gives everything the attributes of both a readable and a writable file!).
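To illustrate the caveat, here is a sketch of the kind of runtime heuristic described above. `looks_file_like` is a hypothetical function that approximates, but is not, pandas' own check: it requires `read()` or `write()` plus `__iter__`, so an object satisfying a write-only protocol is still rejected.

```python
import io


def looks_file_like(obj: object) -> bool:
    """Hypothetical duck-typing check: read() or write(), plus __iter__."""
    has_io_method = hasattr(obj, "read") or hasattr(obj, "write")
    return has_io_method and hasattr(obj, "__iter__")


class MinimalWriter:
    """Satisfies a write()-only protocol, but defines no __iter__."""

    def write(self, s: str) -> int:
        return len(s)


print(looks_file_like(io.StringIO()))    # True: IOBase defines __iter__
print(looks_file_like(MinimalWriter()))  # False: no __iter__
```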
Draft
Unfortunately this is where the typeshed-bikeshed starts, but this is my proposal:
```python
from types import TracebackType
from typing import Iterator, Optional, Protocol, Type, TypeVar

AnyStr = TypeVar('AnyStr', str, bytes)  # equivalent to typing.AnyStr

# Objects that can be written to (str for text, bytes for binary).
class WriteIO(Protocol[AnyStr]):
    def write(self, s: AnyStr) -> int: ...
    def flush(self) -> None: ...
    def close(self) -> None: ...
    def __enter__(self) -> 'WriteIO[AnyStr]': ...
    def __exit__(self, exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException],
                 exc_tb: Optional[TracebackType]) -> Optional[bool]: ...

# Objects that can be read from; `with` keeps the read interface.
class ReadIO(Protocol[AnyStr]):
    def read(self, size: Optional[int] = None) -> AnyStr: ...
    def close(self) -> None: ...
    def __enter__(self) -> 'ReadIO[AnyStr]': ...
    def __exit__(self, exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException],
                 exc_tb: Optional[TracebackType]) -> Optional[bool]: ...
    def __iter__(self) -> Iterator[AnyStr]: ...
```
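A usage sketch for the draft `WriteIO`, restated here so the snippet runs on its own. `UpperCaseWriter` and `save_report` are hypothetical; the point is that the class matches `WriteIO[str]` structurally, including `with` support, without inheriting from anything.

```python
from types import TracebackType
from typing import Optional, Protocol, Type, TypeVar

AnyStr = TypeVar("AnyStr", str, bytes)


class WriteIO(Protocol[AnyStr]):  # restated from the draft
    def write(self, s: AnyStr) -> int: ...
    def flush(self) -> None: ...
    def close(self) -> None: ...
    def __enter__(self) -> "WriteIO[AnyStr]": ...
    def __exit__(self, exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException],
                 exc_tb: Optional[TracebackType]) -> Optional[bool]: ...


class UpperCaseWriter:
    """Hypothetical file-like object: no base class, matches WriteIO[str] structurally."""

    def __init__(self) -> None:
        self.parts: list[str] = []
        self.closed = False

    def write(self, s: str) -> int:
        self.parts.append(s.upper())
        return len(s)

    def flush(self) -> None:
        pass  # nothing is buffered

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "UpperCaseWriter":
        return self

    def __exit__(self, exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException],
                 exc_tb: Optional[TracebackType]) -> Optional[bool]:
        self.close()
        return None  # do not suppress exceptions


def save_report(fp: WriteIO[str], lines: list[str]) -> None:
    with fp:  # closes fp on exit
        for line in lines:
            fp.write(line + "\n")
        fp.flush()


w = UpperCaseWriter()
save_report(w, ["ok", "done"])
print(repr("".join(w.parts)), w.closed)
```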
Additional protocols can be added to provide `seek()`/`tell()` (similar to Rust's `io::Seek` trait).
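A possible shape for such a protocol, as a sketch only: the names `SeekIO` and `remaining` are hypothetical, and the signatures mirror `io.IOBase.seek(offset, whence=os.SEEK_SET)` and `tell()`, both of which return the absolute position as an `int`. `io.BytesIO` already matches it structurally.

```python
import io
import os
from typing import Protocol, runtime_checkable


@runtime_checkable
class SeekIO(Protocol):
    """Hypothetical protocol, analogous in spirit to Rust's io::Seek."""

    def seek(self, offset: int, whence: int = os.SEEK_SET) -> int: ...
    def tell(self) -> int: ...


def remaining(fp: SeekIO) -> int:
    """Count positions between the current offset and the end, then restore the offset."""
    here = fp.tell()
    end = fp.seek(0, os.SEEK_END)
    fp.seek(here)
    return end - here


buf = io.BytesIO(b"0123456789")
buf.read(4)
print(remaining(buf))           # 6
print(buf.tell())               # 4
print(isinstance(buf, SeekIO))  # True
```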