Buffer protocol types #593
Comments
Since the buffer protocol is a standard Python feature and we need to be able to use it in type annotations, it makes sense to me to add
# typing.pyi
class Buffer: ...  # empty at the Python level
# builtins.pyi
class bytes(Buffer, Sequence[int]): ...
We'd have to add
(Side note: I think you mean ByteString (https://docs.python.org/3/library/typing.html#typing.ByteString), not BytesType.) |
I like what @JelleZijlstra proposes (obviously the type should be abstract). |
Actually, it's a bit more complicated, since some buffers are writable and others aren't (see the types in python/typeshed#2610). This is controlled by whether the type responds to requests with PyBUF_WRITABLE (https://docs.python.org/3/c-api/buffer.html#c.PyBUF_WRITABLE) set. So here's a revised proposal:
# typing.pyi
class ReadableBuffer: ...  # abstract, no Python attributes; corresponds to C types that expose buffers without PyBUF_WRITABLE set
class WriteableBuffer(ReadableBuffer): ...  # same; corresponds to C types that expose buffers with PyBUF_WRITABLE set
# builtins.pyi
class bytes(ReadableBuffer, Sequence[int]): ...
class bytearray(WriteableBuffer, Sequence[int]): ...
There are a number of other flags controlling format, dimensions, etc., but I'm not sure those could be easily expressed in the type system. Perhaps we could implement the format flag by making Buffer generic over a typevar that is restricted to certain types, but Python types don't map cleanly to C types, so that doesn't seem like it would work well. |
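For illustration, the readable/writable split can be observed from Python through memoryview, which reports whether the exporter handed out a read-only buffer. This is only a rough sketch: the helper name is_writable is made up for this example and is not part of any proposal in this thread.

```python
import array


def is_writable(obj) -> bool:
    """Hypothetical helper: True if obj exports a writable buffer."""
    # memoryview() raises TypeError for objects without buffer support.
    with memoryview(obj) as view:
        return not view.readonly


# bytes only exposes a read-only buffer; bytearray and array.array are writable.
print(is_writable(b"abc"))
print(is_writable(bytearray(b"abc")))
print(is_writable(array.array("B", [1, 2, 3])))

# Writing through a view works for bytearray ...
buf = bytearray(b"abc")
memoryview(buf)[0] = ord("x")

# ... and fails with TypeError for bytes.
try:
    memoryview(b"abc")[0] = ord("x")
    wrote_to_bytes = True
except TypeError:
    wrote_to_bytes = False
```

Note that this is a runtime check on an instance; the stub classes above are an attempt to express the same distinction statically, per type.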
python/typeshed#2895 is one example where this could be useful. |
|
Oh sorry, this is a wrong issue, disregard my last comment. |
Is this something that could be considered? What steps are necessary to continue? |
@srittau If it is not too hard maybe you can directly make a PoC PR to typeshed, so that we can discuss the details (IIUC you want this to be a stub-only feature). cc @gvanrossum |
I believe there is an open python issue about this: https://bugs.python.org/issue27501 |
Until a proper type for the buffer protocol is available, would it make sense to at least partially fix this (in places like
It seems like that would cover the vast majority of use cases. An example of something missing from that type definition that works, at least, for Also - |
Since typing doesn't yet have a way to express buffer protocol objects (python/typing#593), various interfaces have ended up with a mish-mash of options: some list just bytes (or just bytearray, when writable), some include mmap, some include memoryview, I think none of them include array.array even though it's explicitly mentioned as bytes-like, etc. I ran into problems because RawIOBase.readinto didn't allow for memoryview.
To allow for some uniformity until the fundamental issue is resolved, I've introduced _typeshed.ReadableBuffer and _typeshed.WriteableBuffer, and applied them in stdlib/3/io.pyi as an example. If these get rolled out in more places, it will mean that we have only one place where they have to get tweaked in future, or swapped out for a public protocol.
This unfortunately does have the potential to break code that inherits from RawIOBase/BufferedIOBase and overrides these methods, because the base method is now more general and so the override now needs to accept these types as well (which is why I've also updated gzip and lzma). However, it should be a reasonably easy fix, and will make the downstream annotations more correct.
I'm not 100% happy with the names: bytes-like is slightly stricter than just buffer protocol (it must be able to export a C-contiguous buffer), but in practice I'd be surprised if there are types for which there is a difference at static analysis time (e.g. not every memoryview instance is bytes-like, but that's a property of instances, not types).
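To make the override concern concrete, here is a minimal sketch of a RawIOBase subclass whose readinto accepts any writable buffer. The two Union aliases below are local stand-ins written for this example; they are assumptions about the general shape of such aliases, not the actual _typeshed definitions, and PatternReader is a hypothetical class.

```python
import array
import io
import mmap
from typing import Union

# Local stand-ins for illustration only (not the real _typeshed aliases).
ReadableBuffer = Union[bytes, bytearray, memoryview, array.array, mmap.mmap]
WriteableBuffer = Union[bytearray, memoryview, array.array, mmap.mmap]


class PatternReader(io.RawIOBase):
    """Hypothetical raw stream that serves bytes from an in-memory payload."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b: WriteableBuffer) -> int:
        # Going through memoryview lets the method accept bytearray,
        # memoryview, array.array, mmap, and so on, not just bytearray.
        view = memoryview(b).cast("B")
        chunk = self._data[self._pos:self._pos + len(view)]
        view[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


reader = PatternReader(b"abcdef")
target = bytearray(4)
first = reader.readinto(target)

arr = array.array("B", [0, 0, 0])
second = reader.readinto(memoryview(arr))
```

An override annotated with only bytearray would be narrower than a base method annotated with the broader alias, which is the kind of incompatibility a type checker would then report.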
As mentioned in #997 it would also be useful to be able to specify length for any buffer types, in particular where a fixed length string is expected. |
Bump :) @ilevkivskyi @srittau What's the process for getting this accepted? Does this require a new PEP? I'd be open to working on this, but I'm not sure where to start. |
A related question is how this would be handled in Python given the move to builtins for type hints (like with PEP 585) |
By the way, I just noticed that @JelleZijlstra's suggestion has been implemented: I suppose that means those should be moved here in order to consider this issue resolved? |
I think it doesn't necessarily require a PEP: we could just add the types to typing.pyi and typing_extensions.pyi (as I suggested in #593 (comment) a long time ago). The process could be similar to what we just did with reveal_type(): a typing-sig discussion, followed by direct implementation in CPython. |
This doesn't seem quite right either as |
I am preparing a PEP to support checking the buffer protocol not only in the type system, but also at runtime. A first draft is at https://github.com/JelleZijlstra/peps/blob/bufferpep/pep-9999.rst. Any early feedback is welcome. |
@JelleZijlstra LGTM so far, although it's unfortunate that it's difficult to distinguish between readable and read/writable buffers, but it makes sense. |
Readonly is only one of a number of attributes that are important for determining whether a buffer can be used. Some libraries can only deal with contiguous buffers, or native endianness, or aligned data. It looks to me like the PEP does the right thing here - best to support either all attributes or none, but not make readonly more important than other attributes. |
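As a rough illustration of the point about attributes, memoryview exposes several of these properties per instance, and two views over the same kind of object can differ. The describe helper below is hypothetical and only inspects a few of the attributes memoryview offers.

```python
import array


def describe(obj) -> dict:
    """Hypothetical helper: collect a few buffer properties of obj."""
    with memoryview(obj) as view:
        return {
            "readonly": view.readonly,
            "c_contiguous": view.c_contiguous,
            "format": view.format,
            "itemsize": view.itemsize,
        }


plain = describe(bytearray(8))
# A strided slice of the same data is no longer contiguous.
strided = describe(memoryview(bytearray(8))[::2])
doubles = describe(array.array("d", [1.0, 2.0]))
```

Both plain and strided come from memoryview/bytearray objects, yet only one is contiguous, so a static type alone cannot tell a consumer whether a given buffer is usable.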
This is now PEP 688: https://peps.python.org/pep-0688/. |
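For readers landing here later, a small sketch of what this enables: on Python 3.12 and newer, collections.abc.Buffer can be used in annotations and in isinstance() checks. The supports_buffer helper is made up for this example, and the fallback branch for older versions (probing with memoryview) is an assumption of this sketch, not something the PEP prescribes.

```python
import array
import sys

if sys.version_info >= (3, 12):
    from collections.abc import Buffer

    def supports_buffer(obj: object) -> bool:
        """Hypothetical helper: runtime check via the PEP 688 ABC."""
        return isinstance(obj, Buffer)
else:
    def supports_buffer(obj: object) -> bool:
        """Fallback for older versions: try to obtain a memoryview."""
        try:
            memoryview(obj)
        except TypeError:
            return False
        return True


print(supports_buffer(b"abc"))
print(supports_buffer(array.array("i", [1, 2, 3])))
print(supports_buffer("abc"))
```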
Fixed by PEP-688. |
We had several typeshed issues and pull requests lately that try to work around the fact that there is no way to express that a method receives any object following the buffer protocol. The typing documentation mentions BytesType (ByteString), which is an alias for Union[bytes, memoryview, bytearray] (and that bytes can be used as an alias in argument types), but this is missing other types such as array.array or user-defined objects. As this is a C API protocol, just defining such a protocol in typeshed is not possible.