Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collections.abc.ByteString is not equivalent to typing.ByteString #102092

Closed
hauntsaninja opened this issue Feb 20, 2023 · 7 comments
Closed

collections.abc.ByteString is not equivalent to typing.ByteString #102092

hauntsaninja opened this issue Feb 20, 2023 · 7 comments
Labels
docs Documentation in the Doc dir topic-typing type-bug An unexpected behavior, bug, or error

Comments

@hauntsaninja
Copy link
Contributor

hauntsaninja commented Feb 20, 2023

There are two related, but different strands of conversation here:

However, there is an additional issue! collections.abc.ByteString is not an accurate replacement for typing.ByteString! collections.abc.ByteString only registers bytes and bytearray, whereas typing.ByteString is documented as also representing a memoryview. This is an issue regardless of the above two related conversations.

For this issue we could:

  • Move the documentation of typing.ByteString out from under the "Corresponding to collections in collections.abc" section to the "Other concrete types" section
  • Remove all mention of collections.abc.ByteString from typing.rst
  • Keep the mention of typing.ByteString as being deprecated, but change the reason. ByteString is not a generic type and collections.abc.ByteString is not a semantic replacement for it as above, so the current reason is wrong on two counts
  • Define typing.ByteString in typeshed as a simple Union, and not the Sequence[int] thing it is right now.
  • Define typing.ByteString as a Union in typing.py instead of the generic alias it is now(?)

Here's how this relates to the other two strands of conversation:

  • We should also remove the note about the bytes shorthand from the docs
  • We should deprecate both typing.ByteString and collections.abc.ByteString. I'm fine with going slow on the removal of typing.ByteString, since once it's a union in typeshed it's not causing much harm. But ideally we can remove collections.abc.ByteString in 3.14, since its isinstance behaviour is not what you want.

cc @JelleZijlstra

@hauntsaninja hauntsaninja added type-bug An unexpected behavior, bug, or error docs Documentation in the Doc dir topic-typing labels Feb 20, 2023
hauntsaninja added a commit to hauntsaninja/flake8-pyi that referenced this issue Feb 20, 2023
It is highly confusing and its isinstance / issubclass behaviour does
not match the documentation of typing.ByteString

See python/cpython#102092
hauntsaninja added a commit to hauntsaninja/flake8-pyi that referenced this issue Feb 20, 2023
It is highly confusing and its isinstance / issubclass behaviour does
not match the documentation of typing.ByteString

See python/cpython#102092
hauntsaninja added a commit to hauntsaninja/typeshed that referenced this issue Feb 20, 2023
See python/cpython#102092

This makes the behaviour of typing.ByteString better match its
documentation.

I also remove collections.abc.ByteString. There is no legitimate use
case for it. Let me know if you think there's a case where it would help
users to have this around.
@JelleZijlstra
Copy link
Member

Thanks for the research here! Before we make changes in CPython it might make sense to wait for PEP 688 to be accepted (hopefully soon?), since that will affect what we should point users to.

I'm not sure I see the issue, though. At runtime typing.ByteString really is an alias for collections.abc.ByteString: ByteString = _alias(collections.abc.ByteString, 0) # Not generic.

And memoryview is not a subclass of either type at runtime:

In [51]: issubclass(memoryview, collections.abc.ByteString)
Out[51]: False

In [52]: issubclass(memoryview, typing.ByteString)
Out[52]: False

So while I am in favor of deprecating both ByteString objects, I don't see why we need to decouple them.

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Feb 20, 2023

... I don't see why we need to decouple them.

I was hoping we could get away with treating typing.ByteString as bytes | bytearray | memoryview. This seemed like too breaking of a change for a collections.abc.ByteString, since the whole point of ABCs is to make your own subtypes of them and there are indications that collections.abc.ByteString was genuinely meant to be like bytes | bytearray.

I realise I was also kind of hoping we could just... pretend that collections.abc.ByteString doesn't exist, since I'd like to avoid users thinking there's an interface there.

Anyway, the claim is that typing.ByteString has clearly defined, reasonably well documented semantics here:

This type represents the types bytes, bytearray, and memoryview of byte sequences.

And while PEP 688 provides a path for something with better semantics (since users of typing.ByteString would probably want all buffers), maybe the existing semantics are still useful, and we can "fix" this outright bug in the implementation of typing.ByteString.


So possible "fixes" are:

  1. Make typing.ByteString equivalent to bytes | bytearray | memoryview. As I type this, I'm realising this has the downside it would break any user who was subclassing typing.ByteString

(python/typeshed#9783 doesn't surface issues and I couldn't find any typing.ByteString subclasses on grep.app, and I found a collections.abc.ByteString one)

  1. Register memoryview with collections.abc.ByteString. While desirable for use via typing.ByteString, there are indications that collections.abc.ByteString was genuinely meant to be just bytes | bytearray and e.g. have startswith and things like that. In which case this could break user expectations. E.g., the docstring says:

This unifies bytes and bytearray. XXX Should add all their methods.

edit: it is also tested that memoryview is not a collections.abc.ByteString

  1. Don't "fix", and just aggressively pursue the deprecations.

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Feb 20, 2023

I guess 2 might be more viable than I initially gave it credit for. This table is pretty clear that the interface for collections.abc.ByteString is __getitem__ + __len__. More authoritative than the "XXX" docstring, at least :-)

We can still go with option 3 and just deprecate instead of touching these, if we feel it's too messy.

@JelleZijlstra
Copy link
Member

So your claim is essentially that there is a discrepancy between the implementation and the documentation for typing.ByteString, and we should fix that discrepancy by adjusting the implementation. I'm not sure why you don't want to change the documentation instead, though. memoryview is not a ByteString at runtime or according to typeshed, so it seems less disruptive to adjust the documentation.

My preference is to deprecate both variants of ByteString. As I see it, there are three possible use cases for ByteString, and it doesn't work well for any of them:

  • A shorthand for "any buffer type". ByteString doesn't actually mean this; PEP 688 will provide a solid solution for this use case.
  • A shorthand for "bytes | bytearray", or possibly "bytes | bytearray | memoryview" depending on which part of the docs we believe. Users who want this should just write "bytes | bytearray": it's barely any longer and does not suffer from the ambiguity of ByteString.
  • An ABC representing the common interface of bytes and bytearray, as suggested by the "XXX" comment in the docstring. But since collections.abc.ByteString doesn't define any of this interface, it's not actually useful for this purpose. And as @rhettinger explained when I tried to add the shared bytes/bytearray methods, backward compatibility means we can essentially never add methods to an ABC.

ByteString is purely a source of confusion and I don't think it is useful. Let's deprecate it and remove it in 3.14.

@hauntsaninja
Copy link
Contributor Author

Okay, great. #102096 adds deprecation warnings to collections.abc.ByteString

@srittau
Copy link
Contributor

srittau commented Feb 21, 2023

+1 on the deprecation and eventual removal. Since the situation is so muddled, and ByteString seems mostly unused, it's better to start from the beginning.

hauntsaninja added a commit to hauntsaninja/cpython that referenced this issue May 8, 2023
The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

I also remove the reference to `collections.abc.ByteString`, since that
object is deprecated (python#91896) and has different semantics (python#102092)

I think it would be good to backport this since the changes in the type
system are immediate and apply to all versions of Python.
AlexWaygood pushed a commit that referenced this issue May 8, 2023
The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

I also remove the reference to `collections.abc.ByteString`, since that
object is deprecated (#91896) and has different semantics (#102092)
AlexWaygood pushed a commit to AlexWaygood/cpython that referenced this issue May 8, 2023
The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

I also remove the reference to `collections.abc.ByteString`, since that
object is deprecated (python#91896) and has different semantics (python#102092)
AlexWaygood added a commit that referenced this issue May 8, 2023
gh-102500: Remove mention of bytes shorthand (#104281)

The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

The reference to collections.abc.ByteString is also removed, since that object is deprecated (#91896) and has different semantics (#102092)

Although PEP 688 is new in Python 3.12, type checkers are expected to implement the new semantics for type annotations even if users are using an older version of Python, so this docs PR is backported to Python 3.11.

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
jbower-fb pushed a commit to jbower-fb/cpython-jbowerfb that referenced this issue May 8, 2023
The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

I also remove the reference to `collections.abc.ByteString`, since that
object is deprecated (python#91896) and has different semantics (python#102092)
hauntsaninja added a commit to python/typeshed that referenced this issue May 28, 2023
See python/cpython#102092

This makes the behaviour of typing.ByteString better match its
documentation.
@AlexWaygood
Copy link
Member

I think we're done here -- the docs for both collections.abc.ByteString and typing.ByteString have pretty clear "don't use this!" notices next to them now, and there's no claim of direct equivalency anymore.

Feel free to reopen if there's something more to be done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir topic-typing type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants