Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make regex methods accept all buffer types as inputs #7158

Closed

Conversation

itaisteinherz
Copy link
Contributor

Fixes #1467
Refs #3014

~/Development/PoCs/typing
❯ cat scratch.py
import mmap
import re

with open("scratch.py", 'r+') as f:
    with mmap.mmap(f.fileno(), 0) as data:
        match = re.compile(b"").match(data)

~/Development/PoCs/typing
❯ mypy scratch.py
scratch.py:6: error: Argument 1 to "match" of "Pattern" has incompatible type "mmap"; expected "bytes"
Found 1 error in 1 file (checked 1 source file)

~/Development/PoCs/typing
❯ mypy --custom-typeshed-dir ../../OSS/typeshed scratch.py
Success: no issues found in 1 source file

stdlib/mmap.pyi Outdated
@@ -27,7 +27,7 @@ if sys.platform != "win32":

PAGESIZE: int

class mmap(AbstractContextManager[mmap], Iterable[int], Sized):
class mmap(AbstractContextManager[mmap], bytearray, Iterable[int], Sized):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, why does bytearray have to be in this position in the class signature? I initially wanted to place it after Iterable[int] and before Sized, but received this error:

~/Development/PoCs/typing
❯ mypy --custom-typeshed-dir ../../OSS/typeshed scratch.py
scratch.py:5: error: "mmap" has no attribute "__enter__"; maybe "__iter__"?
scratch.py:5: error: "mmap" has no attribute "__exit__"
Found 2 errors in 1 file (checked 1 source file)

@github-actions

This comment has been minimized.

@itaisteinherz itaisteinherz changed the title Make mmap.mmap extend bytearray WIP: Make mmap.mmap extend bytearray Feb 7, 2022
@itaisteinherz itaisteinherz changed the title WIP: Make mmap.mmap extend bytearray Make mmap.mmap extend bytearray Feb 7, 2022
@itaisteinherz itaisteinherz marked this pull request as draft February 7, 2022 21:42
@srittau
Copy link
Collaborator

srittau commented Feb 8, 2022

In the past we've had problems whenever we tried to "fudge" types and made them differ from runtime types. I think a better solution is python/typing#593.

@JelleZijlstra
Copy link
Member

I see that I approved the previous instance of this, but I no longer think it's a good idea to lie in the stubs.

The proper solution is python/typing#593. Until that is implemented, we can just use the aliases from https://github.com/python/typeshed/blob/master/stdlib/_typeshed/__init__.pyi#L185.

@itaisteinherz
Copy link
Contributor Author

itaisteinherz commented Feb 8, 2022

@srittau @JelleZijlstra I'll update this PR to use the currently-implemented aliases then.

Regarding python/typing#593 (and python/typing#593 (comment)), since I've never participated in any Python maling list discussion nor worked on CPython, it'll probably take me a while until I'll be comfortable with leading the implementation of the proper solution 😬 (Any tips would be much appreciated :))

stdlib/mmap.pyi Outdated
@@ -27,7 +27,7 @@ if sys.platform != "win32":

PAGESIZE: int

class mmap(AbstractContextManager[mmap], Iterable[int], Sized):
class mmap(AbstractContextManager[mmap], ReadableBuffer, Iterable[int], Sized):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought: how does this even work? ReadableBuffer is a Union which includes mmap.mmap, which leads me to the conclusion that this should result in some sort of infinite type loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: this does lead to an error :) Will probably split up WriteableBuffer to fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add it to the union, not use the union as a base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? mmap.mmap is already part of WriteableBuffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Then we shouldn't change it at all.

The original problem was that mmap wasn't accepted in re.match. If re.match accepts all buffers, then we should change its signature, not anything about mmap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Will look into re.match. Note that having mmap.mmap extend ReadOnlyBuffer does make some sense to me, though you're probably right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further research, it seems to me like a way to solve this would be to have mmap.mmap extend ByteString, and change the definition of AnyStr to:

AnyStr = TypeVar("AnyStr", str, ByteString)

However, that would require a change in CPython. I could also create an AnyStr alternative, specific to Pattern, with this definition. One last option would be to have mmap.mmap extend bytes (aka ReadOnlyBuffer). What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not change AnyStr. That would require CPython runtime changes. We could use a separate typevar specific to regexes, but I'd have to look more into it to figure out whether that's feasible.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the definition of AnyStr is not an option.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@itaisteinherz
Copy link
Contributor Author

itaisteinherz commented Feb 14, 2022

Nice, the last run found a mypy crash. If you're so inclined, feel free to report that on the mypy issue tracker.

Cool, I didn't even notice 🧐 Will take a look. Also, defining StrOrBuffer as TypeVar("StrOrBuffer", str, ReadableBuffer) didn't work for some reason. I had to write it as TypeVar("StrOrBuffer", str, ReadOnlyBuffer, WriteableBuffer).

@itaisteinherz itaisteinherz changed the title Make mmap.mmap extend bytearray Make regex methods accept all buffer types as inputs Feb 14, 2022
@github-actions

This comment has been minimized.

@itaisteinherz itaisteinherz marked this pull request as ready for review February 14, 2022 22:30
@github-actions

This comment has been minimized.

@itaisteinherz
Copy link
Contributor Author

Note that the PR does not fix the original issue at the moment, as re.compile(b"") returns Pattern[bytes], which means that running .match(data) will not work. That would probably be fixed if I were able to do:

StrOrBuffer = TypeVar("StrOrBuffer", str, ReadableBuffer)

However that didn't work for me earlier, so I'll look into it.

@itaisteinherz itaisteinherz marked this pull request as draft February 14, 2022 22:36
Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't exactly describe re's real behavior in a few cases. I think Pattern and Match should continue to be generic over AnyStr, but additional overloads can be added so that e.g. Pattern[bytes].search() accepts other ReadableBuffers.

@overload
def group(self, __group: str | int) -> AnyStr | Any: ...
def group(self, __group: str | int) -> StrOrBuffer | Any: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right: if you match on a memoryview, you get bytes objects from .group(), not memoryviews.

stdlib/re.pyi Outdated
@@ -52,66 +53,90 @@ if sys.version_info < (3, 7):
_pattern_type: type

@overload
def compile(pattern: AnyStr, flags: _FlagsType = ...) -> Pattern[AnyStr]: ...
def compile(pattern: StrOrBuffer, flags: _FlagsType = ...) -> Pattern[StrOrBuffer]: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re.compile actually works only on str or bytes

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

black (https://github.com/psf/black)
+ src/blib2to3/pgen2/literals.py:25:15: error: "Match" expects 2 type arguments, but 1 given

urllib3 (https://github.com/urllib3/urllib3)
+ src/urllib3/util/ssl_match_hostname.py:23: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ src/urllib3/util/url.py:242: error: Returning Any from function declared to return "str"  [no-any-return]

websockets (https://github.com/aaugustin/websockets)
+ src/websockets/version.py:74: error: Returning Any from function declared to return "str"

pytest (https://github.com/pytest-dev/pytest)
+ src/_pytest/junitxml.py:51: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

rich (https://github.com/willmcgugan/rich)
+ rich/_emoji_replace.py:7: error: "Match" expects 2 type arguments, but 1 given
+ rich/_emoji_replace.py:23: error: "Match" expects 2 type arguments, but 1 given
+ rich/markup.py:43: error: "Match" expects 2 type arguments, but 1 given
+ rich/markup.py:60: error: "Match" expects 2 type arguments, but 1 given

zulip (https://github.com/zulip/zulip)
+ tools/lib/capitalization.py:213: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/help_settings_links.py:149: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/help_relative_links.py:115: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/url_preview/preview.py:46: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/mention.py:137: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/openapi/markdown_extension.py:476: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/openapi/markdown_extension.py:489: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/help_emoticon_translations_table.py:61: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:606: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1375: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1486: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1498: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1509: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1535: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1624: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1826: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1846: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1902: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1942: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:1973: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ zerver/lib/markdown/__init__.py:2082: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

flake8 (https://github.com/pycqa/flake8)
+ src/flake8/violation.py:19: error: "Match" expects 2 type arguments, but 1 given

vision (https://github.com/pytorch/vision)
+ torchvision/prototype/datasets/_builtin/imagenet.py:95: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

sphinx (https://github.com/sphinx-doc/sphinx)
+ sphinx/util/matching.py: note: In function "compile_matchers":
+ sphinx/util/matching.py:63:76: error: "Match" expects 2 type arguments, but 1 given
+ sphinx/util/matching.py: note: In function "patmatch":
+ sphinx/util/matching.py:92:47: error: "Match" expects 2 type arguments, but 1 given

werkzeug (https://github.com/pallets/werkzeug)
+ src/werkzeug/debug/__init__.py:90: error: Returning Any from function declared to return "Union[str, bytes, None]"  [no-any-return]
+ src/werkzeug/routing.py:2004: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

cwltool (https://github.com/common-workflow-language/cwltool)
+ cwltool/job.py: note: In function "neverquote":
+ cwltool/job.py:110:72: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

aiohttp (https://github.com/aio-libs/aiohttp)
+ aiohttp/helpers.py:366: error: Returning Any from function declared to return "str"  [no-any-return]

edgedb (https://github.com/edgedb/edgedb)
+ edb/edgeql/codegen.py:52:26: error: "Match" expects 2 type arguments, but 1 given

attrs (https://github.com/python-attrs/attrs)
+ src/attr/validators.pyi:60: error: "Match" expects 2 type arguments, but 1 given

tornado (https://github.com/tornadoweb/tornado)
+ tornado/util.py:199: error: "Match" expects 2 type arguments, but 1 given

mypy (https://github.com/python/mypy)
+ mypy/checkstrformat.py:44: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ mypy/checkstrformat.py:117: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

core (https://github.com/home-assistant/core)
+ homeassistant/components/acer_projector/switch.py:115: error: Returning Any from function declared to return "str"  [no-any-return]

steam.py (https://github.com/Gobot1234/steam.py)
+ steam/utils.py:368: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

porcupine (https://github.com/Akuli/porcupine)
+ porcupine/textutils.py:553: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ porcupine/plugins/aboutdialog.py:18: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ porcupine/plugins/run/no_terminal.py:231: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

pylint (https://github.com/pycqa/pylint)
+ pylint/checkers/utils.py:697: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

psycopg (https://github.com/psycopg/psycopg)
+ psycopg/psycopg/copy.py:640: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ psycopg/psycopg/copy.py:648: error: "Match" expects 2 type arguments, but 1 given  [type-arg]
+ psycopg/psycopg/_queries.py:217: error: "Match" expects 2 type arguments, but 1 given  [type-arg]

pydantic (https://github.com/samuelcolvin/pydantic)
+ pydantic/color.py:259: error: Unused "type: ignore" comment

dragonchain (https://github.com/dragonchain/dragonchain)
- dragonchain/broadcast_processor/broadcast_functions.py:95:32: error: Item "None" of "Optional[Match[str]]" has no attribute "group"
+ dragonchain/broadcast_processor/broadcast_functions.py:95:32: error: Item "None" of "Optional[Match[Any, Any]]" has no attribute "group"
- dragonchain/lib/authorization.py:315:19: error: Item "None" of "Optional[Match[str]]" has no attribute "group"
+ dragonchain/lib/authorization.py:315:19: error: Item "None" of "Optional[Match[Any, Any]]" has no attribute "group"
- dragonchain/lib/authorization.py:317:25: error: Item "None" of "Optional[Match[str]]" has no attribute "group"
+ dragonchain/lib/authorization.py:317:25: error: Item "None" of "Optional[Match[Any, Any]]" has no attribute "group"
- dragonchain/lib/authorization.py:340:31: error: Item "None" of "Optional[Match[str]]" has no attribute "group"
+ dragonchain/lib/authorization.py:340:31: error: Item "None" of "Optional[Match[Any, Any]]" has no attribute "group"

git-revise (https://github.com/mystor/git-revise)
+ gitrevise/odb.py:113: error: Returning Any from function declared to return "bytes"
+ gitrevise/odb.py:120: error: Returning Any from function declared to return "bytes"
+ gitrevise/odb.py:127: error: Returning Any from function declared to return "bytes"
+ gitrevise/odb.py:134: error: Returning Any from function declared to return "bytes"
+ gitrevise/odb.py:141: error: Returning Any from function declared to return "bytes"

@itaisteinherz
Copy link
Contributor Author

I noticed that the type of Match.string depends on the type of the string provided to Pattern.match. It seems to me like a neat way to implement this would be to add an additional type variable to Match, however that would be a breaking change (I think).

@itaisteinherz
Copy link
Contributor Author

Had it been possible to provide a default value for MatchString, that would have been a non-breaking change. However, it seems like the discussion on defaults for Generic was never resolved.

@JelleZijlstra
Copy link
Member

I implemented an alternative in #7679. Thanks for your work here! You may be interested in the upcoming PEP 688.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mypy does not accept mmap object in re.match
3 participants