-
Notifications
You must be signed in to change notification settings - Fork 863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable checking against mypy with minimal set of fixes #1402
Conversation
4 significant checks are disabled: * `strict_optional` - check that an object might be `None` on access * `warn_no_return` - requiring a `return` * `assignment` - any assignment of a variable of a wrong type * `var-annotated` - a type annotation is missing
I'm sorry, this is is still way to much rewriting of code. I will not under any circumstances change the way I write code in a dynamically typed language to suite a type checker. The checker should only be verifying that existing annotations are correct. The only edits which are acceptable are edits to the annotations. If you can't make that work, then mypy is not the right tool. I see three ways forward:
|
@@ -274,6 +261,27 @@ def get_stash(m): | |||
return util.INLINE_PLACEHOLDER_RE.sub(get_stash, text) | |||
|
|||
|
|||
class LegacyPattern(_BasePattern): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have very serious objections to the changes to this file. We have API changes. Under no circumstances am I interested in making API changes for type checking. If that means it's not possible to move forward with mypy, then we don't move forward with mypy.
To be clear, this specific line of code is not any more or less relevant than any other. However, the changes are rather sweeping and it was not possible to easily highlight and comment on all relevant lines. I'm speaking about most of the changes to this file in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no actual change to the API, all usages can remain the same.
But this is still indeed the worst change here. I spent a lot of time thinking of this but it's the best I came up with. We could also solicit ideas regarding this.
I'll comment further on this soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me explain what the problem here is.
This is the general explanation:
https://stackoverflow.com/questions/6034662/python-method-overriding-does-signature-matter
For the particular situation at hand:
class InlineProcessor(Pattern):
violates the Liskov Substitution Principle.
And let's just quote directly from mypy's output:
Signature of "handleMatch" incompatible with supertype "Pattern" [override]
Superclass:
def handleMatch(self, m: Match[str]) -> Element | str
Subclass:
def handleMatch(self, m: Match[str], data: str) -> tuple[Element | str | None, int | None, int | None]
The particular violation happens regardless of mypy, it's an actual run-time problem if users expect it to work like is widely accepted:
If you write:
if isinstance(foo, Pattern):
foo.handleMatch(single_argument)
it must work, because every instance of Pattern
must have a single-argument handleMatch
method, because it's written so right there!
An instance of InlineProcessor
is an instance of Pattern
, and every instance of it violates this principle.
More specific code example:
>>> import re
>>> from markdown.inlinepatterns import EscapeInlineProcessor, Pattern
>>> foo = Pattern('a')
>>> foo.handleMatch(re.search('a', 'a')) # OK
>>> foo = EscapeInlineProcessor('a')
>>> isinstance(foo, Pattern)
True
>>> foo.handleMatch(re.search('a', 'a'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: EscapeInlineProcessor.handleMatch() missing 1 required positional argument: 'data'
The correct hierarchy would be this:
graph TD;
BasePattern["BasePattern<br>defines helper methods only"]-->Pattern["Pattern<br>abstract def handleMatch(m):"];
BasePattern-->InlineProcessor["InlineProcessor<br>abstract def handleMatch(m, data):"];
Pattern-->SimpleTextPattern["SimpleTextPattern<br>def handleMatch(m):"]
Pattern-->SimpleTagPattern["SimpleTagPattern<br>def handleMatch(m):"]
InlineProcessor-->EscapeInlineProcessor["EscapeInlineProcessor<br>def handleMatch(m, data):"]
InlineProcessor-->AsteriskProcessor["AsteriskProcessor<br>def handleMatch(m, data):"]
But the current hierarchy is:
graph TD;
Pattern["Pattern<br>defines helper methods<br>abstract def handleMatch(m):"];
Pattern-->InlineProcessor["InlineProcessor<br>abstract def handleMatch(m, data):"];
Pattern-->SimpleTextPattern["SimpleTextPattern<br>def handleMatch(m):"]
Pattern-->SimpleTagPattern["SimpleTagPattern<br>def handleMatch(m):"]
InlineProcessor-->EscapeInlineProcessor["EscapeInlineProcessor<br>def handleMatch(m, data):"]
InlineProcessor-->AsteriskProcessor["AsteriskProcessor<br>def handleMatch(m, data):"]
As such, the entire hierarchy of InlineProcessor
is said to have improper overrides.
What my change accomplishes is to construct the correct type hierarchy to mypy and to all typechecking users going forward.
However at run time I am keeping everything just like it was. LegacyPattern
is just a new alias to Pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See a code search of occurrences in the wild: currently every subclass of InlineProcessor
needs to add an ignore comment for this override.
Of course at the moment that's not this repository's fault (as it doesn't have py.typed
) but just because the type stubs are written in a way that directly mirrors it:
https://github.com/python/typeshed/blob/9ff393558a6f4fedc086bc9dc4386dbe529a3331/stubs/Markdown/markdown/inlinepatterns.pyi#L48
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also this issue report (which I found through the above code search because it's directly inspired by this case). Which if accepted would resolve this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An equivalent change to the external stubs would be this:
oprypin/typeshed@43593a4
Maybe this more condensed view is clearer to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just went through and noted (nearly) every objection I have to these changes.
@@ -175,7 +175,7 @@ def test(self, parent: etree.Element, block: str) -> bool: | |||
return block.startswith(' '*self.tab_length) and \ | |||
not self.parser.state.isstate('detabbed') and \ | |||
(parent.tag in self.ITEM_TYPES or | |||
(len(parent) and parent[-1] is not None and | |||
(len(parent) > 0 and parent[-1] is not None and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should be completely unnecessary. The code works just fine and any tool which prevents me from writing code that way is unwanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code can actually sometimes return an instance of int
rather than bool
. If we have to, I'll follow the usual over-the-top process of proving to you that this is a bug and writing a test.
markdown/core.py
Outdated
@@ -85,7 +85,7 @@ class Markdown: | |||
callable which accepts an [`Element`][xml.etree.ElementTree.Element] and returns a `str`. | |||
""" | |||
|
|||
def __init__(self, **kwargs): | |||
def __init__(self, **kwargs) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This addition adds no value and I will not tolerate being forced to add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding any annotation indicates "please type-check the body of this function". The cool thing about it is that if you add a new function and not write any type annotations on it, mypy will not complain whatsoever. It is a good thing to have this tool. So this un-nuanced take on this is unproductive.
Here I'm adding this -> None
because this function contains some type annotations that are being wasted otherwise. To not have problems with this, I'll have to move the annotations to class level. Let me know if that works better for your preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But when does an __init__
method return anything other than None
? That's my point. Adding this annotation adds no value in that it tells me nothing I didn't already know. I should not be required to add this annotation every time I define and __init__
function.
You seem to object based on the fact that eliminating the annotation causes the tool to not run other checks on the line. That is a problem with the tool. If the tool can't do that, then fix the tool or use a different tool. Don't force me to waste time and effort adding otherwise pointless annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding -> None
annotations to __init__
is useless, I totally agree on that. However you ignore the point that I am actually arguing for. You want the tool to not do anything unless you call for it. As such, not adding any annotations accomplishes that. But when you do want to go ahead and tell the tool "please type-check this function body", this is the designated way to do that. You are not forced, and I demonstrated that with the current state of this pull request.
Separately on your last point: the tool does not require this annotation, you can simply configure it to check all function bodies. If you want it to check only some function bodies, this is how to indicate it.
markdown/core.py
Outdated
html = html.encode(encoding, "xmlcharrefreplace") | ||
sys.stdout.buffer.write(html) | ||
html_bytes = html.encode(encoding, "xmlcharrefreplace") | ||
sys.stdout.buffer.write(html_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I have stated repeatedly, I embrace the fact that Python is dynamically typed and allows reuse of a variable. Any tool which disallows this is unacceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's use an ignore comment instead
Yeah it's totally a limitation of mypy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then lets find a tool which doesn't have this limitation. Quit wasting my time with this nonsense.
pattern += (md,) | ||
pattern = SubstituteTextPattern(*pattern) | ||
for ind, pattern_args in enumerate(patterns): | ||
pattern = SubstituteTextPattern(*pattern_args, md) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again the dynamic nature of Python is being taken away here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a limitation of mypy.
That being said, this change is just a mordernization that could've been done anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear. if the code had been written that way originally when it was submitted in a PR, I would have accepted It. But I would be willing to accept the existing code also (as I did). I don't have an objection to either way, but I do have an objection to a tool forcing one way on me and not allowing the other. That is not a tool I am interested in using.
markdown/postprocessors.py
Outdated
html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[i]) | ||
raw: str = self.md.htmlStash.rawHtmlBlocks[i] | ||
html = self.stash_to_string(raw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, so mypy doesn't allow his pattern (funct1(funct2(value))
)? That's ridiculous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that it doesn't allow nested calls, it's that there is another problem:
Argument 1 to
stash_to_string
ofRawHtmlPostprocessor
has incompatible typestr | Element
; expectedstr
self.md.htmlStash.rawHtmlBlocks[i]
is type-annotated as str | Element
-which is correct because md_in_html
can indeed put an Element
there:
markdown/markdown/extensions/md_in_html.py
Line 166 in a63e6f3
self.cleandoc.append(self.md.htmlStash.store(element)) |
However, stash_to_string
is guarded by a str
type annotation because indeed for an Element
it will just produce garbage such as "<Element 'a' at 0x7f55e83a8ea0>"
, and mypy rightly warns about it.
So once again there is a potential bug that mypy warns us about.
Upon looking into this though, I concluded that maybe there is no actual bug because RawHtmlPostprocessor
consistently only puts strings into htmlStash
.
For now turned it into an ignore comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon looking into this though, I concluded that maybe there is no actual bug because
RawHtmlPostprocessor
consistently only puts strings intohtmlStash
.
Exactly. Again, this is a waste of time and not helpful at all. Unless maybe this is telling us that we need to better annotate the stash to indicate that it only accepts strings. Would doing that resolve the issue? If so, then that would be how to address this, not this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you did not read my comment very attentively because I explicitly say that there is in fact code that puts Elements into the stash.
markdown/treeprocessors.py
Outdated
new_style = isinstance(pattern, inlinepatterns.InlineProcessor) | ||
if isinstance(pattern, inlinepatterns.InlineProcessor): | ||
new_style = True | ||
new_pattern = pattern | ||
else: # pragma: no cover | ||
new_style = False | ||
legacy_pattern = pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this (and all the associated changes in this file that follow) is removing the dynamic nature of python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's just ignore-comment everything instead, I don't know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A cool thing to do here would be to refactor this method into 2 methods. Despite some code duplication, it's actually less error-prone going forward, and even provides a clearer deprecation for old-style patterns. (Also only one # pragma: no cover
comment now, rather than per-case ones 😊
This is demonstrated here: oprypin@1c047f0?diff=split
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's just ignore-comment everything instead, I don't know
Actually, as per my other comment, not don't just ignore-comment everything.
A cool thing to do here would be to refactor this method into 2 methods.
Perhaps, but that should be addresses separately.
So, it seems that if you can configure mypy to not require any of the code changes included here, then it would be something we can use. However, I will note that simply commenting lines of code may not be the best solution in some cases. For example. there is little difference in |
Yes, it seems that the projects goals and this PR's goals are a bit at odds. It sounds like Python Markdown only cares about checking that types are defined, but type checking itself is not desired. I will add one point though. There seems to be some disdain for explicitly defining a function has no return, but if that is disabled, it seems you have no actual way of checking if a return is required and missed? Maybe I'm wrong about this? |
I'm okay with that. The only thing that should be checked is that existing annotations are correct. Anything that is not annotated should be ignored. Seems like a simple concept. I don't understand why that is so hard... except that perhaps it is because the goal of the type checker is to enforce static typing, which is something I have repeatedly said I am not interested in. |
tab_length: int | ||
ESCAPED_CHARS: list[str] | ||
block_level_elements: list[str] | ||
registeredExtensions: list[Extension] | ||
stripTopLevelTags: bool | ||
references: dict[str, tuple[str, str]] | ||
htmlStash: util.HtmlStash | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are instance attributes, not class attributes. Will you please stop changing the code to fit a tool which I have repeatedly stated does not fit our needs.
I am closing this issue as you are clearly either not understanding my direction or intentionally ignoring it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the correct way to annotate instance attributes according to all tools and documentation
In which way, just purely conceptually, do you think this is supposed to be accomplished? What is "correct" in isolation? The only known way to do this is to check code that uses annotated members. If something doesn't match up then it shows an error, and you need to check if it's the code that is wrong or the annotation that is wrong. |
That's fair, and it is possible I've misinterpreted some interaction. So yeah, I'm good with that. |
Yes, when its annotated that is reasonable. And when it is not annotated, it should be ignored. In other words, suppose we have to functions, one of which is annotated and one which is not. The tool would see a line of code which calls the annotated function, It would see the annotation and confirm that the return value matches up. However, when it sees the unannotated function is would not confirm anything about the return value as there is nothing to confirm. That is what I expect to happen. In the first case, the assumption would then be that perhaps the annotation is wrong. However, we could check for a potential bug in the code and fix that instead if need be. In the second case, we are alerted to nothing as it has been for many years. And no, I don't want to have to add an ignore comment to every one of those lines. The tool should just ignore them either by default or via a global setting. |
I think mypy in the current config matches what you describe - it validates only things that are annotated. Which of the comment threads don't match this? There was an example where something clearly annotated as Although a big snag might be in the fact that the whole standard library is annotated, so usages (inside annotated functions) are checked against it as well. And there was one consistently mentioned limitation of mypy - that it can't cope with reassigning a different-typed value to the same variable name, and it should be able to. I have replied with "limitation" to those threads. But it's a conceptually different problem. And if we do look at it separately, perhaps this pull request can be said to have only this as a blocker. |
I've never suggested that that was the case. In fact, if the issue is that the annotation was wrong, I have made no objection. However, I have consistently stated that if the code is wrong, then each individual instance should be addressed as a separate bug, not included wholesale in this large PR. This PR (or one like it) should not be changing code. only annotations. And yet, you have repeatedly ignored that request.
And a very big blocker is it, I have stated repeatedly that I will not change this lib to be statically typed, which mypy requires. In other words, I will not change the code to appease a tool which is designed to enforce something we are not enforcing. And it is unreasonable to expect us to ignore-comment every single dynamically typed object in our code line-by-line. If mypy had a global option to not enforce that, then it wouldn't be a tool which enforced static types. Therefore, it seems that if does not fit our needs. |
4 significant checks are disabled:
strict_optional
- check that an object might beNone
on accesswarn_no_return
- requiring areturn
assignment
- any assignment of a variable of a wrong typevar-annotated
- a type annotation is missing