[pytorch] continue to rewrite gen_python_functions.py with typed models #46978
Conversation
Refactored and added type annotations to most of the file. Some top-level codegen functions are called by other codegen scripts; they will be migrated in subsequent PRs. [ghstack-poisoned]
💊 CI failures summary and remediations: as of commit 402ffac, none of the CI failures appear to be your fault ✅💚
🚧 1 ongoing upstream failure, probably caused by upstream breakages that are not fixed yet. (This comment was automatically generated by Dr. CI.)
""" | ||
if overload.outplace is not None: | ||
# dispatch output and no-output variants, branch on _r.isNone(<out_idx>) | ||
return PY_VARIABLE_OUT.substitute( |
I noticed that the new codegen is moving away from CodeTemplate - but the auto-indent feature is useful in this case, as emit_single_dispatch can be emitted at different indent levels: inside switch-case, inside if-else, inside both, or neither.
Yes, it is troublesome. For now, I have been opting for codegen readability (f-strings) at the expense of generated code readability (yolo indentation), but generated code readability is getting worse and worse. I'm kind of pissed that Python doesn't have some sort of pluggable f-string mechanism for this case.
I can only really think of a few options for this problem:
- Run some sort of formatter (clang-format, maybe?) on the generated code after we generate it. Sounds slow.
- Manually annotate multiline string formats with an indent level that triggers indentation, e.g.,
      void f() {{
          {multiline(thing, 4)}
      }}
- Stop using f-strings
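The second option could look something like the following minimal sketch; the `multiline` helper and its signature are hypothetical illustrations, not part of the actual codegen:

```python
import textwrap

def multiline(text: str, indent: int) -> str:
    # Hypothetical helper: dedent the generated snippet, then re-indent
    # every line to the column the caller annotated in the f-string.
    pad = ' ' * indent
    return textwrap.indent(textwrap.dedent(text).strip('\n'), pad)

body = "stmt1;\nstmt2;"
generated = f"void f() {{\n{multiline(body, 4)}\n}}"
print(generated)
```

The explicit indent argument keeps the f-string call sites readable while restoring nesting in the emitted C++.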
For now I'll slow down on converting multiline CodeTemplates to f-strings in my future codegen rewrite PRs.
What's the general approach I should take for reviewing this PR?
    for sig, out in outplaces.items():
        if sig not in bases:
            candidates: List[str] = []
            for overload in overloads:
merged with #47087 and tested by hacking the native_functions.yaml:
RuntimeError: While identifying overloads, we found an out schema addmv(Tensor input, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1, Tensor out=None) without a corresponding non-out variant. We expected the non-out variant to have schema:
- addmv(Tensor input, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1)
Please check that you spelled the schema correctly in native_functions.yaml. We discovered the following candidate(s):
- addmv(Tensor input, Tensor mat, Tensor vec, *, Scalar beta=1)
    # TODO: should use some canonical form instead of 'str(arg.type)' - see comments
    # above. The old codegen used the deprecated 'dynamic_type(arg.type)', which
    # ignores the optional annotation, i.e. 'Scalar' and 'Scalar?'.
    any_unequal = any(str(arg1.type) != str(arg2.type)
You know you can just test these for equality right? So no need to stringify them first. (Stringifying is poor man's pattern matching.)
In fact, can't you literally just do args1 != args2, because you've already checked that the lengths are the same above?
Oh you can't, because you only want to check the types
You know you can just test these for equality right? So no need to stringify them first. (Stringifying is poor man's pattern matching.)
Oh, I thought they had to implement __eq__ explicitly... thanks for pointing that out!
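For context on the __eq__ point: Python dataclasses generate a structural __eq__ automatically, so type-model objects can be compared directly instead of via str(). A toy sketch - these class names are illustrative, not the real codegen model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BaseType:
    name: str

@dataclass(frozen=True)
class OptionalType:
    elem: BaseType

scalar = BaseType('Scalar')
opt_scalar = OptionalType(BaseType('Scalar'))

# Structural equality for free - no need to stringify before comparing,
# and the optional annotation is not silently ignored.
print(scalar == BaseType('Scalar'))  # True
print(scalar == opt_scalar)          # False
```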
                      for arg1, arg2 in zip(args1, args2))
    return any_smaller and all_smaller_or_equal
    return any_unequal and all_smaller_or_equal
I understand this is the fault of the original code, but I just want to remark: the use of "any" and "all" here obscures the essential logical property going on here: if you know how to compute lte and eq, you can compute lt = lte && !eq. The swapping between any/all is the result of pushing the negation inside of the all quantifier, but the quantifier doesn't really have anything to do with the logical computation going on here.
Also, (once again, artifact of the old code) I'm a little perplexed as to why equality matters here, because I don't feel like there really should be any overloads in the list with exactly the same types, so given two distinct signatures they should always be unequal.
Removed 'any' and 'all'.
Why the equality check? I think the nested loops below can compare a signature to itself (which can be modified, of course).
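The lt = lte && !eq remark can be made concrete with a toy partial order; the type names and the SMALLER_OR_EQUAL table below are illustrative, not the real overload-ordering rules:

```python
from typing import Tuple

# Toy partial order: an 'int' argument can also be parsed as a 'Scalar'.
SMALLER_OR_EQUAL = {('int', 'int'), ('Scalar', 'Scalar'), ('int', 'Scalar')}

def lte(t1: str, t2: str) -> bool:
    return (t1, t2) in SMALLER_OR_EQUAL

def is_smaller(args1: Tuple[str, ...], args2: Tuple[str, ...]) -> bool:
    # Strictly smaller = smaller-or-equal AND not equal,
    # with no any/all negation juggling.
    if len(args1) != len(args2):
        return False
    all_lte = all(lte(a, b) for a, b in zip(args1, args2))
    return all_lte and args1 != args2

print(is_smaller(('int', 'int'), ('int', 'Scalar')))     # True
print(is_smaller(('int', 'Scalar'), ('int', 'Scalar')))  # False
```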
            format(declaration['name']))
    return name
    del larger_than[j]
    sorted_ids.append(j)
Also the old code's fault, but mutating lists as I iterate over them always gives me the jeebies
Also the old code's fault, but mutating lists as I iterate over them always gives me the jeebies
ok - I've simply changed it to iterate through indices indirectly instead of over sorted_ids directly - not sure it hurts readability :-)
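The jeebies are justified - mutating a list while iterating over it silently skips elements. A toy illustration, unrelated to the actual codegen data:

```python
items = [1, 2, 2, 3]

# Buggy: removing during iteration advances past the element that
# slides into the removed slot, so one 2 survives.
buggy = list(items)
for x in buggy:
    if x == 2:
        buggy.remove(x)
print(buggy)  # [1, 2, 3]

# Safe: build a new list (or drive the loop with a snapshot of indices).
safe = [x for x in items if x != 2]
print(safe)   # [1, 3]
```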
""" | ||
grouped = defaultdict(dict) | ||
def group_overloads( | ||
overloads: Tuple[PythonSignatureNativeFunctionPair, ...] |
Nit: this is probably OK just being a Sequence
ok, changed all Tuple[PythonSignatureNativeFunctionPair, ...] to Sequence[PythonSignatureNativeFunctionPair]
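A quick note on why Sequence is the looser (and usually better) annotation for read-only parameters: it accepts lists and tuples alike and signals that the function won't mutate its argument. Toy example, not the actual codegen functions:

```python
from typing import Sequence, Tuple

def count_tuple(xs: Tuple[int, ...]) -> int:
    return len(xs)

def count_seq(xs: Sequence[int]) -> int:
    # Accepts any read-only sequence; callers need not convert to tuple.
    return len(xs)

print(count_seq([1, 2, 3]))  # 3 - a list is fine
print(count_seq((1, 2, 3)))  # 3 - so is a tuple
# count_tuple([1, 2, 3]) would be flagged by mypy, though it runs at runtime.
```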
    for overload in overloads:
        sig = overload.signature.signature_str(skip_outputs=True)
        if overload.function.func.is_out_fn():
            outplaces[sig] = overload
assert sig not in outplaces? Similar with bases below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done:
RuntimeError: Found duplicated function definition:
- addmv.out2(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out, Tensor(b!) out2) -> (Tensor(a!), Tensor(b!)).
Existing definition:
- addmv.out(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!).
RuntimeError: Found duplicated function definition:
- addmv.dup(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=2) -> Tensor.
Existing definition:
- addmv(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1) -> Tensor.
    for sig, base in bases.items():
        outplace = outplaces.get(sig)
        grouped.append(PythonSignatureGroup(
            # prefer the signature with optional out=... arguments
Nothing to do here, but I hate this. Will be kind of hard to fix though.
The opposite invariant (use the signature with functional arguments) would be easier to get to; I doubt there are many out= functions that don't have a functional version. Something that isn't so clear just looking at the comment here or in PythonSignatureGroup comments is why we need to prefer the out= variant.
Improved the comments in both places: the signature with the optional out arguments is the superset that can be used to parse input for both the base and the outplace variants.
    def_template = PY_VARIABLE_METHOD_BINOP_DEF
    # PyMethodDef entry for binary op, throws not implemented error
    return f"""\
{{"{name}", {pyfunc_cast}(TypeError_to_NotImplemented_<{pycname}>), {flags}, NULL}},"""
Thanks for inlining these, I appreciate it.
    py_forwards.extend(forward_decls(name, overload_decls, is_python_method, module))
    overloads = tuple(decl_to_signature_function_pair(decl, method=is_python_method)
                      for decl in python_functions[name])
    noarg = len(overloads) == 1 and overloads[0].signature.arguments_count() == 0
Note to self: wondering what a noarg is and why we have to compute here, rather than inside the functions.
No particular reason - just that this way we can avoid passing 'overloads' to 'method_def' / 'forward_decls', because this bit is all they need.
I've changed it back to passing overloads instead to make it more uniform.
@with_native_function
def gen_namedtuple_typename_key(f: NativeFunction) -> str:
    name = cpp.name(f.func)
cpp.name? Really?
Hmm, this is probably an internal name, judging from the algo below
def emit_namedtuple_typedefs(
    overloads: Tuple[PythonSignatureNativeFunctionPair, ...]
) -> Tuple[List[str], Dict[str, str]]:
Getting pretty bad str blindness here
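One way to mitigate str blindness in a signature like Tuple[List[str], Dict[str, str]] is typing.NewType aliases; everything named below is a hypothetical illustration, not the real codegen types:

```python
from typing import Dict, List, NewType, Tuple

TypedefCode = NewType('TypedefCode', str)   # generated typedef source text
TyperefExpr = NewType('TyperefExpr', str)   # expression referring to that type

def emit_namedtuple_typedefs_sketch() -> Tuple[List[TypedefCode], Dict[str, TyperefExpr]]:
    # Same runtime behavior as plain str, but the signature now documents
    # which string plays which role.
    typedefs = [TypedefCode('static PyTypeObject NamedTuple = ...;')]
    typerefs = {'sort': TyperefExpr('NamedTuple')}
    return typedefs, typerefs

defs, refs = emit_namedtuple_typedefs_sketch()
print(refs['sort'])  # NamedTuple
```

NewType is erased at runtime, so this costs nothing while letting mypy catch role mix-ups.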
    for decl in declarations:
        fieldnames = namedtuple_fieldnames(decl)
        if fieldnames == []:
            decl['namedtuple_typeref'] = ''
GOOD RIDDANCE, ha!
        typename = typenames.get(key)
        name = cpp.name(overload.function.func)  # use @with_native_function?
        tn_key = gen_namedtuple_typename_key(overload.function)
        typename = typenames.get(tn_key)
It would make me feel better if there was some asserts here that would tell me that order of overloads doesn't matter for these initializations but I can't think of an easy way to actually do this.
    method_footer = ['END_HANDLE_TH_ERRORS']
    # NOTE: we type the unpacked self as Tensor not Variable to avoid return type
    # discrepancies on method resolution (e.g. Variable::detach_ returns void
    # rather than Tensor &)
BTW, this comment is defunct now, because Tensor and Variable are the same in C++ now. (No action on this PR, I'm pretty sure; this would be a codegen output change).
ok, I deleted this comment - not worth calling out anymore?
    if(check_has_torch_function(self_)) {{
      return handle_torch_function(self_, "{name}");
    }}
    """ if method else ''
nit: the 'if method' has the same indentation as 'if noarg', so I nearly thought it was actually on the next line haha. Might be easier to read if we just have another nested if block inside.
done
Stack from ghstack:
Refactored and added type annotations to most of the file.
Some top-level codegen functions are called by other codegen scripts;
they will be migrated in subsequent PRs.
Differential Revision: D24589210