Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pytorch] continue to rewrite gen_python_functions.py with typed models #46978

Closed
wants to merge 8 commits into from

Conversation

ljk53
Copy link
Contributor

@ljk53 ljk53 commented Oct 28, 2020

Stack from ghstack:

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Differential Revision: D24589210

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

[ghstack-poisoned]
… typed models"

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

[ghstack-poisoned]
ljk53 added a commit that referenced this pull request Oct 28, 2020
Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

ghstack-source-id: 602db8dcd1a03ef69bb36dc3028ca10faffa5d59
Pull Request resolved: #46978
@ljk53 ljk53 requested review from bhosmer and ezyang October 28, 2020 09:14
@dr-ci
Copy link

dr-ci bot commented Oct 28, 2020

💊 CI failures summary and remediations

As of commit 402ffac (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet:


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 20 times.

"""
if overload.outplace is not None:
# dispatch output and no-output variants, branch on _r.isNone(<out_idx>)
return PY_VARIABLE_OUT.substitute(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the new codegen is moving away from CodeTemplate - but the auto-indent feature is useful in this case as the emit_single_dispatch can have different indents: inside switch-case, inside if-else, inside both or none.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is troublesome. For now, I have have been opting for code gen readability (f-strings) at the expense of generated code readability (yolo indentation) but generated code readability is getting worse and worse. I'm kind of pissed that Python doesn't have some sort of pluggable f-string mechanism for this case.

I can only really think of a few options for this problem:

  1. Run some sort of formatter (clang-format, maybe?) on the generated code after we generate it. Sounds slow.
  2. Manually annotate multiline string formats with an indent level that triggers indentation, e.g.,
void f() {{
    {multiline(thing, 4)}
}}
  1. Stop using f-strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now I'll slow down converting multiline CodeTemplate to f-string for my future codegen rewrite PRs.

… typed models"

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Differential Revision: [D24589210](https://our.internmc.facebook.com/intern/diff/D24589210)

[ghstack-poisoned]
ljk53 added a commit that referenced this pull request Oct 28, 2020
Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

ghstack-source-id: 6d2c50a599534000e784050c1fb7416baf7c177a
Pull Request resolved: #46978
… typed models"

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Differential Revision: [D24589210](https://our.internmc.facebook.com/intern/diff/D24589210)

[ghstack-poisoned]
ljk53 added a commit that referenced this pull request Oct 28, 2020
Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

ghstack-source-id: fbd1b53f795e79882fae3a77bb588defa464d081
Pull Request resolved: #46978
@ezyang
Copy link
Contributor

ezyang commented Oct 29, 2020

What's the general approach I should take for reviewing this PR?

… typed models"

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Differential Revision: [D24589210](https://our.internmc.facebook.com/intern/diff/D24589210)

[ghstack-poisoned]
ljk53 added a commit that referenced this pull request Oct 31, 2020
Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

ghstack-source-id: 89625535a931b6ff5f3390fb26d916cf4d8f9268
Pull Request resolved: #46978
ljk53 added a commit to ljk53/pytorch that referenced this pull request Nov 5, 2020
Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

ghstack-source-id: 89625535a931b6ff5f3390fb26d916cf4d8f9268
Pull Request resolved: pytorch#46978
… typed models"

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Differential Revision: [D24589210](https://our.internmc.facebook.com/intern/diff/D24589210)

[ghstack-poisoned]
for sig, out in outplaces.items():
if sig not in bases:
candidates: List[str] = []
for overload in overloads:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merged with #47087 and tested by hacking the native_functions.yaml:

RuntimeError: While identifying overloads, we found an out schema addmv(Tensor input, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1, Tensor out=None) without a corresponding non-out variant. We expected the non-out variant to have schema: 
- addmv(Tensor input, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1)
Please check that you spelled the schema correctly in native_functions.yaml. We discovered the following candidate(s): 
- addmv(Tensor input, Tensor mat, Tensor vec, *, Scalar beta=1)

@ezyang
Copy link
Contributor

ezyang commented Nov 5, 2020

Last week, @ljk53 and @bhosmer and I had a brief discussion about how to go about reviewing this PR. Big take away: Phabricator does a better job of generating diff than GitHub, so go look at the diff there (it identifies code motion).

# TODO: should use some canonical form instead of 'str(arg.type)' - see comments
# above. The old codegen used the deprecated 'dynamic_type(arg.type)', which
# ignores the optional annotation, i.e. 'Scalar' and 'Scalar?'.
any_unequal = any(str(arg1.type) != str(arg2.type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know you can just test these for equality right? So no need to stringify them first. (Stringifying is poor man's pattern matching.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, you can't you literally just do args1 != args2 because you've already checked that the lengths are the same above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you can't, because you only want to check the types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know you can just test these for equality right? So no need to stringify them first. (Stringifying is poor man's pattern matching.)

oh, I thought it has to implement __eq__ explicitly... thanks for pointing out!

for arg1, arg2 in zip(args1, args2))
return any_smaller and all_smaller_or_equal
return any_unequal and all_smaller_or_equal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this is the fault of the original code, but I just want to remark: the use of "any" and "all" here obscures the essential logical property going on here: if you know how to compute lte and eq, you can compute lt = lte && !eq. The swapping between any/all is the result of pushing the negation inside of the all quantifier, but the quantifier doesn't really have anything to do with the logical computation going on here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, (once again, artifact of the old code) I'm a little perplexed as to why equality matters here, because I don't feel like there really should be any overloads in the list with exactly the same types, so given two distinct signatures they should always be unequal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed 'any' and 'all'

why equality check? I think the nested loops below can compare one to itself (which can be modified, of course)

format(declaration['name']))
return name
del larger_than[j]
sorted_ids.append(j)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the old code's fault, but mutating lists as I iterate over them always gives me the jeebies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the old code's fault, but mutating lists as I iterate over them always gives me the jeebies

ok - I've simply changed it to iterating through index indirectly instead of sorted_ids directly - not sure it hurts readability :-)

"""
grouped = defaultdict(dict)
def group_overloads(
overloads: Tuple[PythonSignatureNativeFunctionPair, ...]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is probably OK just being a Sequence

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, changed all Tuple[PythonSignatureNativeFunctionPair, ...] to Sequence[PythonSignatureNativeFunctionPair]

for overload in overloads:
sig = overload.signature.signature_str(skip_outputs=True)
if overload.function.func.is_out_fn():
outplaces[sig] = overload
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert sig not in outplaces? Similar with bases below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done:

RuntimeError: Found duplicated function definition:
- addmv.out2(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out, Tensor(b!) out2) -> (Tensor(a!), Tensor(b!)).
Existing definition:
- addmv.out(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!).
RuntimeError: Found duplicated function definition:
- addmv.dup(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=2) -> Tensor.
Existing definition:
- addmv(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1) -> Tensor.

for sig, base in bases.items():
outplace = outplaces.get(sig)
grouped.append(PythonSignatureGroup(
# prefer the signature with optional out=... arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to do here, but I hate this. Will be kind of hard to fix though.

The opposite invariant (use the signature with functional arguments) would be easier to get to; I doubt there are many out= functions that don't have a functional version. Something that isn't so clear just looking at the comment here or in PythonSignatureGroup comments is why we need to prefer the out= variant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved the comments at both places: because the one with the optional out arguments is the superset that can be used to parse input for both base and outplace.

def_template = PY_VARIABLE_METHOD_BINOP_DEF
# PyMethodDef entry for binary op, throws not implemented error
return f"""\
{{"{name}", {pyfunc_cast}(TypeError_to_NotImplemented_<{pycname}>), {flags}, NULL}},"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for inlining these, I appreciate it.

py_forwards.extend(forward_decls(name, overload_decls, is_python_method, module))
overloads = tuple(decl_to_signature_function_pair(decl, method=is_python_method)
for decl in python_functions[name])
noarg = len(overloads) == 1 and overloads[0].signature.arguments_count() == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: wondering what a noarg is and why we have to compute here, rather than inside the functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no particular reason - just that this way we can avoid passing 'overloads' to 'method_def' / 'forward_decls' because this bit is what all they need.
I've changed it back to passing overloads instead to make it more uniform.


@with_native_function
def gen_namedtuple_typename_key(f: NativeFunction) -> str:
name = cpp.name(f.func)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpp.name? Really?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is probably an internal name, judging from the algo below


def emit_namedtuple_typedefs(
overloads: Tuple[PythonSignatureNativeFunctionPair, ...]
) -> Tuple[List[str], Dict[str, str]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting pretty bad str blindness here

for decl in declarations:
fieldnames = namedtuple_fieldnames(decl)
if fieldnames == []:
decl['namedtuple_typeref'] = ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GOOD RIDDANCE, ha!

typename = typenames.get(key)
name = cpp.name(overload.function.func) # use @with_native_function?
tn_key = gen_namedtuple_typename_key(overload.function)
typename = typenames.get(tn_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make me feel better if there was some asserts here that would tell me that order of overloads doesn't matter for these initializations but I can't think of an easy way to actually do this.

method_footer = ['END_HANDLE_TH_ERRORS']
# NOTE: we type the unpacked self as Tensor not Variable to avoid return type
# discrepancies on method resolution (e.g. Variable::detach_ returns void
# rather than Tensor &)
Copy link
Contributor

@ezyang ezyang Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this comment is defunct now, because Tensor and Variable are the same in C++ now. (No action on this PR, I'm pretty sure; this would be a codegen output change).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I deleted this comment - not worth calling out anymore?

if(check_has_torch_function(self_)) {{
return handle_torch_function(self_, "{name}");
}}
""" if method else ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the if method has the same indentation as if noarg so I nearly thought it was acutally on the next line haha. Might be easier to read if we just have another nested if block inside

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ljk53 added 2 commits November 6, 2020 16:35
… typed models"

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Differential Revision: [D24589210](https://our.internmc.facebook.com/intern/diff/D24589210)

[ghstack-poisoned]
… typed models"

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Differential Revision: [D24589210](https://our.internmc.facebook.com/intern/diff/D24589210)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

@ljk53 merged this pull request in 16c72a5.

@facebook-github-bot facebook-github-bot deleted the gh/ljk53/183/head branch November 11, 2020 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants