Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support calling methods that doesn't match a rule name #1076

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

plannigan
Copy link
Contributor

A working version that supports using a decorator to register a method to be called for a rule that does match the method name. It also supports things like a A inherits from B, which inherits from Transformer, and both A & B define overrides.

However, things get complicated with merge_transformers(). Since it will mutate base_transformer when given (explicitly documented behavior), copying over registered overrides needs to be done at the instance level instead of the class level. That in itself, is not a problem as an instance variable could be created in __init__(). The issue is that, nothing in the Transformer or Visitor inheritance chain currently defines an __init__(). This means that classes that do have an __init__(), likely don't call super().__init__() (example in test cases). This means the straightforward implementation would be a breaking change as super().__init__() would now be a requirement.

This is Python, so we can do a dynamic non-straightforward implementation, but I wanted to get some thought before I headed down that path.

Addresses #1066

@erezsh
Copy link
Member

erezsh commented Jan 9, 2022

This is a creative approach. But I think the code will be much simpler if the decorator merely tagged the method, and the tree-builder used the tagging when scanning the class.

Something like:

def call_for(rule_name: str) -> Callable:
    def _call_for(func: Callable) -> Callable:
        func.__rule_name__ = rule_name
        return func
    return _call_for

Then the visitors/transformers will only need to scan the instance and build a mapping.

Maybe we'll need a few extra lines to make sure it plays nice with v_args (i.e. the attribute doesn't get removed), not sure, but either way it feels much simpler.

@plannigan
Copy link
Contributor Author

Putting a marker attribute on the function could also work.

Then the visitors/transformers will only need to scan the instance and build a mapping.

I guess the key question is when should the scan happen to build the mapping? The answer to this is likely related to: What is the tolerance for how frequently does this happen? What I have now is a single scan each time a subclass is created, likely the best that can done as it never needs to be done again after that. The other extreme, would be to not maintain a mapping and scan the class every time. The middle ground dynamic option I was thinking about was lazily building the mapping the first time transform() or visit() are called. (Actually, putting some code at the start of the traversal would be an option catch cases where super().__init__() was not called and do a gradual deprecation instead of a direct breaking change.)

@erezsh
Copy link
Member

erezsh commented Jan 10, 2022

Well, I can't say for sure if it's better to do it during __init__ or __init_subclass, but having to scan the MRO for that is a bit of an over-complication. Python resolves the names for us, and everything we need is in the local namespace.

It might seem nitpicky but every line we add, is a line we need to maintain. Every behavior we choose, needs to play nice with future changes. So keeping it simple is really crucial for the long run.

(sure, putting it at the beginning of transform() might also be a good option.)

@erezsh
Copy link
Member

erezsh commented Jan 10, 2022

Btw, maybe we're over-engineering this. Maybe the only form of renaming we need to support is to add _, so we can do something like

try:
    return getattr(self, rule_name)
except AttributeError:
    return getattr(self, rule_name + '_')

And we memoize the lookup.

Maybe a little too magical, actually, but definitely simpler than the alternatives.

@MegaIng
Copy link
Member

MegaIng commented Jan 10, 2022

I would suggest allowing the user to define a method get_callback or something like that that allows a custom name resolution. This would allow the user to do what @erezsh proposed themselves. This function could also be cached in some form

We could also consider to amend that with a default search that tries to find methods having an attribute set by some call_for decorator, but IMO that is some tangential.

@erezsh
Copy link
Member

erezsh commented Jan 10, 2022

Since we're brainstorming here, why not just rstrip underscores from methods? Something like

@rstrip_underscores
class Foo(Transformer):
    def from_(self, args):   # renamed to from
        ...

    def not_a_keyword_(self, args):    # also renamed, because why not
        ...

@MegaIng
Copy link
Member

MegaIng commented Jan 10, 2022

My suggestion of providing a generic interface also has the benefit that it works even if the grammar rules are named very different than python identifiers, for example what might happen if you have ABNF compatible identifiers with hyphens instead of underscores.

@erezsh
Copy link
Member

erezsh commented Jan 10, 2022

Or we can rename hyphens to underscores and lose nothing in the process :)

@MegaIng
Copy link
Member

MegaIng commented Jan 10, 2022

and the spaces, and any other special characters that might appear in the name

@plannigan
Copy link
Contributor Author

My preference would be to avoid a name coercion scheme where certain characters are collapsed into underscores. It can lead to whack-a-mole situation where the implementation initially seems sufficient, but then a user comes with an unexpected edge case. Additionally, it can cause situations where the grammar rule names are unique, but the converted names collide.

In general, it seems like were attempting to figure out where we want to be in the triangle of:

  • easy for the framework to implement
  • nice API for the user to work with
  • low performance cost at runtime

get_callback() is a generic approach that would be a bit more explicit and would support caching, which is an improvement over users needing to implement everything in __default__(). Though, I think it does index a little too heavily on "easy for the framework to implement" at the cost of "nice API for the user to work with".

The decorator approach is also generic and likely the best option for "nice API for the user to work with". In general, it also scores well on "low performance cost at runtime", but some existing implementation decisions hurt that some as well as making "easy for the framework to implement" a bit worse. I can take another shot at using a marker instead of the wrapper object using a cache instead of identifying everything at class creation.

@erezsh
Copy link
Member

erezsh commented Jan 10, 2022

@plannigan I think that's a good analysis of the issue here. I'll just note that "nice API for the user to work with" is the main driver here, because otherwise everyone can do

setattr(my_transformer, 'from', from_)

And problem solved. So this is not really about adding new abilities, but about making it easy and idiomatic.

(okay, it won't solve dashes or dollar signs in the rule name, and for that get_callback() might be better. But it's still just a nice API for something the __default__() can already do)

@erezsh
Copy link
Member

erezsh commented Jan 10, 2022

Actually, doesn't __getattr__() solve the need for get_callback() ? I feel like we're being silly and forgetting the basics.

Use a marker on the function instead of a wrapper class
Look up overrides only when needed and cache result
@plannigan
Copy link
Contributor Author

plannigan commented Jan 18, 2022

Was able to get a complete solution working that:

  • Uses a marker attribute instead of a wrapping class (though that did introduce other complexities)
  • Has an internal cache for overrides that is lazily filled
  • The normal getattr() behavior happens first, so there is no performance change for matching on defined user functions. However, rules that don't have matching user functions will be slower.
  • Is backwards compatible for users that aren't calling super().__init__().

Open questions:

  • Since marker attributes don't work on methods, is that actually "simpler" than a wrapping class.
  • If we are adding a cache for overrides, do we want to cache everything? getattr() should always be fast, but rules without user functions currently pay a heavy cost every time. @lru_cache() would make this easy. Though there is a potential issue with merge_transformers() because base_transformer could have seen a prefixed rule that only now has a user function. ( An @lru_cache style cache makes it cache per class, not instance. Caching everything is still possible, just not easily tucked behind a decorator.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants