Skip to content

Make instance names optional by generating them based on types #4085

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 59 commits into from
May 24, 2021
Merged

Make instance names optional by generating them based on types #4085

merged 59 commits into from
May 24, 2021

Conversation

JordanMartinez
Copy link
Contributor

@JordanMartinez JordanMartinez commented May 6, 2021

Description of the change

Fixes #4084.

The biggest question here is how the name generator should work.

  • uniqueness - My current "just print the types" approach likely works for 80% of all cases, but any complicated ones using lots of parameters and nesting might clash with another instance that just so happens to produce the same name. I'm wondering whether using a generated name (e.g. GenIdent) would be a simpler and better approach
  • brevity - by accounting for all types, the name will be unique. However, it might be very long as a result. Consider what should be generated when nested records are used.
  • FFI-friendly? - I'm not sure if there's a concern that instances be callable from FFI. If there is, then I assume that one could workaround this problem by defining a name for an instance by hand (i.e. what we currently do for al instances).

The next question is how this code should be tested. I could take all current instance tests and add a variation of each file's code to see if things still work/fail when the name :: part is removed from the instance. Since the code would otherwise be the same AST, I'm not sure that's necessary. A better approach would be testing whether names are unique and short enough.


Checklist:

  • Added the change to the changelog's "Unreleased" section with a reference to this PR (e.g. "- Made a change (#0000)")
  • Linked any existing issues or proposals that this pull request should close
  • Updated or added relevant documentation
  • Added a test for the contribution (if applicable)

@rhendric
Copy link
Member

rhendric commented May 6, 2021

I'd be in favor of opaque generated names ($1). If you omit the instance name, you shouldn't rely on what the compiler chooses for an instance name. So the compiler shouldn't export a name that you might think you could rely on. All that and it would be simpler.

@hdgarrood
Copy link
Contributor

I agree that we should make it hard to end up relying on what the compiler chooses for an instance name, but having the scheme be just a $ followed by an incrementing integer would hurt the readability of the generated code quite a bit, and reading the generated code is useful from time to time. What if, as a compromise, we smush the type names together and then append a number? That way hopefully readability is still preserved, but the number at the end is still going to make it difficult to rely on.

@hdgarrood
Copy link
Contributor

As for FFI-friendliness, I don't think that's a concern; we are gradually pushing users away from calling functions with constraints from JavaScript. For example, #3182 was a big step in this direction.

@JordanMartinez
Copy link
Contributor Author

JordanMartinez commented May 7, 2021

What if, as a compromise, we smush the type names together and then append a number? That way hopefully readability is still preserved, but the number at the end is still going to make it difficult to rely on.

Sounds good to me. Could you help guide me on how/where this number should be generated?

@rhendric
Copy link
Member

rhendric commented May 7, 2021

How about className$1, as opposed to classNameArgArgArg$1? I'm sympathetic about readability but some of those smushed names could get so long as to hinder readability in a different way (think monadStateStateHalogenMStateActionSlotsOutputM$1).

Edit: or something like truncating at 25-ish characters would work for me too.

@hdgarrood
Copy link
Contributor

I think I prefer truncating rather than only including the class name too. It's been too long since I've been in this code and so I don't really remember, but I'd recommend following the pattern used in other places where we generate names (e.g. for unknown types when type checking). I think using GenIdent together with MonadSupply for generating the integers should work.

@hdgarrood
Copy link
Contributor

I think that if you generate the names in a desugaring step rather than in the CST->AST conversion, it might be simpler and it will match better with the existing patterns in the compiler. Of course it's not really ideal that doing it in a desugaring step means that the AST type will have to use a Maybe for the name as well, but we intend to get rid of explicit instance names eventually anyway (at least I think we do?) and so since we'd be able to get rid of that Maybe then, having to add the occasional fromMaybe (internalError ...) when accessing an instance name that we know exists doesn't seem too bad to me in the meantime.

@JordanMartinez
Copy link
Contributor Author

I'll ask a dumb question. Why not ignore explicit names immediately and generate names using the truncation idea above? Is it because this would be a breaking change in case someone (who knows where) is using an instance in FFI? Or is there something else going on here I don't know of?

@JordanMartinez
Copy link
Contributor Author

I think that if you generate the names in a desugaring step rather than in the CST->AST conversion, it might be simpler and it will match better with the existing patterns in the compiler.

Yeah, that's partly why I was originally asking "where" to generate the number. convertDeclaration runs in the List monad, so I don't have access to freshIdent as a side effect. Also, is it not possible to generate the number when the parser is running? Or does that incur other problems?

@hdgarrood
Copy link
Contributor

Not a dumb question! I was thinking I’d like to continue using the declared instance names where they exist for as long as we are going to accept them, and that is indeed because of FFI - specifically because there may be cases where people are calling PureScript functions with constraints from the target language (eg JS). The compiler has no way of knowing when this is happening, so I think it will be worth being slightly more cautious. Also, I think it’s a little harder to justify going straight to ignoring the names with no advance notice because we haven’t (yet) explicitly come out and said anywhere that calling functions with constraints from outside PureScript is unsupported and going to become impossible as far as I know.

@kl0tl
Copy link
Member

kl0tl commented May 7, 2021

While working on this last summer I ended up with the following scheme for generating unique but still readable instance names, following the usual convention:

  • a leading dollar,
  • then the class name,
  • then two dollars,
  • then the instance arguments which are not determined by any functional dependency and are either nullary type constructors, application chains headed by a type constructor, or symbols, prefixed by $sym to distinguish them from type constructors, separated by two dollars.

Arguments in application chains are separated by a single dollar and are printed according to the scheme for instances arguments.

We don’t need to consider rows, variables nor determined arguments outside of instance chains because of overlaps, but we can avoid collisions easily enough by appending the instance position in the chain.

There’s a slight hurdle though: this scheme needs access to both the names written by the programmer (well I guess that’s not strictly required but fully qualified names do not help with readability), which are only available before renaming, and functional dependencies, which are only available after desugaring type classes 🙃

@JordanMartinez
Copy link
Contributor Author

There’s a slight hurdle though: this scheme needs access to both the names written by the programmer (well I guess that’s not strictly required but fully qualified names do not help with readability), which are only available before renaming, and functional dependencies, which are only available after desugaring type classes

Does that mean there is preliminary work that needs to be done before this approach can be taken? Or that a different approach should be taken?

@JordanMartinez
Copy link
Contributor Author

There’s a slight hurdle though: this scheme needs access to both the names written by the programmer (well I guess that’s not strictly required but fully qualified names do not help with readability), which are only available before renaming, and functional dependencies, which are only available after desugaring type classes 🙃

Sounds like this is better explained in #3426 (comment). In other words, it sounds like there is some preliminary work that needs to be done before multiple things can be unblocked.

@rhendric
Copy link
Member

rhendric commented May 8, 2021

#4086 only blocks the smart name munging approach to this, not the quick-and-dirty truncated-string-plus-freshIdent approach. There are a couple of reasons to prefer the latter: it's likely simpler than what the implementation of #4086 would be, and I think it's more honest—this change does seem like a revision of the AST, not just the concrete surface syntax, and translating unnamed instances to uniquely-named instances does seem like a desugaring step.

@JordanMartinez
Copy link
Contributor Author

JordanMartinez commented May 10, 2021

Thanks for clarifying. I'll keep working on this.

One other question. Looking at the lines for converting instance chains from CST to AST, it seems that all instance names within an instance chain is stored as a list in each AST.TypeInstanceDeclaration value. For example:

instance foo :: ClassA Foo
else instance bar :: ClassA Bar
else instance baz :: ClassA Baz

will be stored as three separate AST.TypeInstanceDeclaration where the chainId value stores the entire list as ["foo", "bar", "baz"]:

AST.TypeInstanceDeclaration ann ["foo", "bar", "baz"] 0 "foo" deps className tys body
AST.TypeInstanceDeclaration ann ["foo", "bar", "baz"] 1 "bar" deps className tys body
AST.TypeInstanceDeclaration ann ["foo", "bar", "baz"] 2 "baz" deps className tys body

By removing the instance names, the source code would be something like this...

instance ClassA Foo
else instance ClassA Bar
else instance ClassA Baz

and the resulting AST.TypeInstanceDeclaration would be this:

-- foo
AST.TypeInstanceDeclaration ann [Nothing, Nothing, Nothing] 0 Nothing deps className tys body
-- bar
AST.TypeInstanceDeclaration ann [Nothing, Nothing, Nothing] 1 Nothing deps className tys body
-- baz
AST.TypeInstanceDeclaration ann [Nothing, Nothing, Nothing] 2 Nothing deps className tys body

Using what would be baz as an example, I believe I can generate an instance name for baz's instance since I still have the className and tys values available. Thus, the AST would be updated to be something like this:

-- baz
AST.TypeInstanceDeclaration ann [Nothing, Nothing, Just "$$ClassName$Baz$2"] 2 "$$ClassName$Baz$2" deps className tys body

But how would I update the other two Nothings (i.e. foo and bar's names)? While the className value should remain the same in an instance chain, I do not know what foo and bar's tys arguments are.

@rhendric
Copy link
Member

That chain field always struck me as a little bit broken, actually. I don't think it has to be a list of names; it just has to be some kind of unique identifier. I don't know why it couldn't be a (ModuleName, SourcePos), or anything else that's convenient to make unique.

@JordanMartinez
Copy link
Contributor Author

How is chainId even used? As a key in some Map?

@rhendric
Copy link
Member

I know it's used in Language/PureScript/TypeChecker.hs as part of checkOverlappingInstance. I think but am not 100% confident that this is its one and only purpose.

@JordanMartinez
Copy link
Contributor Author

Hm... Maybe I should spend more time reading through this repo's source code and just getting a deeper understanding of things before I continue contributing. Perhaps documenting things along the way might help, too.

@JordanMartinez
Copy link
Contributor Author

I've run out of time and will have to come back to this PR in a few weeks. This PR was to see how far I could get in a week or so because I got bored of fixing v0.14.1 PS warnings on contrib libraries. When I return to this, I'd like to read through more of the codebase and continue from there.

@JordanMartinez
Copy link
Contributor Author

Took a quick break from other work to see where chainId is used. Here's where it's used:

The chain is converted to a chain with qualified names, then in checkOverlappingInstances the list is used only once to prevent an OverlappingInstance error from being thrown. Also, the tcdChain function comes from the TypeClassDictionary type

Essentially, if we can accomplish that equality check with something functionally equivalent, then it unblocks the next part of this PR.

@JordanMartinez
Copy link
Contributor Author

Latest update on this PR.

After getting a better understanding of the general flow of this code, I took the following approach:

  • I changed the name arg of AST's TypeInstanceDeclaration from Ident to Either Text Ident. Left t corresponds to the partially generated (i.e. no unique identifier added) name of the instance as computed in convertDecl. Right i corresponds to the unique name defined in source code. I also changed it's chainId from [Ident] to [Either Text Ident]. It ultimately works the same way it does before this PR is introduced (further described below)
  • I added another pass to the desugar code that will complete the generated instance names via fresh. During this pass, I also return the mapping (i.e. instMap) from the partially-completed instance name to the fully generated instance name.
  • In the desugarTypeClasses pass, all Left t are remapped to their final names via the mapping created in the previous pass. The final and correct instance names with fully generated values are then passed into the rest of the compiler's pipeline.

A few questions I have:

  • should all types in convertDecl be mapped to a text value? Right now I only have some of them based on the typeAtom expression, and I'm curious if TypeParens might reveal one of the other ones not currently covered.
  • will the instance name remapping approach I've taken with instMap ever produce an error because two instances in a chain produce the same partially-completed instance name before the unique identifier is appended to it?
  • how should this PR be tested?

@JordanMartinez
Copy link
Contributor Author

Regardless, this PR is ready for feedback.

GenIdent causes an error in properNameToJSON
@JordanMartinez
Copy link
Contributor Author

Latest commit changes this...

$dollarModule_ClassName$dollarArray$dollarString$dollarBo$dollar42

to this

$dollarModule_ClassName_Array_String_Bo$dollar42

I didn't change the left-most or right-most dollars as it's not yet clear from other core members whether that should be done. Changing the middle $ chars to _ seemed reasonable to make without additional feedback.

Copy link
Member

@kl0tl kl0tl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments on the instances names generation, nothing too important though. Also, do we even need to have separators since we don’t rely on them to disambiguate similar instances (for example both C (Foo Bar) Baz and C Foo (Bar Baz) yield $C_Foo_Bar_Baz, to which we then append an unique suffix)? Is $dollarModule_ClassName_Array_S$dollar42 really more legible than $dollarModuleClassNameArrayStr$dollar42?

argName t1 <> "$" <> (N.runOpName $ qualName op) <> "$" <> argName t2
TypeArr _ t1 _ t2 -> argName t1 <> "_Arrow_" <> argName t2
TypeConstrained{} -> ""
TypeUnaryRow{} -> "EmptyRow"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Row Type is rendered as Row_Type, so I think # Type should render to the same string:

Suggested change
TypeUnaryRow{} -> "EmptyRow"
TypeUnaryRow{} -> "Row"

@hdgarrood
Copy link
Contributor

@rhendric for what it's worth, I definitely consider you to be a real maintainer. If you don't feel that way, I think that's more of an indication that we haven't quite sorted out our governance model just yet than anything else.

@JordanMartinez
Copy link
Contributor Author

JordanMartinez commented May 21, 2021

Is $dollarModule_ClassName_Array_S$dollar42 really more legible than $dollarModuleClassNameArrayStr$dollar42?

A few thoughts on this:

  • Following current naming convention, we don't separate names with '_' or '$'. This was influenced by kl0tl's original approach in a related work. So, I agree that separators should be dropped altogether.
  • Following current naming convention, we don't add a module name to the instance name. Adding the module name was necessary when I was using my original approach for the chain id issue. Having switched to rhendric's filename and source pos idea, it's no longer necessary. So, the module should be dropped from the instance name.
  • Since readability is a concern, I think adding a '_' character after the first '$dollar' would make the generated instance name easier to read (e.g. $dollarClassNameArrayS$dollar42 vs $dollar_ClassNameArrayS$dollar42).
  • I feel like the initial $dollar is good and necessary to prevent name clashing, but that the final $dollar separating the name from the unique identifier is not necessary. For readability concerns, perhaps adding two '_' characters would be better in case one defines a type that ends with an underscore?

@JordanMartinez
Copy link
Contributor Author

Any other changes on this? Or am I good to squash and merge?

@JordanMartinez JordanMartinez merged commit 2daf461 into purescript:master May 24, 2021
@JordanMartinez JordanMartinez deleted the optionalInstanceNames branch May 24, 2021 14:15
@JordanMartinez
Copy link
Contributor Author

I was thinking about this PR just a few minutes ago and I realized that we can likely merge the desugarTypeClassInstanceNames pass into the desugarTypeClasses pass. These two passes originally needed to be separate passes because of how my original approach worked.

In my original approach, the chainId was the name of the instances. Since some instances' names might clash, we needed to figure out all instance names in a given module first, and then determine what their final instance name was. If any clashed, we'd get the DuplicatePartialInstanceName error. If we didn't need to determine all initial instance names in a module, then the code for generating a unique identifier could be done in the desugarTypeClasses pass, right?

Once this PR changed the chainId to use the "filename and source positions" unique identifier approach proposed by @rhendric, the above requirement got dropped. I don't think anyone realized that at the time.

While the code might be more readable by keeping these two passes separate, should the instance name generation be merged into the desugarTypeClasses pass? Would that help compiler performance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make type class instance names optional by generating their names instead
5 participants