Make instance names optional by generating them based on types #4085

JordanMartinez · 2021-05-06T19:59:49Z

Description of the change

The biggest question here is how the name generator should work.

uniqueness - My current "just print the types" approach likely works for 80% of all cases, but any complicated ones using lots of parameters and nesting might clash with another instance that just so happens to produce the same name. I'm wondering whether using a generated name (e.g. GenIdent) would be a simpler and better approach
brevity - by accounting for all types, the name will be unique. However, it might be very long as a result. Consider what should be generated when nested records are used.
FFI-friendly? - I'm not sure if there's a concern that instances be callable from FFI. If there is, then I assume that one could workaround this problem by defining a name for an instance by hand (i.e. what we currently do for al instances).

The next question is how this code should be tested. I could take all current instance tests and add a variation of each file's code to see if things still work/fail when the name :: part is removed from the instance. Since the code would otherwise be the same AST, I'm not sure that's necessary. A better approach would be testing whether names are unique and short enough.

Checklist:

Added the change to the changelog's "Unreleased" section with a reference to this PR (e.g. "- Made a change (#0000)")
Linked any existing issues or proposals that this pull request should close
Updated or added relevant documentation
Added a test for the contribution (if applicable)

rhendric · 2021-05-06T20:24:01Z

I'd be in favor of opaque generated names ($1). If you omit the instance name, you shouldn't rely on what the compiler chooses for an instance name. So the compiler shouldn't export a name that you might think you could rely on. All that and it would be simpler.

hdgarrood · 2021-05-07T03:22:52Z

I agree that we should make it hard to end up relying on what the compiler chooses for an instance name, but having the scheme be just a $ followed by an incrementing integer would hurt the readability of the generated code quite a bit, and reading the generated code is useful from time to time. What if, as a compromise, we smush the type names together and then append a number? That way hopefully readability is still preserved, but the number at the end is still going to make it difficult to rely on.

hdgarrood · 2021-05-07T03:26:08Z

As for FFI-friendliness, I don't think that's a concern; we are gradually pushing users away from calling functions with constraints from JavaScript. For example, #3182 was a big step in this direction.

JordanMartinez · 2021-05-07T03:33:52Z

What if, as a compromise, we smush the type names together and then append a number? That way hopefully readability is still preserved, but the number at the end is still going to make it difficult to rely on.

Sounds good to me. Could you help guide me on how/where this number should be generated?

rhendric · 2021-05-07T03:34:03Z

How about className$1, as opposed to classNameArgArgArg$1? I'm sympathetic about readability but some of those smushed names could get so long as to hinder readability in a different way (think monadStateStateHalogenMStateActionSlotsOutputM$1).

Edit: or something like truncating at 25-ish characters would work for me too.

hdgarrood · 2021-05-07T03:38:13Z

I think I prefer truncating rather than only including the class name too. It's been too long since I've been in this code and so I don't really remember, but I'd recommend following the pattern used in other places where we generate names (e.g. for unknown types when type checking). I think using GenIdent together with MonadSupply for generating the integers should work.

hdgarrood · 2021-05-07T03:46:32Z

I think that if you generate the names in a desugaring step rather than in the CST->AST conversion, it might be simpler and it will match better with the existing patterns in the compiler. Of course it's not really ideal that doing it in a desugaring step means that the AST type will have to use a Maybe for the name as well, but we intend to get rid of explicit instance names eventually anyway (at least I think we do?) and so since we'd be able to get rid of that Maybe then, having to add the occasional fromMaybe (internalError ...) when accessing an instance name that we know exists doesn't seem too bad to me in the meantime.

JordanMartinez · 2021-05-07T03:53:04Z

I'll ask a dumb question. Why not ignore explicit names immediately and generate names using the truncation idea above? Is it because this would be a breaking change in case someone (who knows where) is using an instance in FFI? Or is there something else going on here I don't know of?

JordanMartinez · 2021-05-07T03:55:47Z

I think that if you generate the names in a desugaring step rather than in the CST->AST conversion, it might be simpler and it will match better with the existing patterns in the compiler.

Yeah, that's partly why I was originally asking "where" to generate the number. convertDeclaration runs in the List monad, so I don't have access to freshIdent as a side effect. Also, is it not possible to generate the number when the parser is running? Or does that incur other problems?

hdgarrood · 2021-05-07T04:10:19Z

Not a dumb question! I was thinking I’d like to continue using the declared instance names where they exist for as long as we are going to accept them, and that is indeed because of FFI - specifically because there may be cases where people are calling PureScript functions with constraints from the target language (eg JS). The compiler has no way of knowing when this is happening, so I think it will be worth being slightly more cautious. Also, I think it’s a little harder to justify going straight to ignoring the names with no advance notice because we haven’t (yet) explicitly come out and said anywhere that calling functions with constraints from outside PureScript is unsupported and going to become impossible as far as I know.

kl0tl · 2021-05-07T08:35:25Z

While working on this last summer I ended up with the following scheme for generating unique but still readable instance names, following the usual convention:

a leading dollar,
then the class name,
then two dollars,
then the instance arguments which are not determined by any functional dependency and are either nullary type constructors, application chains headed by a type constructor, or symbols, prefixed by $sym to distinguish them from type constructors, separated by two dollars.

Arguments in application chains are separated by a single dollar and are printed according to the scheme for instances arguments.

We don’t need to consider rows, variables nor determined arguments outside of instance chains because of overlaps, but we can avoid collisions easily enough by appending the instance position in the chain.

There’s a slight hurdle though: this scheme needs access to both the names written by the programmer (well I guess that’s not strictly required but fully qualified names do not help with readability), which are only available before renaming, and functional dependencies, which are only available after desugaring type classes 🙃

JordanMartinez · 2021-05-07T14:06:14Z

There’s a slight hurdle though: this scheme needs access to both the names written by the programmer (well I guess that’s not strictly required but fully qualified names do not help with readability), which are only available before renaming, and functional dependencies, which are only available after desugaring type classes

Does that mean there is preliminary work that needs to be done before this approach can be taken? Or that a different approach should be taken?

JordanMartinez · 2021-05-08T15:50:19Z

There’s a slight hurdle though: this scheme needs access to both the names written by the programmer (well I guess that’s not strictly required but fully qualified names do not help with readability), which are only available before renaming, and functional dependencies, which are only available after desugaring type classes 🙃

Sounds like this is better explained in #3426 (comment). In other words, it sounds like there is some preliminary work that needs to be done before multiple things can be unblocked.

rhendric · 2021-05-08T17:14:21Z

#4086 only blocks the smart name munging approach to this, not the quick-and-dirty truncated-string-plus-freshIdent approach. There are a couple of reasons to prefer the latter: it's likely simpler than what the implementation of #4086 would be, and I think it's more honest—this change does seem like a revision of the AST, not just the concrete surface syntax, and translating unnamed instances to uniquely-named instances does seem like a desugaring step.

JordanMartinez · 2021-05-10T03:25:21Z

Thanks for clarifying. I'll keep working on this.

One other question. Looking at the lines for converting instance chains from CST to AST, it seems that all instance names within an instance chain is stored as a list in each AST.TypeInstanceDeclaration value. For example:

instance foo :: ClassA Foo
else instance bar :: ClassA Bar
else instance baz :: ClassA Baz

will be stored as three separate AST.TypeInstanceDeclaration where the chainId value stores the entire list as ["foo", "bar", "baz"]:

AST.TypeInstanceDeclaration ann ["foo", "bar", "baz"] 0 "foo" deps className tys body
AST.TypeInstanceDeclaration ann ["foo", "bar", "baz"] 1 "bar" deps className tys body
AST.TypeInstanceDeclaration ann ["foo", "bar", "baz"] 2 "baz" deps className tys body

By removing the instance names, the source code would be something like this...

instance ClassA Foo
else instance ClassA Bar
else instance ClassA Baz

and the resulting AST.TypeInstanceDeclaration would be this:

-- foo
AST.TypeInstanceDeclaration ann [Nothing, Nothing, Nothing] 0 Nothing deps className tys body
-- bar
AST.TypeInstanceDeclaration ann [Nothing, Nothing, Nothing] 1 Nothing deps className tys body
-- baz
AST.TypeInstanceDeclaration ann [Nothing, Nothing, Nothing] 2 Nothing deps className tys body

Using what would be baz as an example, I believe I can generate an instance name for baz's instance since I still have the className and tys values available. Thus, the AST would be updated to be something like this:

-- baz
AST.TypeInstanceDeclaration ann [Nothing, Nothing, Just "$$ClassName$Baz$2"] 2 "$$ClassName$Baz$2" deps className tys body

But how would I update the other two Nothings (i.e. foo and bar's names)? While the className value should remain the same in an instance chain, I do not know what foo and bar's tys arguments are.

rhendric · 2021-05-10T04:05:01Z

That chain field always struck me as a little bit broken, actually. I don't think it has to be a list of names; it just has to be some kind of unique identifier. I don't know why it couldn't be a (ModuleName, SourcePos), or anything else that's convenient to make unique.

JordanMartinez · 2021-05-10T04:10:54Z

How is chainId even used? As a key in some Map?

rhendric · 2021-05-10T04:14:01Z

I know it's used in Language/PureScript/TypeChecker.hs as part of checkOverlappingInstance. I think but am not 100% confident that this is its one and only purpose.

JordanMartinez · 2021-05-10T04:31:33Z

Hm... Maybe I should spend more time reading through this repo's source code and just getting a deeper understanding of things before I continue contributing. Perhaps documenting things along the way might help, too.

JordanMartinez · 2021-05-11T00:34:40Z

I've run out of time and will have to come back to this PR in a few weeks. This PR was to see how far I could get in a week or so because I got bored of fixing v0.14.1 PS warnings on contrib libraries. When I return to this, I'd like to read through more of the codebase and continue from there.

JordanMartinez · 2021-05-13T01:47:14Z

Took a quick break from other work to see where chainId is used. Here's where it's used:

The chain is converted to a chain with qualified names, then in checkOverlappingInstances the list is used only once to prevent an OverlappingInstance error from being thrown. Also, the tcdChain function comes from the TypeClassDictionary type

Essentially, if we can accomplish that equality check with something functionally equivalent, then it unblocks the next part of this PR.

JordanMartinez · 2021-05-16T21:17:49Z

Latest update on this PR.

After getting a better understanding of the general flow of this code, I took the following approach:

I changed the name arg of AST's TypeInstanceDeclaration from Ident to Either Text Ident. Left t corresponds to the partially generated (i.e. no unique identifier added) name of the instance as computed in convertDecl. Right i corresponds to the unique name defined in source code. I also changed it's chainId from [Ident] to [Either Text Ident]. It ultimately works the same way it does before this PR is introduced (further described below)
I added another pass to the desugar code that will complete the generated instance names via fresh. During this pass, I also return the mapping (i.e. instMap) from the partially-completed instance name to the fully generated instance name.
In the desugarTypeClasses pass, all Left t are remapped to their final names via the mapping created in the previous pass. The final and correct instance names with fully generated values are then passed into the rest of the compiler's pipeline.

A few questions I have:

should all types in convertDecl be mapped to a text value? Right now I only have some of them based on the typeAtom expression, and I'm curious if TypeParens might reveal one of the other ones not currently covered.
will the instance name remapping approach I've taken with instMap ever produce an error because two instances in a chain produce the same partially-completed instance name before the unique identifier is appended to it?
how should this PR be tested?

src/Language/PureScript/Sugar/TypeClasses/Instances.hs

JordanMartinez · 2021-05-16T21:31:26Z

Regardless, this PR is ready for feedback.

GenIdent causes an error in properNameToJSON

JordanMartinez · 2021-05-21T01:37:54Z

Latest commit changes this...

$dollarModule_ClassName$dollarArray$dollarString$dollarBo$dollar42

to this

$dollarModule_ClassName_Array_String_Bo$dollar42

I didn't change the left-most or right-most dollars as it's not yet clear from other core members whether that should be done. Changing the middle $ chars to _ seemed reasonable to make without additional feedback.

kl0tl

I left some comments on the instances names generation, nothing too important though. Also, do we even need to have separators since we don’t rely on them to disambiguate similar instances (for example both C (Foo Bar) Baz and C Foo (Bar Baz) yield $C_Foo_Bar_Baz, to which we then append an unique suffix)? Is $dollarModule_ClassName_Array_S$dollar42 really more legible than $dollarModuleClassNameArrayStr$dollar42?

CHANGELOG.md

lib/purescript-cst/src/Language/PureScript/CST/Convert.hs

kl0tl · 2021-05-21T08:24:21Z

lib/purescript-cst/src/Language/PureScript/CST/Convert.hs

+          argName t1 <> "$" <> (N.runOpName $ qualName op) <> "$" <> argName t2
+        TypeArr _ t1 _ t2 -> argName t1 <> "_Arrow_" <> argName t2
+        TypeConstrained{} -> ""
+        TypeUnaryRow{} -> "EmptyRow"


Row Type is rendered as Row_Type, so I think # Type should render to the same string:

Suggested change

TypeUnaryRow{} -> "EmptyRow"

TypeUnaryRow{} -> "Row"

hdgarrood · 2021-05-21T12:13:02Z

@rhendric for what it's worth, I definitely consider you to be a real maintainer. If you don't feel that way, I think that's more of an indication that we haven't quite sorted out our governance model just yet than anything else.

JordanMartinez · 2021-05-21T13:14:58Z

Is $dollarModule_ClassName_Array_S$dollar42 really more legible than $dollarModuleClassNameArrayStr$dollar42?

A few thoughts on this:

Following current naming convention, we don't separate names with '_' or '$'. This was influenced by kl0tl's original approach in a related work. So, I agree that separators should be dropped altogether.
Following current naming convention, we don't add a module name to the instance name. Adding the module name was necessary when I was using my original approach for the chain id issue. Having switched to rhendric's filename and source pos idea, it's no longer necessary. So, the module should be dropped from the instance name.
Since readability is a concern, I think adding a '_' character after the first '$dollar' would make the generated instance name easier to read (e.g. $dollarClassNameArrayS$dollar42 vs $dollar_ClassNameArrayS$dollar42).
I feel like the initial $dollar is good and necessary to prevent name clashing, but that the final $dollar separating the name from the unique identifier is not necessary. For readability concerns, perhaps adding two '_' characters would be better in case one defines a type that ends with an underscore?

JordanMartinez · 2021-05-22T17:05:10Z

Any other changes on this? Or am I good to squash and merge?

JordanMartinez · 2021-05-29T17:43:12Z

I was thinking about this PR just a few minutes ago and I realized that we can likely merge the desugarTypeClassInstanceNames pass into the desugarTypeClasses pass. These two passes originally needed to be separate passes because of how my original approach worked.

In my original approach, the chainId was the name of the instances. Since some instances' names might clash, we needed to figure out all instance names in a given module first, and then determine what their final instance name was. If any clashed, we'd get the DuplicatePartialInstanceName error. If we didn't need to determine all initial instance names in a module, then the code for generating a unique identifier could be done in the desugarTypeClasses pass, right?

Once this PR changed the chainId to use the "filename and source positions" unique identifier approach proposed by @rhendric, the above requirement got dropped. I don't think anyone realized that at the time.

While the code might be more readable by keeping these two passes separate, should the instance name generation be merged into the desugarTypeClasses pass? Would that help compiler performance?

…ass (#4099) This refactors the work done in #4085. It was not merged immediately due to the work done in #4096 that would produce a merge conflict.

Make instance names optional by generating them based on types

c833b8e

JordanMartinez added 3 commits May 16, 2021 14:19

Re-inline instanceAnn so less code is changed in PR

01ecd20

Rename function to better indicate that name is not yet complete

45b44e6

Remove final '$' char as its added in desugar process

fd4cec3

JordanMartinez commented May 16, 2021

View reviewed changes

src/Language/PureScript/Sugar/TypeClasses/Instances.hs Outdated Show resolved Hide resolved

JordanMartinez added 2 commits May 16, 2021 16:50

Refactor with unzip

c8a2652

Use Ident rather than GenIdent

fb5638d

GenIdent causes an error in properNameToJSON

Replace all middle usages separater char "$" with "_" in instance name

7c0e35b

kl0tl reviewed May 21, 2021

View reviewed changes

JordanMartinez added 7 commits May 21, 2021 06:21

Change EmptyRow to Row

e0eecda

No longer include module name in instance name

5b07ac6

Add '_' after initial '$'; remove all other middle separators

ae3915f

Update changelog to show final instance name

cbb0f12

Remove redundant import (foldMap)

306f37b

Replace last '$' with '_'

8ea64f9

Update changelog

a461025

kl0tl approved these changes May 24, 2021

View reviewed changes

JordanMartinez added 3 commits May 24, 2021 06:56

Remove naming convention comment above unnamed instance in changelog

ec37c11

Better clarify instance naming convention in named instance in changelog

f784a4e

Add note about 25-char instance name truncation in changelog

f90bd6d

JordanMartinez merged commit 2daf461 into purescript:master May 24, 2021

JordanMartinez deleted the optionalInstanceNames branch May 24, 2021 14:15

This was referenced May 24, 2021

Follow more HLint suggestions #4090

Merged

Make type class instance names optional by generating their names instead #4084

Closed

This was referenced May 27, 2021

Improve apartness checking #4064

Merged

Use GenIdent for anonymous instances #4096

Merged

This was referenced Jun 2, 2021

"Why must I manually name my instances?" section should be removed milesfrain/purescript-faq#12

Closed

Much shorter instancing/generics-deriving syntax #3426

Open

JordanMartinez mentioned this pull request Jun 2, 2021

Desugar type class instance names in type class desugaring pass #4099

Merged

5 tasks

Neppord mentioned this pull request Sep 26, 2021

Cannot support optional names for instances of type classes intellij-purescript/intellij-purescript#280

Closed

JordanMartinez mentioned this pull request Jan 18, 2022

Update version to v0.15.0 working-group-purescript-es/purescript#4

Closed

Make instance names optional by generating them based on types #4085

Make instance names optional by generating them based on types #4085

Uh oh!

Conversation

JordanMartinez commented May 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhendric commented May 6, 2021

Uh oh!

hdgarrood commented May 7, 2021

Uh oh!

hdgarrood commented May 7, 2021

Uh oh!

JordanMartinez commented May 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhendric commented May 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hdgarrood commented May 7, 2021

Uh oh!

hdgarrood commented May 7, 2021

Uh oh!

JordanMartinez commented May 7, 2021

Uh oh!

JordanMartinez commented May 7, 2021

Uh oh!

hdgarrood commented May 7, 2021

Uh oh!

kl0tl commented May 7, 2021

Uh oh!

JordanMartinez commented May 7, 2021

Uh oh!

JordanMartinez commented May 8, 2021

Uh oh!

rhendric commented May 8, 2021

Uh oh!

JordanMartinez commented May 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhendric commented May 10, 2021

Uh oh!

JordanMartinez commented May 10, 2021

Uh oh!

rhendric commented May 10, 2021

Uh oh!

JordanMartinez commented May 10, 2021

Uh oh!

JordanMartinez commented May 11, 2021

Uh oh!

JordanMartinez commented May 13, 2021

Uh oh!

JordanMartinez commented May 16, 2021

Uh oh!

Uh oh!

JordanMartinez commented May 16, 2021

Uh oh!

JordanMartinez commented May 21, 2021

Uh oh!

kl0tl left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kl0tl May 21, 2021

Choose a reason for hiding this comment

Uh oh!

hdgarrood commented May 21, 2021

Uh oh!

JordanMartinez commented May 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JordanMartinez commented May 22, 2021

Uh oh!

JordanMartinez commented May 29, 2021

Uh oh!

Uh oh!

JordanMartinez commented May 6, 2021 •

edited

Loading

JordanMartinez commented May 7, 2021 •

edited

Loading

rhendric commented May 7, 2021 •

edited

Loading

JordanMartinez commented May 10, 2021 •

edited

Loading

JordanMartinez commented May 21, 2021 •

edited

Loading