Skip to content

Faster warnDuplicateRefs #3899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 26, 2020

Conversation

milesfrain
Copy link
Contributor

See the "Reducing incremental rebuild time with large file dependency" thread for context.

Looks like the issue is here:

dupeRefs = mapMaybe (refToName pos) $ withoutCtors \\ nub withoutCtors

Profiling shows warnDuplicateRefs takes 46% of the build time - the majority of it spent making 96 million calls to ==.

I replaced the slow \\ nub pattern with a faster onlyDuplicates function. I'm surprised this doesn't exist already in some library.

The onlyDuplicates function also seems more correct than \\ nub in that it doesn't arbitrarily discard one of every duplicate. For example:

xs = [1,2,2,3,3,3,4]
xs \\ nub xs -- [2,3,3]
onlyDuplicates xs -- [3,3,3,2,2]

For my use case with a large Tailwind.purs file, overall compilation times are reduced from 3.1 to 1.2 seconds and the share of time spent in warnDuplicateRefs is reduced from 46% to 0.5%.

Here are more detailed performance notes:

Original

45% of time in construct
  20% in runDecoder
    9 Million entries
49% in make.\
  46% in warnDuplicateRefs
    40% of time running ==
      96 Million entries
$ echo " " >> src/Example.purs
$ time spago build            
[info] Installation complete.
Compiling Example
Compiling Main
[info] Build succeeded.
spago build  4.23s user 0.98s system 168% cpu 3.091 total
$ time spago build
[info] Installation complete.
[info] Build succeeded.
spago build  1.66s user 0.42s system 288% cpu 0.720 total

This PR

83% of time in construct
  37% in runDecoder
    9 Million entries
6% in make.\
  0.5% in warnDuplicateRefs
$ echo " " >> src/Example.purs
$ time spago build
[info] Installation complete.
Compiling Example
Compiling Main
[info] Build succeeded.
spago build  2.22s user 1.24s system 279% cpu 1.239 total
$ time spago build
[info] Installation complete.
[info] Build succeeded.
spago build  1.72s user 0.44s system 285% cpu 0.759 total

@milesfrain
Copy link
Contributor Author

There are a few other \\ nub patterns that can be improved. Find these with:

rg -t hs -i '\\\\.*nub'

Since these don't preserve all duplicates, I'm wondering if this would also fix #3884

@milesfrain
Copy link
Contributor Author

Is compDecRef unnecessary now, or does it require this more relaxed (and possibly faster) ordering scheme?

Also, it seems like the code for the Ord instances of DeclarationRef and (Type a) could be condensed by following compDecRef's orderOf strategy.

-- enable sorting lists of explicitly imported refs when suggesting imports in linting, IDE, etc.
-- not an Ord because this implementation is not consistent with its Eq instance.
-- think of it as a notion of contextual, not inherent, ordering.
compDecRef :: DeclarationRef -> DeclarationRef -> Ordering
compDecRef (TypeRef _ name _) (TypeRef _ name' _) = compare name name'
compDecRef (TypeOpRef _ name) (TypeOpRef _ name') = compare name name'
compDecRef (ValueRef _ ident) (ValueRef _ ident') = compare ident ident'
compDecRef (ValueOpRef _ name) (ValueOpRef _ name') = compare name name'
compDecRef (TypeClassRef _ name) (TypeClassRef _ name') = compare name name'
compDecRef (TypeInstanceRef _ ident) (TypeInstanceRef _ ident') = compare ident ident'
compDecRef (ModuleRef _ name) (ModuleRef _ name') = compare name name'
compDecRef (KindRef _ name) (KindRef _ name') = compare name name'
compDecRef (ReExportRef _ name _) (ReExportRef _ name' _) = compare name name'
compDecRef ref ref' = compare
(orderOf ref) (orderOf ref')
where
orderOf :: DeclarationRef -> Int
orderOf TypeClassRef{} = 0
orderOf TypeOpRef{} = 1
orderOf TypeRef{} = 2
orderOf ValueRef{} = 3
orderOf ValueOpRef{} = 4
orderOf KindRef{} = 5
orderOf _ = 6

@milesfrain milesfrain requested a review from kritzcreek June 19, 2020 18:10
Copy link
Member

@kritzcreek kritzcreek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks great. I noticed this is pointed at the 0.13.x release branch. Shouldn't we be commiting to master first and then cherry picking the changes we'd like to release onto the release branches, @hdgarrood ?

removeUnique :: Eq a => Ord a => [a] -> [a]
removeUnique = concat . filter twoOrMore . group . sort
where
twoOrMore :: [a] -> Bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this can just be (\x -> length x > 1), but I'll leave that to your judgement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with the "guaranteed fast" over the "likely fast" strategy, even if it's a few more lines of code.

@hdgarrood
Copy link
Contributor

hdgarrood commented Jun 24, 2020

Shouldn't we be commiting to master first and then cherry picking the changes we'd like to release onto the release branches, @hdgarrood ?

Yes please - there may not be another 0.13.x release, but even if we do make an 0.13.9 release we will want any changes in master first.

@milesfrain I like the orderOf strategy, and I think you're right that compDecRef is now unnecessary, now that the Ord instance doesn't look at the source span. I think the only differences between this Ord instance and compDecRef is that this Ord instance looks at the data constructors in a TypeRef, and that the constructor order is different (eg compDecRef puts type classes first). I think the difference in comparing data constructors is not an issue, because compDecRef is only used in two places: once in the linter when suggesting explicit imports, and once in the IDE when amending imports, but in either case I think we shouldn't ever end up with more than one TypeRef for a given type in the list of refs being sorted, so I think removing compDecRef and using this Ord instance should be okay. The order shouldn't be an issue either, because we didn't have an Ord instance before, and we don't care about the precise order in this particular part of the compiler, just that we have an Ord instance.

If you'd like to rewrite the Ord DeclarationRef instance with the orderOf strategy, I'd appreciate that. If you're going to go down that route, could you please:

  • make the orderOf function use an exhaustive pattern match?
  • add a comment along the lines of the existing compDecRef one, to point out that this particular ordering is used in the linter and the IDE's handling of imports, so it's on purpose that the order of constructors in the ordering is different to the order of constructors in the source file?

If you'd like to do Type too, that would also be nice, but could you please leave Type out of this PR to keep it small?

@hdgarrood
Copy link
Contributor

I edited the above comment a few times, sorry. I'm done now :)

@milesfrain milesfrain force-pushed the faster-warnDuplicateRefs branch from d478394 to 32ad517 Compare June 24, 2020 21:29
@milesfrain milesfrain changed the base branch from 0.13.x to master June 24, 2020 21:29
@milesfrain
Copy link
Contributor Author

milesfrain commented Jun 24, 2020

Rebased to master.

I'm finding that stack build now fails on master with or without this change. Maybe a clean is required. Edit: Fixed with stack clean.

I'll make a separate PR to clean-up ordering code that includes:

  • Replace compDecRef with instance Ord DeclarationRef
    • Regarding:

      add a comment along the lines of the existing compDecRef one, to point out that this particular ordering is used in the linter and the IDE's handling of imports, so it's on purpose that the order of constructors in the ordering is different to the order of constructors in the source file

      Would it be better to reorder the constructors to match this linter ordering?

  • Use the orderOf strategy for Type

@hdgarrood
Copy link
Contributor

Looks like CI is failing because a compiler warning has changed. Could you run the tests with —accept to see if that fixes it?

@milesfrain
Copy link
Contributor Author

milesfrain commented Jun 25, 2020

Could you run the tests with —accept to see if that fixes it?

What command is that? stack test --accept?

@milesfrain milesfrain closed this Jun 25, 2020
@milesfrain milesfrain reopened this Jun 25, 2020
@hdgarrood
Copy link
Contributor

stack test —test-arguments —accept. Without the em-dashes, though (I’m on my phone).

@milesfrain
Copy link
Contributor Author

Since preserving all duplicates is not the core focus of this PR, I just removed that change and left a note about it to be tackled in another issue.

Some tests still needed to be modified to account for minor reordering of warnings though. I assume that's because the list is actually sorted with this change.

@hdgarrood
Copy link
Contributor

Yeah, I think having the same warnings in a different order is fine. Thanks very much!

Also, from earlier,

Would it be better to reorder the constructors to match this linter ordering?

Yes, that sounds like a good idea to me.

@hdgarrood hdgarrood merged commit d5ce655 into purescript:master Jun 26, 2020
@milesfrain milesfrain deleted the faster-warnDuplicateRefs branch July 26, 2021 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants