Parallelize the Analyzer #4897


Merged
merged 13 commits into from Aug 28, 2023

Conversation

@gzm0 gzm0 commented Aug 19, 2023

Leads to almost 2x speedup of "Compute Reachability".

> d %>% group_by(op, variant) %>% summarize(median(t_ns))
`summarise()` has grouped output by 'op'. You can override using the
`.groups` argument.
# A tibble: 6 × 3
# Groups:   op [2]
  op                            variant  `median(t_ns)`
  <fct>                         <fct>             <dbl>
1 Linker: Compute reachability  main         323617749 
2 Linker: Compute reachability  parallel     163542533 
3 Linker: Compute reachability  pr           165204962.
4 Refiner: Compute reachability main         255269676 
5 Refiner: Compute reachability parallel     170701844 
6 Refiner: Compute reachability pr           169508056.

@gzm0 gzm0 requested a review from sjrd August 19, 2023 17:31
@gzm0 gzm0 commented Aug 19, 2023

[plot: benchmark timings for the main / parallel / pr variants]

main: baseline
parallel: the original version without the last commit in this PR
pr: this PR.

_classInfos.get(className) match {
case None =>
val loading = new LoadingClass(className)
private final class ClassLoader(implicit ec: ExecutionContext) {
gzm0 (Contributor, Author):
I originally isolated this into its own class to make _classInfos more isolated. Now I'm not so sure anymore it's really an improvement. WDYT?

sjrd (Member):
I'm not sure either. It doesn't seem substantially better or worse either way. What I do find a bit problematic is that this change combined in the same commit that makes it thread-safe makes things quite a bit blurry to me. It's unclear what has actually changed in terms of thread-safety. Is it just _classInfos becoming a TrieMap? If yes, can we put that in a separate commit?

gzm0 (Contributor, Author):
> Is it just _classInfos becoming a TrieMap?

Ish, yes. But doing this also implies moving the actual loading code from LoadingClass to its own method (doLoad) inside the ClassLoader. Essentially, we use _classInfos as a synchronization point to choose the thread that is supposed to trigger class loading. Before this commit, that was part of the constructor of LoadingClass.

> If yes, can we put that in a separate commit?

Not trivially, but I can attempt to split it into smaller refactoring steps if you think it would be helpful.

I have attempted to suggest an ordering in this comment, but it is non-trivial: with thread-safety in place, we can also lean on futures more, which keeps other code paths (notably the new, distinct code paths for lookupClassForLinking and lookupClass) manageable (without futures, they would need additional helpers).
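The synchronization-point idea described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: `ClassLoaderSketch`, `doLoad`, and the `String` payload are simplified stand-ins.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future, Promise}

// Rough sketch of the pattern: the concurrent map is the synchronization
// point; whichever thread wins putIfAbsent triggers the actual load.
final class ClassLoaderSketch(implicit ec: ExecutionContext) {
  private final class LoadingClass {
    val promise: Promise[String] = Promise()
  }

  private val _classInfos = TrieMap.empty[String, LoadingClass]

  def lookupClass(className: String): Future[String] = {
    val fresh = new LoadingClass
    _classInfos.putIfAbsent(className, fresh) match {
      case None =>
        // This thread won the race and is responsible for the load.
        fresh.promise.completeWith(doLoad(className))
        fresh.promise.future
      case Some(existing) =>
        // Someone else is (or was) loading; just wait for their result.
        existing.promise.future
    }
  }

  // Placeholder for the real class-loading work.
  private def doLoad(className: String): Future[String] =
    Future(s"info for $className")
}
```

Note how the loading work itself lives in `doLoad` rather than in the `LoadingClass` constructor, matching the refactoring described above.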

sjrd (Member):
OK, I see. Let's keep it as is, then. It's not that hard to make sense of.

@gzm0 gzm0 commented Aug 19, 2023

Raw benchmark data

logger-timings.csv

@gzm0 gzm0 force-pushed the parallel-linker branch from c420e09 to 1f37da9 Compare August 20, 2023 07:42
var methodsCalledLog: List[MethodName] = Nil
private val _instantiatedSubclasses = new AtomicReference[List[ClassInfo]](Nil)

private val methodsCalledLog = new AtomicReference[List[MethodName]](Nil)
gzm0 (Contributor, Author):
Note: strictly, methodsCalledLog.toSet == dispatchCalledFrom.keySet always holds. However, we need to be able to snapshot this in subclassInstantiated. TrieMap supports snapshotting, but mutable.Map doesn't.

So the options are:

  • Keep this.
  • Switch to AtomicReference[immutable.Map[MethodName, AtomicReference[List[From]]]] (I have concerns about key contention on the map here).
  • Do some type hackery so we can get "snapshots" also on normal mutable maps in JS.
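The AtomicReference-based log that the first option keeps can be sketched like this (`CallLog` is an illustrative name, not the PR's code): writers CAS-prepend onto an immutable List, so snapshotting is just reading the current reference.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Append-only, lock-free log: writers compare-and-swap a new head onto an
// immutable List; readers snapshot by simply reading the reference.
final class CallLog[A] {
  private val ref = new AtomicReference[List[A]](Nil)

  @tailrec
  final def add(a: A): Unit = {
    val old = ref.get()
    if (!ref.compareAndSet(old, a :: old))
      add(a) // lost the race against another writer; retry on the new head
  }

  // O(1) and always consistent: the List is immutable.
  def snapshot(): List[A] = ref.get()
}
```

The snapshot is in reverse insertion order, which is fine for set-like uses such as the invariant above.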

sjrd (Member):
IMO the current option is best, but a comment wouldn't be superfluous.

@gzm0 gzm0 force-pushed the parallel-linker branch from 1f37da9 to bd9066e Compare August 20, 2023 08:00
def join(): Future[Unit] = {
tryDoWork()
promise.future
fut.map(onSuccess).onComplete {
gzm0 (Contributor, Author):
I'm wondering if I should add a commit removing the continuation passing style. IIUC we wouldn't make any more futures, the code would just be more idiomatic.

sjrd (Member):
Perhaps, but it can probably equally be done as a follow-up PR, no?

@gzm0 gzm0 commented Aug 20, 2023

I have run the benchmarks with IR checking (or more notably ClassChecking) disabled as well:

logger-timings.csv
[plot: benchmark timings with IR checking disabled]

  • I cannot explain why main-nocheck appears slower than main in the linker pass.
  • It is expected that disabling class checking in the first linker pass doesn't change anything in the incremental setting: since the underlying IR files do not change, we simply bypass it (here).
  • It is expected that disabling checking in the refiner pass improves performance: Since we cannot version the full class anymore at this point (it is "stitched together"), we have to re-run ClassDef checking even in incremental passes.

(isScalaClass || isJSClass || isNativeJSClass)) {
isInstantiated = true
if ((isScalaClass || isJSClass || isNativeJSClass) &&
!_isInstantiated.getAndSet(true)) {
gzm0 (Contributor, Author):
Scalastyle fails on this line, claiming "Boolean expression can be simplified".

Relevant code:
https://github.com/scalastyle/scalastyle/blob/v1.0.0/src/main/scala/org/scalastyle/scalariform/SimplifyBooleanExpressionChecker.scala

I do not understand what is happening. From how I read the relevant code, it should trigger on the following operators being used with literal boolean values: ! != == && ||.

sjrd (Member):
It does seem nonsensical. But perhaps that's opportunity to make the side-effect a bit more separated from the rest of the condition:

if (!(isScalaClass || isJSClass || isNativeJSClass)) {
  // Ignore
  // TODO: shouldn't this be a linking error instead?
} else if (!_isInstantiated.getAndSet(true)) {

gzm0 (Contributor, Author):
Note: Switching to infix notation for getAndSet or method notation for && makes the error go away. But maybe we should just deactivate the check?

sjrd (Member):
I'd rather deactivate the check than arbitrarily switch notation.

@sjrd sjrd left a comment (Member)

Looks good overall. Great job. I have some minor comments here and there.

assert(queue.isEmpty)
private def taskDone(): Unit = {
if (pending.decrementAndGet() == 0) {
if (pending.compareAndSet(0, -1))
sjrd (Member):
The second one can be a direct set(-1), can't it? There's only one thread that will receive 0 as result of the decrementAndGet().

gzm0 (Contributor, Author):

Yes, indeed. It actually points to a weakness in our current design (which I have documented as a comment). IMO we should still proceed, but I should try and work towards removing the work tracker entirely.
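For reference, the pattern under discussion looks roughly like this (a simplified `WorkTrackerSketch`, not the PR's exact code). Since exactly one thread can observe the count reaching 0, the follow-up can be a plain `set(-1)` rather than a `compareAndSet`:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Future, Promise}

// Simplified work tracker: counts outstanding tasks and completes a
// promise once the last one finishes.
final class WorkTrackerSketch {
  private val pending = new AtomicInteger(0)
  private val promise = Promise[Unit]()

  def taskStarted(): Unit = pending.incrementAndGet()

  def taskDone(): Unit = {
    if (pending.decrementAndGet() == 0) {
      // Only one thread can see the count hit 0, so no CAS is needed.
      pending.set(-1) // poison the counter: no tasks may start anymore
      promise.success(())
    }
  }

  def join(): Future[Unit] = promise.future
}
```

The poison value makes a late `taskStarted()` visible as a bug (the counter would go to 0, not 1), which hints at the design weakness mentioned above.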

@gzm0 gzm0 force-pushed the parallel-linker branch from bd9066e to 4621cbb Compare August 23, 2023 04:40
@sjrd sjrd commented Aug 23, 2023

One more thing I just thought about: is there any way we can honor linkerConfig.parallel here? In other words, if parallel is false, use a single thread for running all the (non-IO?) tasks? It may help diagnosing issues in the future if anything comes up.

import scala.collection.concurrent.TrieMap

private[analyzer] object Platform {
def emptyThreadSafeMap[K, V]: mutable.Map[K, V] = TrieMap.empty
Comment:
IIRC, TrieMap will not compact removed entries, so its memory usage will be larger than ConcurrentHashMap's, but ConcurrentHashMap has different trade-offs.

sjrd (Member):
We don't remove anything from these maps, so that doesn't seem like a relevant concern.

gzm0 (Contributor, Author):

FWIW: TrieMap supports a lock-free, "opportunistic" getOrElseUpdate (it might compute the value multiple times, but not if it is already in the map). ConcurrentHashMap's computeIfAbsent doesn't offer that. This was the primary motivation for using TrieMap.

@He-Pin He-Pin commented Aug 26, 2023

I always use a get to check before invoking computeIfAbsent, and the bug you mentioned about ConcurrentHashMap was fixed in JDK 9.

gzm0 (Contributor, Author):

I'm not sure what you mean by "the bug I mentioned". The JavaDoc is very clear that computeIfAbsent can be blocking:

> The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

While this can be desirable sometimes, it's not what we want here: We do not need the guarantee that the method invocation is performed atomically, but we do want the operation to remain lock-free.
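The difference can be illustrated with TrieMap's getOrElseUpdate. This is a standalone sketch (`evaluations` and `cachedLength` are illustrative names): the counter only demonstrates that the thunk is evaluated at most once per key when uncontended, while under contention it may run speculatively more than once.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.concurrent.TrieMap

val evaluations = new AtomicInteger(0)
val cache = TrieMap.empty[String, Int]

// getOrElseUpdate on TrieMap is lock-free: under contention the thunk may
// run more than once (only one result wins), but no lock is held while it
// runs, unlike ConcurrentHashMap.computeIfAbsent.
def cachedLength(key: String): Int =
  cache.getOrElseUpdate(key, {
    evaluations.incrementAndGet() // counts how often the thunk actually ran
    key.length
  })
```

Single-threaded, the first `cachedLength("abc")` evaluates the thunk once and the second call hits the cache.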

@gzm0 gzm0 force-pushed the parallel-linker branch from 4621cbb to 7494019 Compare August 26, 2023 07:27
gzm0 added 2 commits August 26, 2023 09:37
  op                            variant  `median(t_ns)`
  <fct>                         <fct>             <dbl>
1 Linker: Compute reachability  main          323617749
2 Linker: Compute reachability  parallel      163542533
3 Refiner: Compute reachability main          255269676
4 Refiner: Compute reachability parallel      170701844
Now that we can run in parallel, we can use full future power.
@gzm0 gzm0 force-pushed the parallel-linker branch from 7494019 to a28fa78 Compare August 26, 2023 07:38
@gzm0 gzm0 commented Aug 26, 2023

> One more thing I just thought about: is there any way we can honor linkerConfig.parallel here? In other words, if parallel is false, use a single thread for running all the (non-IO?) tasks? It may help diagnosing issues in the future if anything comes up.

That is an excellent point. But I'm not sure we can do so trivially.

Options I see:

  • Keep the old code (with workQueue) around.

    • Might block us from further improvements in the Analyzer in the future (e.g. rely on futures more).
  • Build our own executor that has its own queue and uses the calling thread (before waiting for workTracker.future) to process things.

    This might be tricky to get right without deadlocks (and is likely quite a bit of code).

  • Ignore the passed execution context and create our own one (with https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newSingleThreadScheduledExecutor--).

    Simplest solution, but spawns a new, unnecessary thread (then again, with parallel = false, performance isn't the goal). Needs a bit of cross-platform wiring, but likely not too bad.

  • Do not honor linkerConfig.parallel.

    Status quo. OK as long as things work, but if they don't having the debug option would indeed be nice.

WDYT? I'm leaning towards option 3. Note that with 2/3, we have the choice of running I/O on that single executor thread as well. Which we probably should for simplicity?
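On the JVM, option 3 could look roughly like this. This is a sketch under the assumption that the analyzer is handed an ExecutionContext; `analyzerContext` is a hypothetical name, and real code would also need to shut the executor down and wire this cross-platform:

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// Sketch: if parallel is disabled, ignore the caller's ExecutionContext and
// run everything (including IO) on one dedicated thread.
// NOTE: the single-thread executor must eventually be shut down; that
// lifecycle management is omitted here.
def analyzerContext(parallel: Boolean, default: ExecutionContext): ExecutionContext =
  if (parallel) default
  else ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())
```

With `parallel = false`, every task submitted to the returned context runs on the same dedicated thread.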

@sjrd sjrd commented Aug 26, 2023

Option 3 with IO on the same unique thread seems like the most reliable and simple option, so I'd be in favor of that, yes.

@gzm0 gzm0 commented Aug 26, 2023

I have added an additional commit honoring the parallel config. I believe I have addressed all the comments at this point.

@gzm0 gzm0 requested a review from sjrd August 26, 2023 13:34
@gzm0 gzm0 force-pushed the parallel-linker branch from 7dd2e3d to 9a0143c Compare August 27, 2023 07:36
@gzm0 gzm0 commented Aug 27, 2023

I have tested manually that helloworld2_12/fastLinkJS and testSuite2_12/Test/fastLinkJS complete with parallel = false.

@sjrd sjrd left a comment (Member)

Excellent! :)

@sjrd sjrd merged commit 19d3ce0 into scala-js:main Aug 28, 2023
@gzm0 gzm0 deleted the parallel-linker branch August 30, 2023 04:20