Skip to content

Add GraphsSharedArraysExt and remove Distributed dependency #430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

topolarity
Copy link

This PR follows up on #429 to enable parallel=:threads by default and move Distributed-based parallelism to an extension, for faster loading times and reduced dependencies.

On Julia <1.9, the dependency is still there so this shouldn't break backwards compatibility (1.6 tests are passing for me locally)

These implementations are extremely basic, but they try to follow the
patterns in the other parts of the Parallel module. My real motivation
is to be able to move the Distributed implementations into an extension,
so that Graphs.jl does not depend on Distributed.
@topolarity topolarity force-pushed the ct/distributed-ext branch from b44e9a0 to 3476c5f Compare May 15, 2025 02:10
Copy link

codecov bot commented May 15, 2025

Codecov Report

Attention: Patch coverage is 86.42857% with 19 lines in your changes missing coverage. Please review.

Project coverage is 96.63%. Comparing base (6130332) to head (3476c5f).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/Parallel/distance.jl 81.25% 3 Missing ⚠️
src/Parallel/shortestpaths/dijkstra.jl 75.00% 3 Missing ⚠️
src/Parallel/traversals/greedy_color.jl 78.57% 3 Missing ⚠️
src/Parallel/centrality/betweenness.jl 0.00% 2 Missing ⚠️
src/Parallel/centrality/closeness.jl 0.00% 2 Missing ⚠️
src/Parallel/centrality/radiality.jl 33.33% 2 Missing ⚠️
src/Parallel/centrality/stress.jl 33.33% 2 Missing ⚠️
src/Parallel/utils.jl 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #430      +/-   ##
==========================================
- Coverage   97.31%   96.63%   -0.68%     
==========================================
  Files         117      118       +1     
  Lines        6956     7011      +55     
==========================================
+ Hits         6769     6775       +6     
- Misses        187      236      +49     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@topolarity
Copy link
Author

Expression: result ⊜ true
Evaluated: ⟪result: 😭 FAILED: /Users/runner/work/Graphs.jl/Graphs.jl/Project.toml
The file /Users/runner/work/Graphs.jl/Graphs.jl/Project.toml is not in canonical format.

Looks like 1.6 / 1.9 have a different notion of "canonical" for [weakdeps] entries in the Project.toml

I probably need to disable the test on Julia <1.9

@simonschoelly
Copy link
Member

I don't want to be too negative here - but I am not sure if this is something we want right now. My reasons for this:

  • It will make maintaining this package more difficult. It will also make testing it more difficult.
  • It will make using it slightly more difficult as one has to explicitly import the relevant packages - unfortunately Julia's package manager does not have the abilities to declare features like Rust does for example.
  • It is unclear how much this will increase loading time.
  • Having no or very few dependencies was never the the goal of this package. In the past we removed some dependencies because some packages (like the linear optimization ones) were breaking this package during the transition from Julia v0.7 to v1.0.

If there is really the wish to have weak dependencies I am open to it - but first we should reach a consensus on that - so we should discus it in a Github issue.

@simonschoelly
Copy link
Member

Expression: result ⊜ true
Evaluated: ⟪result: 😭 FAILED: /Users/runner/work/Graphs.jl/Graphs.jl/Project.toml
The file /Users/runner/work/Graphs.jl/Graphs.jl/Project.toml is not in canonical format.

Looks like 1.6 / 1.9 have a different notion of "canonical" for [weakdeps] entries in the Project.toml

I probably need to disable the test on Julia <1.9

Disabling tests for the lower bound version cannot be an option. We could raise the lower bounds to the latest long-term support release though - that would be v1.10. We can do that in a separate PR though.

@topolarity
Copy link
Author

topolarity commented May 21, 2025

Disabling tests for the lower bound version cannot be an option. We could raise the lower bounds to the latest long-term support release though - that would be v1.10. We can do that in a separate PR though.

It's just a formatting check though, which is implemented differently on v1.6 vs. 1.9 / 1.10. The Project.toml would still be checked for canonicity on 1.10, so I don't think there's any loss of coverage by disabling the check on 1.6

@simonschoelly
Copy link
Member

I think we are due for bumping the min version of this package anyway so I created a PR: #432 - let's see if someone objects.

@Krastanov
Copy link
Member

  • It is unclear how much this will increase loading time.

Wouldn't this strictly decrease loading time? And decrease compilation time as well? It is a small change for this specific package, but for packages that depend on Graphs.jl (and other packages) this can start adding up. It is why DiffEq.jl split out all of its solvers in separate packages.

e.g. running julia> @time @eval using Graphs on 1.11.5

master: 0.540166 seconds (411.04 k allocations: 26.379 MiB, 5.20% gc time, 15.12% compilation time)

this pr: 0.502729 seconds (380.79 k allocations: 24.282 MiB, 4.67% gc time, 8.19% compilation time)

  • Having no or very few dependencies was never the the goal of this package.

But happening to have few dependencies makes the life of downstream package maintainers betters. If Graphs unconditionally depends on Distributed, then other packages that have extensions for Distributed will run precompilation for the extensions. Changes like this are overall beneficial for the user because way less unnecessary precompilation will be happening.

@thchr
Copy link
Contributor

thchr commented May 22, 2025

  • It will make maintaining this package more difficult. It will also make testing it more difficult.

How will it become more difficult?

  • Having no or very few dependencies was never the the goal of this package. In the past we removed some dependencies because some packages (like the linear optimization ones) were breaking this package during the transition from Julia v0.7 to v1.0.

There's no downside to having fewer dependencies? The fewer dependencies Graphs.jl has, the lower the barrier is to some other package deciding to take a dependency on Graphs.jl.

Comment on lines +17 to +18
[weakdeps]
SharedArrays = "1a1011a3-84de-559e-8e89-a11a2f7dc383"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't SharedArrays then be removed from the [deps] section above as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was left behind for https://pkgdocs.julialang.org/v1/creating-packages/#Transition-from-normal-dependency-to-extension, but now that we're bumping compat to 1.10+, this can indeed be removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hadn't considered that 👍

@rafaqz
Copy link
Contributor

rafaqz commented May 22, 2025

As the maintainer of a package that uses Graphs.jl but never Distributed or SharedArrays, I support this change.

We should also move Inflate.jl to an extension, it pulls in binary dependencies that most users wont use.

@simonschoelly
Copy link
Member

simonschoelly commented May 23, 2025

  • It is unclear how much this will increase loading time.

Wouldn't this strictly decrease loading time? And decrease compilation time as well? It is a small change for this specific package, but for packages that depend on Graphs.jl (and other packages) this can start adding up. It is why DiffEq.jl split out all of its solvers in separate packages.

e.g. running julia> @time @eval using Graphs on 1.11.5

master: 0.540166 seconds (411.04 k allocations: 26.379 MiB, 5.20% gc time, 15.12% compilation time)

this pr: 0.502729 seconds (380.79 k allocations: 24.282 MiB, 4.67% gc time, 8.19% compilation time)

  • Having no or very few dependencies was never the the goal of this package.

But happening to have few dependencies makes the life of downstream package maintainers betters. If Graphs unconditionally depends on Distributed, then other packages that have extensions for Distributed will run precompilation for the extensions. Changes like this are overall beneficial for the user because way less unnecessary precompilation will be happening.

Are you running this on some kind of beefed up cluster? For me it takes around 6 to 7 seconds to import Graphs.jl after switching my branch.

Also, what does @eval do here?

I have not figured out how to invalidate all precompiled code - but when running this multiple times the loading time varies by quite a bit and sometimes one branch is faster and the other time the other. So unless you repeated this experiment multiple times I am not sure if we really have a speed up of 7.4%.

@Krastanov
Copy link
Member

The runs above were on a cold laptop that was top-of-the-line in 2019, in performance mode, and are admittedly difficult to reproduce. The runs below are on a well-cooled steady-state desktop that is not throttling. The difference between what the two of us are getting above are probably loading vs precompiling, so below I separate those.

On julia 1.12-beta3

Either way though, this testing should not matter much -- putting things in extensions inherently leads to less precompilation and less loading. The improvements can be very minimal (and they are negligible in this particular case), but the improvements should be unconditional. More importantly, this is particularly useful for downstream libraries that depend of Graphs. I like the slow push for more extensions and conditional dependencies in the ecosystem as a whole, because that makes my life as a developer of a library with many dependencies easier.

loading time comparison (no precompilation): very minor improvement in timing and allocations

this PR

> for i in (seq 1 10); julia --project=. -t1,1 -e "@time @eval using Graphs"; end
  0.358056 seconds (504.41 k allocations: 32.222 MiB, 7.62% compilation time)
  0.359722 seconds (504.41 k allocations: 32.222 MiB, 7.71% compilation time)
  0.354502 seconds (504.41 k allocations: 32.222 MiB, 7.75% compilation time)
  0.360770 seconds (504.41 k allocations: 32.237 MiB, 7.67% compilation time)
  0.361303 seconds (504.41 k allocations: 32.237 MiB, 7.52% compilation time)
  0.354667 seconds (504.41 k allocations: 32.221 MiB, 7.97% compilation time)
  0.361170 seconds (504.41 k allocations: 32.221 MiB, 7.56% compilation time)
  0.359616 seconds (504.41 k allocations: 32.221 MiB, 7.68% compilation time)
  0.372852 seconds (504.41 k allocations: 32.222 MiB, 7.37% compilation time)
  0.359655 seconds (504.41 k allocations: 32.221 MiB, 7.62% compilation time)

master:

> for i in (seq 1 10); julia --project=. -t1,1 -e "@time @eval using Graphs"; end
  0.379161 seconds (531.04 k allocations: 34.046 MiB, 7.34% compilation time)
  0.371243 seconds (531.04 k allocations: 34.045 MiB, 7.51% compilation time)
  0.373846 seconds (531.04 k allocations: 34.046 MiB, 7.47% compilation time)
  0.372231 seconds (531.04 k allocations: 34.029 MiB, 7.34% compilation time)
  0.372268 seconds (531.04 k allocations: 34.031 MiB, 7.38% compilation time)
  0.362420 seconds (531.04 k allocations: 34.031 MiB, 7.64% compilation time)
  0.362064 seconds (531.04 k allocations: 34.030 MiB, 7.99% compilation time)
  0.369258 seconds (531.04 k allocations: 34.046 MiB, 7.38% compilation time)
  0.373228 seconds (531.04 k allocations: 34.030 MiB, 7.44% compilation time)
  0.372276 seconds (531.04 k allocations: 34.030 MiB, 7.44% compilation time)

precompilation of the entire environment: very minor improvement in allocations, negligible improvement in timing

this pr:

> for i in (seq 1 10); rm -rf ~/.julia/compiled/v1.12; julia --project=. -t1,1 -e "@time @eval using Graphs" 2> /dev/null; end
 10.375091 seconds (645.68 k allocations: 47.397 MiB, 0.81% gc time, 0.41% compilation time)
 10.369979 seconds (645.81 k allocations: 47.400 MiB, 0.80% gc time, 0.41% compilation time)
 10.407871 seconds (645.76 k allocations: 47.446 MiB, 0.80% gc time, 0.40% compilation time)
 10.409445 seconds (645.81 k allocations: 47.377 MiB, 0.80% gc time, 0.41% compilation time)
 10.371197 seconds (645.68 k allocations: 47.375 MiB, 0.81% gc time, 0.41% compilation time)
 10.450528 seconds (645.60 k allocations: 47.384 MiB, 0.79% gc time, 0.40% compilation time)
 10.317081 seconds (645.69 k allocations: 47.455 MiB, 0.81% gc time, 0.41% compilation time)
 10.446430 seconds (645.61 k allocations: 47.395 MiB, 0.80% gc time, 0.41% compilation time)
 10.371907 seconds (645.68 k allocations: 47.403 MiB, 0.80% gc time, 0.44% compilation time)
 10.448137 seconds (645.85 k allocations: 47.467 MiB, 0.79% gc time, 0.41% compilation time)

master:

> for i in (seq 1 10); rm -rf ~/.julia/compiled/v1.12; julia --project=. -t1,1 -e "@time @eval using Graphs" 2> /dev/null; end
 10.422142 seconds (679.91 k allocations: 50.342 MiB, 0.79% gc time, 0.42% compilation time)
 10.335160 seconds (680.00 k allocations: 50.288 MiB, 0.81% gc time, 0.41% compilation time)
 10.314185 seconds (680.08 k allocations: 50.408 MiB, 0.80% gc time, 0.41% compilation time)
 10.343851 seconds (679.84 k allocations: 50.287 MiB, 0.83% gc time, 0.41% compilation time)
 10.635309 seconds (679.92 k allocations: 50.347 MiB, 0.79% gc time, 0.40% compilation time)
 10.638776 seconds (680.08 k allocations: 50.298 MiB, 0.80% gc time, 0.41% compilation time)
 10.688697 seconds (679.84 k allocations: 50.419 MiB, 0.78% gc time, 0.43% compilation time)
 10.375991 seconds (680.08 k allocations: 50.428 MiB, 0.80% gc time, 0.41% compilation time)
 10.341547 seconds (679.84 k allocations: 50.296 MiB, 0.80% gc time, 0.41% compilation time)
 10.335641 seconds (680.04 k allocations: 50.353 MiB, 0.82% gc time, 0.46% compilation time)

precompilation of just Graphs.jl: no improvement

this pr:

> for i in (seq 1 10); echo "#" >> Graphs.jl/src/Graphs.jl; julia --project=. -t1,1 -e "@time @eval using Graphs" 2> /dev/null; end
  3.595370 seconds (633.06 k allocations: 45.822 MiB, 2.32% gc time, 1.18% compilation time)
  3.546514 seconds (633.06 k allocations: 45.759 MiB, 2.34% gc time, 1.18% compilation time)
  3.584489 seconds (633.06 k allocations: 45.822 MiB, 2.29% gc time, 1.17% compilation time)
  3.566042 seconds (633.06 k allocations: 45.760 MiB, 2.33% gc time, 1.18% compilation time)
  3.560505 seconds (633.06 k allocations: 45.759 MiB, 2.34% gc time, 1.16% compilation time)
  3.573339 seconds (633.06 k allocations: 45.760 MiB, 2.34% gc time, 1.17% compilation time)
  3.583969 seconds (633.06 k allocations: 45.776 MiB, 2.36% gc time, 1.19% compilation time)
  3.568351 seconds (633.06 k allocations: 45.761 MiB, 2.33% gc time, 1.17% compilation time)
  3.598015 seconds (633.06 k allocations: 45.756 MiB, 2.62% gc time, 1.19% compilation time)
  3.525711 seconds (633.06 k allocations: 45.761 MiB, 2.37% gc time, 1.19% compilation time)

master:

> for i in (seq 1 10); echo "#" >> Graphs.jl/src/Graphs.jl; julia --project=. -t1,1 -e "@time @eval using Graphs" 2> /dev/null; end
  3.555798 seconds (633.07 k allocations: 45.842 MiB, 2.33% gc time, 1.19% compilation time)
  3.575576 seconds (633.06 k allocations: 45.818 MiB, 2.33% gc time, 1.19% compilation time)
  3.589061 seconds (633.07 k allocations: 45.839 MiB, 2.32% gc time, 1.18% compilation time)
  3.546660 seconds (633.06 k allocations: 45.765 MiB, 2.32% gc time, 1.19% compilation time)
  3.579585 seconds (633.06 k allocations: 45.759 MiB, 2.31% gc time, 1.21% compilation time)
  3.562644 seconds (633.06 k allocations: 45.843 MiB, 2.34% gc time, 1.18% compilation time)
  3.563017 seconds (633.06 k allocations: 45.780 MiB, 2.35% gc time, 1.20% compilation time)
  3.604094 seconds (633.06 k allocations: 45.822 MiB, 2.42% gc time, 1.18% compilation time)
  3.566834 seconds (633.06 k allocations: 45.777 MiB, 2.35% gc time, 1.20% compilation time)
  3.540509 seconds (633.06 k allocations: 45.826 MiB, 2.38% gc time, 1.24% compilation time)

The @eval is necessary to make sure all compilation work is included in the timing. I do not know the details of why, but this PR can be instructive if you want to delve deeper JuliaLang/julia#58015

@simonschoelly
Copy link
Member

@Krastanov Thanks, these are some very solid benchmarks.

@rafaqz

We should also move Inflate.jl to an extension, it pulls in binary dependencies that most users wont use.

Inflate.jl is a very small package. It only pulls in binary dependencies for tests but otherwise it is pure Julia code without any dependencies. If I remember correctly it was even created so that we can use it in this package.
We do indeed only use Inflate.jl for loadgraph and loadgraphs - but these functions are used by a lot of other packages - therefore making this a weak dependency would be a breaking change.

I do agree that there are downsides to having dependencies, for example versioning issues and the compilation of code that we never use. But there are also advantages if we don't have to implement everything ourselves.

The extension system that Julia uses is rather clumsy - it works mostly if one wants to create a specialization for a function or a type declared in another package. But when that is not the case we require users to write using SharedArrays, Distributed by hand. It would be nice if Julia would have a system like Rust where one could write something like (this is probably not valid TOML)

[compat]
Graphs =  {version: "1.8", features: ["parallel"]}

and then Graphs would add the correct packages for the parallel feature - this is unfortunately not the case - but I hope they will add something like that in the future.

We had issues with this in the past in GraphIO.jl as we have multiple extensions there. This has confused users of that package.

So I am still critical of this but a lot of people seem to think its a good idea so I would propose this:

  • We try out how this works of the threaded and distributed code - we never made these methods public so this would not be a breaking change.
  • We make sure that this is well documented so that users know that there is parallel code and they know how to use it.
  • We also might need to add some error messages in case someone tries to use the parallel code but has not imported the correct libraries.
  • Let's not touch the rest of the dependencies for now - we can always come back to that later.
  • This also does not mean that we should avoid adding dependencies in the future if it is reasonable.

@topolarity Let's merge #429 first so that the new parallel argument is independent of the other changes proposed here.

@simonschoelly
Copy link
Member

On the subject of loading time - this issue #244 reminded me that there is a @time_import macro to see which package uses how much time when loading.

julia> @time_imports using Graphs
Precompiling Graphs finished.
  16 dependencies successfully precompiled in 21 seconds. 16 already precompiled.
     13.5 ms  MacroTools
      1.2 ms  SimpleTraits
      0.8 ms  Printf
     32.8 ms  Dates
      1.0 ms  TOML
      1.0 ms  Preferences
      0.2 ms  PrecompileTools
      1.3 ms  StaticArraysCore
    228.6 ms  StaticArrays
      2.8 ms  ArnoldiMethod
      1.1 ms  Statistics
      0.3 ms  StaticArrays  StaticArraysStatisticsExt
      0.4 ms  UUIDs
      0.4 ms  Compat
      0.2 ms  Compat  CompatLinearAlgebraExt
      0.9 ms  Inflate
      6.6 ms  OrderedCollections
     78.2 ms  DataStructures
               ┌ 0.8 ms SuiteSparse_jll.__init__() 
     11.0 ms  SuiteSparse_jll 87.02% compilation time
      1.4 ms  Serialization
               ┌ 20.0 ms SparseArrays.CHOLMOD.__init__() 97.90% compilation time
    273.7 ms  SparseArrays 7.15% compilation time
      0.4 ms  Statistics  SparseArraysExt
               ┌ 0.0 ms Distributed.__init__() 
     48.0 ms  Distributed 68.80% compilation time
      0.9 ms  Mmap
      4.6 ms  SharedArrays
     22.1 ms  Graphs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants