
Split cubic & ideal EOS (and common backbone) into separate package(s) #106

Open · stillyslalom opened this issue Jul 23, 2022 · 10 comments
Labels: breaking (a change that requires a breaking change)

@stillyslalom (Contributor)

The complexity of SAFT equations of state imposes quite a lot of overhead on this package, and that’s entirely justified for a research-grade EOS, but for simple day-to-day usage, it’d be great to have a low-overhead, fast-loading EOS package covering ideal/SRK/Peng-Robinson.

@pw0908 (Member) commented Jul 23, 2022

Splitting Clapeyron is an idea we've been thinking about for a while. This is mainly because some of the features we've been developing are getting quite niche and often warrant a package of their own. I'm still of the mindset that it's nice to have one unified package. However, as Clapeyron grows, it becomes more apparent that we should split it up.

That being said, does SAFT really represent a large overhead in Clapeyron? I'd expect something like our database parsers and methods to contribute much more. I might be wrong, though.

@longemen3000 (Member)

On limited testing:

julia> @time using Revise, Clapeyron #without any EoS
  8.001558 seconds (13.22 M allocations: 797.007 MiB, 5.96% gc time, 52.49% compilation time)

julia> @time using Revise, Clapeyron #with all EoS
  8.418290 seconds (13.28 M allocations: 802.694 MiB, 6.19% gc time, 50.00% compilation time)

So the main overhead seems to come from the code that loads before the EoS definitions (parsing and methods).

On how to split the package: the long-term solution will almost surely be something like what they do in https://github.com/JuliaArrays/ArrayInterface.jl, but we still need to decide on the proper hierarchy of packages to load (a rough sketch of the idea follows).
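For a sense of what that could look like, here is a rough ArrayInterface.jl-style sketch; the package names, fields, and function stubs are hypothetical, not an agreed layout:

# Hypothetical lightweight core package holding only the abstract types and
# generic function stubs, so a cubic-only package can depend on it alone.
module ClapeyronCore
    abstract type EoSModel end
    function eos end      # each EoS package adds methods for its own model types
    function volume end
end

# Hypothetical cubic-only package built on the core, without SAFT, the database
# parsers, or CSV in its dependency tree.
module ClapeyronCubics
    using ..ClapeyronCore: EoSModel

    struct PR{T} <: EoSModel
        Tc::Vector{T}                 # critical temperatures
        Pc::Vector{T}                 # critical pressures
        acentric_factor::Vector{T}
        k::Matrix{T}                  # binary interaction parameters
    end
end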

@longemen3000 added the breaking label on Jul 23, 2022
@longemen3000 (Member)

With Julia 1.8, the new @time_imports macro helps here:

julia> @time_imports using Clapeyron
      3.9 ms  StaticArraysCore
   1072.9 ms  StaticArrays
      1.9 ms  PackedVectorsOfVectors
    299.1 ms  FillArrays
      0.3 ms  CommonSolve
     57.6 ms  MacroTools
      2.0 ms  ConstructionBase
     56.6 ms  Setfield
     93.1 ms  Roots
      1.5 ms  PositiveFactorizations
     33.0 ms  NLSolvers
      4.0 ms  DataAPI
      0.6 ms  Compat
     17.4 ms  OrderedCollections
    112.6 ms  DataStructures
      0.7 ms  SortingAlgorithms
     15.0 ms  Missings
      9.5 ms  DocStringExtensions 57.54% compilation time
    162.9 ms  ChainRulesCore
      1.2 ms  ChangesOfVariables
      1.9 ms  InverseFunctions
     14.9 ms  IrrationalConstants
      1.4 ms  LogExpFunctions
      0.6 ms  StatsAPI
     46.9 ms  StatsBase
     36.2 ms  PDMats
      0.5 ms  Reexport
      0.5 ms  OpenLibm_jll
     81.5 ms  Preferences
      0.7 ms  JLLWrappers
      1.4 ms  CompilerSupportLibraries_jll
    412.9 ms  OpenSpecFun_jll 98.13% compilation time (99% recompilation)
     30.3 ms  SpecialFunctions
      1.5 ms  Rmath_jll
    198.0 ms  Rmath 91.84% compilation time
      0.6 ms  NaNMath
      3.3 ms  Calculus
     70.7 ms  DualNumbers
      1.4 ms  HypergeometricFunctions
     16.9 ms  StatsFuns
      9.4 ms  QuadGK
      2.8 ms  DensityInterface
    301.4 ms  Distributions
     25.1 ms  SpatialIndexing
      0.3 ms  CPUTime
     14.0 ms  URIs 55.69% compilation time
     10.1 ms  MbedTLS_jll
     93.9 ms  MbedTLS 60.69% compilation time
      0.7 ms  IniFile
    130.8 ms  HTTP 42.82% compilation time
    192.0 ms  Parsers 4.61% compilation time
     38.3 ms  JSON
     99.6 ms  BlackBoxOptim
      5.8 ms  DiffResults
      1.2 ms  DiffRules
      0.5 ms  CommonSubexpressions
    361.3 ms  ForwardDiff
      0.4 ms  Scratch
    872.3 ms  Unitful
      0.3 ms  DataValueInterfaces
      0.3 ms  IteratorInterfaceExtensions
      0.2 ms  TableTraits
     39.8 ms  Tables
     41.5 ms  PooledArrays
    467.7 ms  SentinelArrays 21.74% compilation time
     29.0 ms  InlineStrings
     46.9 ms  WeakRefStrings
      4.2 ms  TranscodingStreams
      0.5 ms  Zlib_jll
      4.5 ms  CodecZlib
     22.7 ms  FilePathsBase 20.75% compilation time
   3340.7 ms  CSV 80.23% compilation time (85% recompilation)
     13.4 ms  ThermoState
    270.6 ms  Clapeyron

worst offenders:

   3340.7 ms  CSV 80.23% compilation time (85% recompilation)
   1072.9 ms  StaticArrays
   872.3 ms  Unitful

I'm following https://sciml.ai/news/2022/09/21/compile_time/, by the way. So it seems that most of our first-load compile time is due to CSV.

@stillyslalom (Contributor, Author)

Chris’s blog post today mentioned changing import order to put overload-heavy packages like StaticArrays later to reduce the number of methods that need to be checked when loading other dependencies. I haven’t played around with import order, and it’s a bit painful to trawl through dependency trees to figure out where method overloads are coming from, but moving StaticArrays down the list would be an easy first thing to try.
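One way to take some of the pain out of that trawling is SnoopCompile's invalidation tooling; a sketch (not something anyone in this thread has run, so treat it as a suggestion):

# Record invalidations triggered while loading the package, then see which
# method definitions caused them. Needs SnoopCompileCore + SnoopCompile.
using SnoopCompileCore
invalidations = @snoopr using Clapeyron

using SnoopCompile
trees = invalidation_trees(invalidations)   # sorted, biggest offenders last
show(trees[end])                            # inspect the largest invalidation tree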

@longemen3000 (Member)

Yes, I'm playing with the package loading order to find the most efficient arrangement. I found one nasty invalidation when loading any JLL after StaticArrays:

julia> @time_imports using StaticArrays,Rmath
      5.5 ms  StaticArraysCore
    753.7 ms  StaticArrays
     26.2 ms  Preferences
      0.6 ms  JLLWrappers
    452.8 ms  Rmath_jll 99.65% compilation time (99% recompilation)
    119.3 ms  Rmath 89.93% compilation time

@stillyslalom (Contributor, Author)

Wild! Rmath_jll load time is cut from 405 to 5 ms on my machine with the load order reversed. I didn't expect such a large impact.
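For reference, the flipped-order check is just the same command with the packages swapped (timings omitted here since they vary by machine):

julia> @time_imports using Rmath, StaticArrays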

@longemen3000 (Member) commented Sep 22, 2022

There are two types of work to be done:

  • Package-wise: find the combination and order of dependencies that minimizes load time.
  • Code-wise: reduce the number of invalidations and ambiguities caused by our own code. We don't invalidate that much (because we define new methods rather than extending existing ones), but we have quite a few ambiguities (see the sketch after this list).
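As a sketch of how the ambiguity side could be quantified (not an existing check in our CI), the standard library's Test.detect_ambiguities does most of the work:

using Test, Clapeyron

# Returns a vector of (Method, Method) pairs with ambiguous signatures.
ambs = detect_ambiguities(Clapeyron)
println(length(ambs), " ambiguous method pairs in Clapeyron")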

One quick way to cut package loading roughly in half would be to remove CSV.jl and just use DelimitedFiles.jl. We use the Tables.jl interface to iterate over the parsed values, so anything Tables.jl-compliant could work.
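A minimal sketch of that idea, assuming plain comma-separated files and ignoring Clapeyron's CSV header conventions (the helper name and file path are made up):

using DelimitedFiles, Tables

# Read a parameter CSV with the standard library and expose it through the
# Tables.jl interface, so code that iterates rows keeps working.
function read_params_table(path)
    data, header = readdlm(path, ',', String; header = true)
    return Tables.table(data; header = Symbol.(vec(header)))
end

# Rows can then be consumed much like a CSV.File (placeholder file name):
for row in Tables.rows(read_params_table("some_params.csv"))
    # row.species, row.Tc, ... assuming those columns exist in the file
end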

@longemen3000 (Member)

One idea is to bypass the table parsing entirely and pass the params yourself:

model = PR(components; Tc, Pc, acentric_factor, k)

That would skip the first-time compilation of the DB parser completely, at the cost of needing to specify the params yourself.
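Concretely, the call might look like this for a binary mixture; the numbers are illustrative placeholders, and the keyword constructor itself is a proposal, not an existing API:

components = ["methane", "ethane"]
Tc = [190.6, 305.3]               # K, illustrative values
Pc = [4.60e6, 4.87e6]             # Pa, illustrative values
acentric_factor = [0.011, 0.099]
k = [0.0 0.0; 0.0 0.0]            # binary interaction parameters

model = PR(components; Tc, Pc, acentric_factor, k)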

@pw0908 (Member) commented Nov 15, 2022

I wonder if we should allow users to define parameters manually for all our models.

That seems to be the way all other packages like ours work. I've always found that really tedious, but it's definitely easier than the CSVs.
