
Split cubic & ideal EOS (and common backbone) into separate package(s) #106

Open · stillyslalom opened this issue Jul 23, 2022 · 10 comments
Labels: breaking (a change that requires a breaking change)

@stillyslalom (Contributor)

The complexity of SAFT equations of state imposes quite a lot of overhead on this package, and that’s entirely justified for a research-grade EOS, but for simple day-to-day usage, it’d be great to have a low-overhead, fast-loading EOS package covering ideal/SRK/Peng-Robinson.

@pw0908 (Member) commented Jul 23, 2022

Splitting Clapeyron is an idea we've been thinking about for a while. This is mainly because some of the features we've been developing are getting quite niche and often warrant a package of their own. I'm still of the mindset that it's nice to have one unified package. However, as Clapeyron grows, it becomes more apparent that we should split it up.

That being said, does SAFT really represent a large overhead in Clapeyron? I'd expect something like our database parsers and methods to contribute much more. I might be wrong, though.

@longemen3000 (Member)

On limited testing:

julia> @time using Revise, Clapeyron #without any EoS
  8.001558 seconds (13.22 M allocations: 797.007 MiB, 5.96% gc time, 52.49% compilation time)

julia> @time using Revise, Clapeyron #with all EoS
  8.418290 seconds (13.28 M allocations: 802.694 MiB, 6.19% gc time, 50.00% compilation time)

So the main overhead seems to come from the code that loads before the EoS definitions (parsing and methods).

On how to split the package: the long-term solution will almost surely be something like what they do in https://github.com/JuliaArrays/ArrayInterface.jl, but we still need to decide on the proper hierarchy of packages to load (a rough sketch of the idea follows).
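For a sense of what that could look like, here is a rough ArrayInterface.jl-style sketch; the package names, fields, and function stubs are hypothetical, not an agreed layout:

# Hypothetical lightweight core package holding only the abstract types and
# generic function stubs, so a cubic-only package can depend on it alone.
module ClapeyronCore
    abstract type EoSModel end
    function eos end      # each EoS package adds methods for its own model types
    function volume end
end

# Hypothetical cubic-only package built on the core, without SAFT, the database
# parsers, or CSV in its dependency tree.
module ClapeyronCubics
    using ..ClapeyronCore: EoSModel

    struct PR{T} <: EoSModel
        Tc::Vector{T}                 # critical temperatures
        Pc::Vector{T}                 # critical pressures
        acentric_factor::Vector{T}
        k::Matrix{T}                  # binary interaction parameters
    end
end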

@longemen3000 added the breaking label on Jul 23, 2022
@longemen3000 (Member)

With Julia 1.8, the new @time_imports macro helps here:

julia> @time_imports using Clapeyron
      3.9 ms  StaticArraysCore
   1072.9 ms  StaticArrays
      1.9 ms  PackedVectorsOfVectors
    299.1 ms  FillArrays
      0.3 ms  CommonSolve
     57.6 ms  MacroTools
      2.0 ms  ConstructionBase
     56.6 ms  Setfield
     93.1 ms  Roots
      1.5 ms  PositiveFactorizations
     33.0 ms  NLSolvers
      4.0 ms  DataAPI
      0.6 ms  Compat
     17.4 ms  OrderedCollections
    112.6 ms  DataStructures
      0.7 ms  SortingAlgorithms
     15.0 ms  Missings
      9.5 ms  DocStringExtensions 57.54% compilation time
    162.9 ms  ChainRulesCore
      1.2 ms  ChangesOfVariables
      1.9 ms  InverseFunctions
     14.9 ms  IrrationalConstants
      1.4 ms  LogExpFunctions
      0.6 ms  StatsAPI
     46.9 ms  StatsBase
     36.2 ms  PDMats
      0.5 ms  Reexport
      0.5 ms  OpenLibm_jll
     81.5 ms  Preferences
      0.7 ms  JLLWrappers
      1.4 ms  CompilerSupportLibraries_jll
    412.9 ms  OpenSpecFun_jll 98.13% compilation time (99% recompilation)
     30.3 ms  SpecialFunctions
      1.5 ms  Rmath_jll
    198.0 ms  Rmath 91.84% compilation time
      0.6 ms  NaNMath
      3.3 ms  Calculus
     70.7 ms  DualNumbers
      1.4 ms  HypergeometricFunctions
     16.9 ms  StatsFuns
      9.4 ms  QuadGK
      2.8 ms  DensityInterface
    301.4 ms  Distributions
     25.1 ms  SpatialIndexing
      0.3 ms  CPUTime
     14.0 ms  URIs 55.69% compilation time
     10.1 ms  MbedTLS_jll
     93.9 ms  MbedTLS 60.69% compilation time
      0.7 ms  IniFile
    130.8 ms  HTTP 42.82% compilation time
    192.0 ms  Parsers 4.61% compilation time
     38.3 ms  JSON
     99.6 ms  BlackBoxOptim
      5.8 ms  DiffResults
      1.2 ms  DiffRules
      0.5 ms  CommonSubexpressions
    361.3 ms  ForwardDiff
      0.4 ms  Scratch
    872.3 ms  Unitful
      0.3 ms  DataValueInterfaces
      0.3 ms  IteratorInterfaceExtensions
      0.2 ms  TableTraits
     39.8 ms  Tables
     41.5 ms  PooledArrays
    467.7 ms  SentinelArrays 21.74% compilation time
     29.0 ms  InlineStrings
     46.9 ms  WeakRefStrings
      4.2 ms  TranscodingStreams
      0.5 ms  Zlib_jll
      4.5 ms  CodecZlib
     22.7 ms  FilePathsBase 20.75% compilation time
   3340.7 ms  CSV 80.23% compilation time (85% recompilation)
     13.4 ms  ThermoState
    270.6 ms  Clapeyron

worst offenders:

   3340.7 ms  CSV 80.23% compilation time (85% recompilation)
   1072.9 ms  StaticArrays
   872.3 ms  Unitful

I'm following https://sciml.ai/news/2022/09/21/compile_time/, by the way. So it seems that most of our first-load compile time is due to CSV.

@stillyslalom (Contributor, Author)

Chris’s blog post today mentioned changing import order to put overload-heavy packages like StaticArrays later to reduce the number of methods that need to be checked when loading other dependencies. I haven’t played around with import order, and it’s a bit painful to trawl through dependency trees to figure out where method overloads are coming from, but moving StaticArrays down the list would be an easy first thing to try.
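One way to take some of the pain out of that trawling is SnoopCompile's invalidation tooling; a sketch (not something anyone in this thread has run, so treat it as a suggestion):

# Record invalidations triggered while loading the package, then see which
# method definitions caused them. Needs SnoopCompileCore + SnoopCompile.
using SnoopCompileCore
invalidations = @snoopr using Clapeyron

using SnoopCompile
trees = invalidation_trees(invalidations)   # sorted, biggest offenders last
show(trees[end])                            # inspect the largest invalidation tree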

@longemen3000 (Member)

Yes, I'm playing with the package loading order to find the most efficient arrangement. I found one nasty invalidation when loading any JLL after StaticArrays:

julia> @time_imports using StaticArrays,Rmath
      5.5 ms  StaticArraysCore
    753.7 ms  StaticArrays
     26.2 ms  Preferences
      0.6 ms  JLLWrappers
    452.8 ms  Rmath_jll 99.65% compilation time (99% recompilation)
    119.3 ms  Rmath 89.93% compilation time

@stillyslalom (Contributor, Author)

Wild! Rmath_jll load time is cut from 405 to 5 ms on my machine with the load order reversed. I didn't expect such a large impact.
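For reference, the flipped-order check is just the same command with the packages swapped (timings omitted here since they vary by machine):

julia> @time_imports using Rmath, StaticArrays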

@longemen3000 (Member) commented Sep 22, 2022

There are two types of work to be done:

  • Package-wise: find the combination and order of dependencies that minimizes load time.
  • Code-wise: reduce the number of invalidations and ambiguities caused by our own code. We don't invalidate that much (because we define new methods rather than extending existing ones), but we have quite a few ambiguities (see the sketch after this list).
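As a sketch of how the ambiguity side could be quantified (not an existing check in our CI), the standard library's Test.detect_ambiguities does most of the work:

using Test, Clapeyron

# Returns a vector of (Method, Method) pairs with ambiguous signatures.
ambs = detect_ambiguities(Clapeyron)
println(length(ambs), " ambiguous method pairs in Clapeyron")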

One quick way to cut package loading roughly in half would be to remove CSV.jl and just use DelimitedFiles.jl. We use the Tables.jl interface to iterate over the parsed values, so anything Tables.jl-compliant could work.
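A minimal sketch of that idea, assuming plain comma-separated files and ignoring Clapeyron's CSV header conventions (the helper name and file path are made up):

using DelimitedFiles, Tables

# Read a parameter CSV with the standard library and expose it through the
# Tables.jl interface, so code that iterates rows keeps working.
function read_params_table(path)
    data, header = readdlm(path, ',', String; header = true)
    return Tables.table(data; header = Symbol.(vec(header)))
end

# Rows can then be consumed much like a CSV.File (placeholder file name):
for row in Tables.rows(read_params_table("some_params.csv"))
    # row.species, row.Tc, ... assuming those columns exist in the file
end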

@longemen3000 (Member)

One idea is to bypass the table parsing entirely and pass the params yourself:

model = PR(components; Tc, Pc, acentric_factor, k)

That would skip the first-time compilation of the DB parser completely, at the cost of needing to specify the params yourself.
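Concretely, the call might look like this for a binary mixture; the numbers are illustrative placeholders, and the keyword constructor itself is a proposal, not an existing API:

components = ["methane", "ethane"]
Tc = [190.6, 305.3]               # K, illustrative values
Pc = [4.60e6, 4.87e6]             # Pa, illustrative values
acentric_factor = [0.011, 0.099]
k = [0.0 0.0; 0.0 0.0]            # binary interaction parameters

model = PR(components; Tc, Pc, acentric_factor, k)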

@pw0908 (Member) commented Nov 15, 2022

I wonder if we should allow users to define parameters manually for all our models.

That seems to be the way all other packages like ours work. I've always found that really tedious, but it's definitely easier than the CSVs.
