Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Final switch to IMASDD #32

Merged
merged 6 commits into from
Apr 2, 2024
Merged

Final switch to IMASDD #32

merged 6 commits into from
Apr 2, 2024

Conversation

anchal-physics
Copy link
Collaborator

@anchal-physics anchal-physics commented Mar 25, 2024

This PR would be merged simultaneously when all other repos also have the PR ready for the final switch from OMAS to IMASDD.

Speed test

I ran a speed test to check the difference between using OMAS and IMASDD. With IMASDD, solps2imas is about 100 times slower!. I believe we need to address this, or atleast confirm that this was expected before marking this PR ready above.

Following are the results:

Speed test with IMASDD.jl

SOLPS2IMAS.jl % julia --project test/speedtest.jl 
-----------------------------------------------------------------------------
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time with compilation: 184.201234 seconds (675.54 M allocations: 19.762 GiB, 3.58% gc time, 28.76% compilation time: <1% of which was recompilation)
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time (true runtime): 131.170108 seconds (510.00 M allocations: 8.549 GiB, 2.25% gc time)

Speed test with OMAS.jl

SOLPS2IMAS.jl % julia --project test/speedtest.jl                               
-----------------------------------------------------------------------------
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time with compilation:   8.625094 seconds (28.36 M allocations: 1.872 GiB, 6.44% gc time, 82.90% compilation time: <1% of which was recompilation)
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time (true runtime):   1.392322 seconds (9.37 M allocations: 659.444 MiB, 12.51% gc time)

@anchal-physics
Copy link
Collaborator Author

Attaching profview results for the two versions of solps2imas:

solps2imas_profview_results.zip

It has two files:

  • solps2imas_omas_profview.jlprof is profview of solps2imas when ran at ea4480a with OMAS.jl
  • solps2imas_imasdd_profview.jlprof is profview of solps2imas when ran at ba5109f with IMASDD.jl

To view these files, open julia REPL:

> using Pkg; Pkg.add("ProfileView");
> using ProfileView
> ProfileView.view(nothing, windowname="OMAS") # Use GUI to open solps2imas_omas_profview.jlprof
> ProfileView.view(nothing, windowname="IMASDD") # Use GUI to open solps2imas_imasdd_profview.jlprof
> # Keep REPL open

Highlights

  • With OMAS the run time is a few seconds while with IMASDD it is about 2 minutes.
  • The majority of time spent by both versions is at functions search_points and search_edges which try to match a node or an edge with existing nodes or edges respectively, while filling the grid.
  • However, with OMAS, these functions do not take significant time while they take a lot of time in IMASDD.
  • In particular, with IMASDD, most of the time is spent in function Base.get_property() inside which the majority time is spent in the helper private function Base._get_property().
  • At the bottom most level, most time is spent at line 294 of IMASDD/base.jl where the function is checking if the requested property belongs to private_fields.
    • This does not make sense because private_fields is just a 6 element list and this search should be very fast.
    • So maybe I am making a mistake in understanding the profile view.
    • It would be great to get some help on this @orso82
  • Meanwhile, I'll try to make search_points and search_edges more efficient if possible

@anchal-physics
Copy link
Collaborator Author

I was able to increase the speed by a factor of 2 by simply reducing the number of calls that would be made to get_property inside search_points and search_edges function. Speed result now:

SOLPS2IMAS.jl % julia --project test/speedtest.jl
-----------------------------------------------------------------------------
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time with compilation: 139.745324 seconds (433.19 M allocations: 16.150 GiB, 4.25% gc time, 40.88% compilation time: <1% of which was recompilation)
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time (true runtime):  74.138648 seconds (267.65 M allocations: 4.938 GiB, 2.12% gc time, 0.02% compilation time)

It is still not an acceptable speed. There might not be any more optimization possible in search_edges and search_nodes. These functions are operating on unsorted lists, some kdtree stuff can be maybe, but that would be too much complex. Hopefully, we can improve the speed from IMASDD side in get_property function. Maybe there is a better way to get the property of a structure which can reduce the time spent in get_property function.

@orso82
Copy link
Member

orso82 commented Mar 26, 2024

@bclyons12 if you have time, could you please take a look and see if there's anything obvious?

@orso82
Copy link
Member

orso82 commented Mar 26, 2024

@anchal-physics the GGD is highly nested and depth really stress tests the access to the data structure.

For now I have resorted to simply using getfield(bla, :field) instead of bla.field to get faster access in few selected places. This brings the execution to ~6s on my machine.

Of course doing so removes the IMASDD functionalities, but that's ok for the ggd, where I don't think we'll want to use expressions, lazy copies, and handling of time dependent data.

I may consider defining the nodes under the ggd structures as a different abstract subtype of IDS, on which I can dispatch to get faster operations.

@bclyons12
Copy link
Member

I don't see anything at first glance. I'm assuming it's related to some of the allocation issues we've seen in IMASDD elsewhere, but I'd need to do some profiling myself to confirm that. I could look into it more in a week or two.

@anchal-physics
Copy link
Collaborator Author

Thanks @orso82. After your change, the execution time on my machine was about 8.5 seconds. I found that half of the time was spent in a deepcopy of grid_subset operation, so I made a faster version of deepcopy in GGDUtils that uses getfield and uses deepcopy for the julia base types only. This improved the speed further and now the run time is 3.5s only.

There is still a fact to note that the compilation time with IMASDD is still a lot in comparison to OMAS (7 times more). Maybe that is just the way julia is.

Here is the summary of speedtest now:

Speed at 557b60e

SOLPS2IMAS.jl % julia --project test/speedtest.jl
-----------------------------------------------------------------------------
solps2imas(b2gmtry, b2output; b2mn, fort) time with compilation:  93.010971 seconds (187.08 M allocations: 12.482 GiB, 8.50% gc time, 90.99% compilation time: <1% of which was recompilation)
solps2imas(b2gmtry, b2output; b2mn, fort) time (true runtime):   8.574597 seconds (21.54 M allocations: 1.271 GiB, 10.96% gc time, 0.25% compilation time)

Speed at ProjectTorreyPines/IMASggd.jl@3ebbf8f and 11ac45a

SOLPS2IMAS.jl % julia --project test/speedtest.jl
-----------------------------------------------------------------------------
solps2imas(b2gmtry, b2output; b2mn, fort) time with compilation:  55.555791 seconds (172.91 M allocations: 11.503 GiB, 8.39% gc time, 93.66% compilation time)
solps2imas(b2gmtry, b2output; b2mn, fort) time (true runtime):   3.405274 seconds (18.18 M allocations: 1.003 GiB, 12.79% gc time, 0.37% compilation time)

I think it is acceptable speed now, except for the fact that first time use of solps2imas in any code will still be slow. I'll move forward to other repos now.

Question: Is it possible to precompile a julia script and create a executable that runs at the fastest speed?

@orso82
Copy link
Member

orso82 commented Mar 27, 2024

@anchal-physics tihs is what we should do to speedup precompilation times on all of our packages:
https://julialang.github.io/PrecompileTools.jl/stable/

anchal-physics and others added 6 commits March 28, 2024 17:06
* All references to OMAS have been removed now.
* All tests pass with IMASDD.
* Minor changes in edge and cell initialization was required.
It was identified that majority of time is spent in get_property calls
made due search_points and search_edges. Ithis commit reduces these
calls by making a single clal per iteration of loop and storing the
result in a temporary variable. This improved the speed by a factor of
about 2.

Before commit
SOLPS2IMAS.jl % julia --project test/speedtest.jl
-----------------------------------------------------------------------------
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time with compilation: 139.745324 seconds (433.19 M allocations: 16.150 GiB, 4.25% gc time, 40.88% compilation time: <1% of which was recompilation)
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time (true runtime):  74.138648 seconds (267.65 M allocations: 4.938 GiB, 2.12% gc time, 0.02% compilation time)

After commit

SOLPS2IMAS.jl % julia --project test/speedtest.jl
-----------------------------------------------------------------------------
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time with compilation: 139.745324 seconds (433.19 M allocations: 16.150 GiB, 4.25% gc time, 40.88% compilation time: <1% of which was recompilation)
solps2imas(b2gmtry, b2output; b2mn=b2mn, fort=fort) time (true runtime):  74.138648 seconds (267.65 M allocations: 4.938 GiB, 2.12% gc time, 0.02% compilation time)
After last commits, it was found that deepcopy of the grid_subsets
done when triangular mesh is read is taking up about half of the
computation time of solps2imas(). This is because deepcopy has
to go through the deep grid_subset strucutre calling get_property at
every instance.
This commit uses custom deepcopy_subset function which is aware of
the fields inside grid_subset and uses getfield() for faster access
to items for creating deepcopy at julia base type level. This function
has been added in GGDutils.jl/subset_tools.jl at ProjectTorreyPines/IMASggd.jl@3ebbf8f

With this change, the speed has improved by a factor of 2.5. deepcopy is
not a major bottleneck anymore.
Copy link
Collaborator

@eldond eldond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, it passes, but it would be nice if it were faster:

julia> include("test/runtests.jl")
Test Summary:          | Pass  Total  Time
Test index conversions |    1      1  0.1s
Test Summary:              | Pass  Total  Time
Test file parsing in depth |    6      6  1.3s
Test Summary:       | Pass  Total  Time
Test read_b2_output |   19     19  3.3s
solps2imas() time: 160.382650 seconds (154.93 M allocations: 10.483 GiB, 5.57% gc time, 96.87% compilation time)
Test Summary:                        | Pass  Total     Time
Test solps2imas() (overall workflow) |    4      4  2m42.0s
Test Summary:                                   |  Pass  Total   Time
Test triangular mesh generation from fort files | 23479  23479  11.7s
Test.DefaultTestSet("Test triangular mesh generation from fort files", Any[], 23479, false, false, true, 1.712016072012787e9, 1.712016083762666e9, false)

julia> 

@anchal-physics
Copy link
Collaborator Author

Yeah, as Orso suggested, we might have to use PrecompileTools to make our packages faster. Most time spent in those tests is compilation time. Those functions will run much faster the second time in the same julia instance.

@eldond
Copy link
Collaborator

eldond commented Apr 2, 2024

They sure do

@anchal-physics anchal-physics merged commit b20c48b into dev Apr 2, 2024
1 check passed
@eldond eldond deleted the OMAS_IMAS branch April 2, 2024 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants