Ag/pr vectorized second derivative volume #402

Merged: 11 commits, Apr 15, 2024

@andrewgiuliani andrewgiuliani commented Apr 10, 2024

This PR introduces a vectorized implementation of the second derivative of the volume with respect to the surface dofs. This is useful for the BoozerLS calculation when volume is used as the surface label. Running the timing test:

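import time
import numpy as np
from simsopt.geo import SurfaceXYZTensorFourier

stellsym = True
nfp = 4
for mm in range(1, 15):
    # use the same poloidal and toroidal resolution
    mpol = mm
    ntor = mm
    # quadrature grid: one field period in phi, full period in theta
    phis = np.linspace(0, 1/nfp, 2*ntor+1, endpoint=False)
    thetas = np.linspace(0, 1, 2*mpol+1, endpoint=False)
    # the timed region covers surface construction and the second-derivative call
    t0 = time.time()
    s = SurfaceXYZTensorFourier(
        nfp=nfp, stellsym=stellsym, mpol=mpol, ntor=ntor,
        quadpoints_phi=phis, quadpoints_theta=thetas)
    d2vol = s.d2volume_by_dcoeffdcoeff()
    t1 = time.time()
    print(mm, t1-t0)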

I obtain the following timings for the original implementation:

mpol = ntor   time (s)
1             0.001640
2             0.003978
3             0.034043
4             0.108703
5             0.324571
6             0.867548
7             2.053304
8             4.371564
9             8.561552
10            15.6180
11            27.0530
12            out of memory
13            out of memory
14            out of memory

I run out of RAM for mpol, ntor >= 12. This is due (in part) to the line that computes d2nor_dcdc:

auto d2nor_dcdc = this->d2normal_by_dcoeffdcoeff(); // uses a lot of memory for moderate surface complexity

This array stores the second derivative of the normal vector at every surface quadrature point, for a total of 3*nquadpoints_phi*nquadpoints_theta*ndofs*ndofs doubles. The new vectorized implementation computes the entries of d2nor_dcdc on the fly, which reduces the memory requirement. Timings for the new vectorized implementation with a SIMD batch size of 8 are:

mpol = ntor   time (s)
1             0.026728
2             0.005802
3             0.002354
4             0.004601
5             0.012529
6             0.040148
7             0.065161
8             0.123145
9             0.231217
10            0.41314
11            0.69693
12            1.10758
13            1.75156
14            2.73922
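
To put the memory formula quoted above into numbers, here is a minimal sketch that evaluates 3*nquadpoints_phi*nquadpoints_theta*ndofs*ndofs doubles in bytes. The helper name `d2normal_bytes` is made up for this illustration, and the `ndofs` value is a placeholder rather than the actual dof count of the surface; in practice it would be read from the surface object.

```python
def d2normal_bytes(nquadpoints_phi, nquadpoints_theta, ndofs, bytes_per_double=8):
    """Size in bytes of a dense array with 3 * nphi * ntheta * ndofs * ndofs doubles."""
    return 3 * nquadpoints_phi * nquadpoints_theta * ndofs * ndofs * bytes_per_double


# Quadrature grid used by the timing script at mpol = ntor = 12: 2*12 + 1 points each way.
nphi = ntheta = 2 * 12 + 1
# Placeholder dof count, chosen only for illustration (not taken from the PR).
ndofs = 900

size_gb = d2normal_bytes(nphi, ntheta, ndofs) / 1e9
print(f"{size_gb:.2f} GB")
```

With these illustrative inputs the dense array alone is on the order of 12 GB. Because the size grows quadratically in ndofs and linearly in the number of quadrature points, a modest increase in mpol and ntor raises it quickly.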

The original implementation is still used when xsimd is not available. I have not vectorized d2area_by_dcoeffdcoeff, because I typically do not use area as a surface label in the BoozerLS algorithm. However, it runs into similar issues as mpol and ntor increase.
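
The memory saving from computing the d2nor_dcdc entries on the fly can be illustrated with a toy NumPy sketch. This is not the C++/xsimd code from this PR: `per_point_block` is a hypothetical stand-in for the per-quadrature-point integrand, and both function names are invented for the example. It only shows that accumulating one block at a time gives the same result as building the full array first.

```python
import numpy as np


def per_point_block(p, ndofs):
    """Hypothetical (ndofs, ndofs) integrand block at quadrature point p."""
    v = np.cos(np.arange(ndofs) * (p + 1))
    return np.outer(v, v)


def hessian_materialized(npoints, ndofs, weights):
    # Builds every block up front: memory grows like npoints * ndofs**2.
    blocks = np.stack([per_point_block(p, ndofs) for p in range(npoints)])
    return np.einsum("p,pij->ij", weights, blocks)


def hessian_streamed(npoints, ndofs, weights):
    # Keeps only one block alive at a time: memory grows like ndofs**2.
    H = np.zeros((ndofs, ndofs))
    for p in range(npoints):
        H += weights[p] * per_point_block(p, ndofs)
    return H


weights = np.linspace(0.1, 1.0, 6)
same = np.allclose(hessian_materialized(6, 4, weights),
                   hessian_streamed(6, 4, weights))
print(same)
```

The streamed version trades the large intermediate array for recomputation inside the loop, which is the kind of loop a SIMD implementation can then process in batches.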


codecov bot commented Apr 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.58%. Comparing base (3d1413b) to head (d6fab6a).
Report is 11 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #402      +/-   ##
==========================================
- Coverage   91.64%   91.58%   -0.06%     
==========================================
  Files          74       74              
  Lines       12987    12911      -76     
==========================================
- Hits        11902    11825      -77     
- Misses       1085     1086       +1     
Flag Coverage Δ
unittests 91.58% <ø> (-0.06%) ⬇️


@andrewgiuliani andrewgiuliani merged commit 38a20cb into master Apr 15, 2024
46 of 47 checks passed
@andrewgiuliani andrewgiuliani deleted the ag/pr_vectorized_second_derivative_volume branch April 15, 2024 17:08