BUG: Regression in Performance after Installing Firedrake #3030

Closed
ryan-david-murphy opened this issue Jul 18, 2023 · 20 comments

@ryan-david-murphy

Description:
I encountered a performance regression after successfully installing Firedrake. Although the installation was completed without errors (after pinning the cython version to 0.29.36), the performance has noticeably dropped.

Steps to Reproduce:

  1. Install Firedrake using the installation script.
  2. Run a sample code (e.g., helmholtz.py) multiple times to observe the performance.

Expected Behaviour:
The performance should be consistent or improved compared to the previous environment.

Actual Behavior:
The performance has significantly dropped after installing Firedrake.

Environment:
Operating System: macOS 13.4.1
Python Version: 3.10.8
Firedrake Version: 0.13.0+5767.g32bda80fc

@JDBetteridge
Member

Uploading so we don't lose this:
foo

@JDBetteridge
Member

Assuming this is still an issue, could you try a separate fresh install (please download the latest version of the install script!) now that we have pinned Cython? It's possible that the slow init is a result of some packages being built with the latest Cython and some with an older Cython.

I have not been able to reproduce this issue locally.

@ryan-david-murphy
Author

ryan-david-murphy commented Jul 20, 2023

I have reinstalled using the updated firedrake-install script after I completely removed the previous venv. I have also uninstalled and reinstalled homebrew and then completed a further reinstallation. The same performance issue is present.

I have run helmholtz.py (with graph plotting removed) on both my M1 Max and a Linux workstation (with a venv more than 3 months old) for comparison. I have attached the profiles. The two machines usually perform comparably.

Is there anything else I can reinstall to get a truly fresh installation?

fooLinux
fooM1Max

@JDBetteridge
Member

If you update (or do a fresh install) on the Linux workstation, do you also see the performance regression? If you don't want to risk losing the old performant venv, you can use firedrake-install --venv-name something_unique. If the Linux workstation is fine, I will add the Mac tag and get some of our Mac developers to investigate.

@JDBetteridge
Member

I will say that the two profiles look very much like a first run (doing code generation) versus a second run (using cached code).

The Helmholtz example (in the demos directory) is also very small, only a 10x10 grid with CG1 elements. To get meaningful profiling data we need to increase the number of dofs. Maybe you could add some timings?
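To put the problem sizes in perspective, here is a minimal sketch of the dof count. It relies only on the fact that CG1 has one dof per mesh vertex; the helper name `cg1_dofs_unit_square` is made up for illustration and is not part of Firedrake.

```python
def cg1_dofs_unit_square(nx: int, ny: int) -> int:
    """Number of CG1 dofs on an nx-by-ny structured mesh of the unit square.

    CG1 has exactly one dof per mesh vertex, and an nx-by-ny grid of
    cells has (nx + 1) * (ny + 1) vertices.
    """
    return (nx + 1) * (ny + 1)


# Compare the demo's default size with larger grids.
for n in (10, 100, 1000):
    print(f"{n}x{n} mesh: {cg1_dofs_unit_square(n, n)} dofs")
```

A 10x10 grid gives only 121 dofs, so start-up and code generation dominate the run time; a 1000x1000 grid gives about a million dofs, where assembly and the solve start to matter.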

I have attached an example profiling test on my desktop along with its output for comparison:

test_script.sh:

#!/bin/bash

# Clean caches
firedrake-clean

# Create a minimal Helmholtz problem (without plotting)
cat <<EOF >minimal_helmholtz.py
from firedrake import *

mesh = UnitSquareMesh(10, 10)

V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)

f = Function(V)
x, y = SpatialCoordinate(mesh)
f.interpolate((1+8*pi*pi)*cos(x*pi*2)*cos(y*pi*2))

a = (inner(grad(u), grad(v)) + inner(u, v)) * dx
L = inner(f, v) * dx

u = Function(V)

solve(a == L, u, solver_parameters={'ksp_type': 'cg', 'pc_type': 'none'})

File("helmholtz.pvd").write(u)

f.interpolate(cos(x*pi*2)*cos(y*pi*2))
print(sqrt(assemble(dot(u - f, u - f) * dx)))
EOF

# Time and profile minimal Helmholtz
echo "10x10 cold cache"
time python minimal_helmholtz.py -log_view :no_cache_profile.txt:ascii_flamegraph
flamegraph.pl no_cache_profile.txt > no_cache_profile.svg

# Time and profile minimal Helmholtz with hot cache
echo "10x10 hot cache"
time python minimal_helmholtz.py -log_view :hot_cache_profile.txt:ascii_flamegraph
flamegraph.pl hot_cache_profile.txt > hot_cache_profile.svg

# Increase problem size
# (-i.bak works with both GNU sed on Linux and BSD sed on macOS)
sed -i.bak "s/(10, 10)/(1000, 1000)/g" minimal_helmholtz.py

# Run bigger problem
echo "1000x1000 hot cache"
time python minimal_helmholtz.py -log_view :big_hot_cache_profile.txt:ascii_flamegraph
flamegraph.pl big_hot_cache_profile.txt > big_hot_cache_profile.svg

output:

$ ./test_script.sh 
/home/jack/Documents/firedrake/firedrake/bin/firedrake-clean:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('firedrake==0.13.0+5774.g3fb16ad47.dirty')
Removing cached TSFC kernels from /home/jack/Documents/firedrake/firedrake/.cache/tsfc
Removing cached PyOP2 code from /home/jack/Documents/firedrake/firedrake/.cache/pyop2
Removing cached pytools files from /home/jack/.cache/pytools
10x10 cold cache
0.06257073749110136

real	0m4.426s
user	0m4.085s
sys	0m0.333s
10x10 hot cache
0.06257073749110136

real	0m1.387s
user	0m1.218s
sys	0m0.155s
1000x1000 hot cache
7.078431517196732e-06

real	1m9.689s
user	0m41.024s
sys	0m28.647s

Cold cache:
no_cache_profile
Hot cache:
hot_cache_profile
Big problem:
big_hot_cache_profile

@JDBetteridge JDBetteridge self-assigned this Jul 21, 2023
@JDBetteridge
Member

@rdm4317 any update?

@ryan-david-murphy
Author

ryan-david-murphy commented Jul 26, 2023

@JDBetteridge I have run the requested profiles, here are the results:

Mac:

10x10 cold cache
real 0m22.615s
user 0m4.705s
sys 0m2.675s

10x10 hot cache
real 0m4.334s
user 0m1.390s
sys 0m0.868s

1000x1000 hot cache
real 0m38.126s
user 0m35.405s
sys 0m1.232s

10x10 cold cache
no_cache_profile

10x10 hot cache
hot_cache_profile

1000x1000 hot cache
big_hot_cache_profile

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |master                        |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |master                        |3fb16ad47 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|slepc               |firedrake                     |e438e4993 |False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

Linux WS:

10x10 cold cache
real 0m6.582s
user 0m5.707s
sys 0m0.884s

10x10 hot cache
real 0m2.300s
user 0m1.768s
sys 0m0.554s

1000x1000 hot cache
real 0m51.228s
user 0m49.202s
sys 0m1.973s

10x10 cold cache
no_cache_profile

10x10 hot cache
hot_cache_profile

1000x1000 hot cache
big_hot_cache_profile

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|COFFEE              |master                        |70c1e66   |False     |
|FInAT               |master                        |cd1d528   |False     |
|PyOP2               |master                        |59e109eb  |False     |
|fiat                |master                        |a305398   |False     |
|firedrake           |master                        |284a1104a |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |69012e5   |False     |
|loopy               |main                          |3988272b  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |c691737   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |e68bd28   |False     |
|ufl                 |master                        |772485d7  |False     |
---------------------------------------------------------------------------
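For a quick comparison of the two machines, the wall-clock ("real") times above can be turned into cold/hot ratios. This is only a sketch: the helper `parse_real` is a made-up name, and the numbers are copied from the 10x10 results in this comment.

```python
import re


def parse_real(line: str) -> float:
    """Convert a `time` line such as 'real 0m22.615s' to seconds."""
    match = re.fullmatch(r"real\s+(\d+)m([\d.]+)s", line.strip())
    if match is None:
        raise ValueError(f"unrecognised time line: {line!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes) + float(seconds)


# Wall-clock times for the 10x10 problem, copied from the results above.
timings = {
    "Mac": {"cold": "real 0m22.615s", "hot": "real 0m4.334s"},
    "Linux WS": {"cold": "real 0m6.582s", "hot": "real 0m2.300s"},
}

ratios = {}
for machine, runs in timings.items():
    cold = parse_real(runs["cold"])
    hot = parse_real(runs["hot"])
    ratios[machine] = cold / hot
    print(f"{machine}: cold {cold:.3f}s, hot {hot:.3f}s, "
          f"cold/hot = {ratios[machine]:.1f}")
```

The cold-cache penalty is noticeably larger on the Mac than on the Linux workstation, while the 1000x1000 hot-cache run is actually faster on the Mac.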

@ryan-david-murphy
Author

ryan-david-murphy commented Jul 26, 2023

For a simple hyperelasticity example, I am getting different TSFC behaviours.

Mac:

0
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
2
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
3
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
4
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
5
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
6
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
7
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
8
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
9
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)

Linux:

0
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
2
3
4
5
6
7
8
9

Here is the code:

from firedrake import *

spatialDimensions = 2
lx = 8
ly = 1
nx = 320
ny = 40
mesh = RectangleMesh(nx, ny, lx, ly, quadrilateral=True)

# function spaces
A = FunctionSpace(mesh, "CG", 1)
P = VectorFunctionSpace(mesh, "CG", 1)

# boundary conditions
bcs = [DirichletBC(P.sub(0), Constant(0), 1),
       DirichletBC(P.sub(1), Constant(0), 1)]

# Define functions
du = TrialFunction(P)            # Incremental displacement
v  = TestFunction(P)             # Test function
u  = Function(P)                 # Displacement from previous iteration
B  = Constant((0.0, -0.0))  # Body force per unit volume
T  = Constant((0.1,  0.0))  # Traction force on the boundary

for i in range(10):
    print(i)
    
    # Kinematics
    I = Identity(2)             # Identity tensor
    F = I + grad(u)             # Deformation gradient
    C = F.T*F                   # Right Cauchy-Green tensor
    
    # Invariants of deformation tensors
    Ic = tr(C)
    J  = det(F)
    
    # Elasticity parameters
    E, nu = 10.0, 0.3
    mu, lmbda = Constant(E/(2*(1 + nu))), Constant(E*nu/((1 + nu)*(1 - 2*nu)))
    
    # Stored strain energy density (compressible neo-Hookean model)
    psi = (mu/2)*(Ic - 3) - mu*ln(J) + (lmbda/2)*(ln(J))**2
    
    # Total potential energy
    Pi = psi*dx - dot(B, u)*dx - dot(T, u)*ds(2)
    
    # Compute first variation of Pi (directional derivative about u in the direction of v)
    F = derivative(Pi, u, v)
    
    # Compute Jacobian of F
    J = derivative(F, u, du)
    
    # Solve variational problem
    problem = NonlinearVariationalProblem(F, u, bcs=bcs, J=J)
    solver = NonlinearVariationalSolver(problem)
    solver.solve()
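The two logs above can also be compared mechanically. The sketch below only post-processes the printed output; the helper name `steps_with_warning` is hypothetical, and the sample logs are the first three steps copied from the Mac and Linux output.

```python
WARNING = ("tsfc:WARNING Estimated quadrature degree 14 more than tenfold "
           "greater than any argument/coefficient degree (max 1)")


def steps_with_warning(log: str) -> list[int]:
    """Return the loop indices that are followed by a tsfc warning."""
    flagged = []
    current = None
    for line in log.splitlines():
        line = line.strip()
        if line.isdigit():
            # A bare integer is the `print(i)` from the loop.
            current = int(line)
        elif line.startswith("tsfc:WARNING") and current is not None:
            if current not in flagged:
                flagged.append(current)
    return flagged


# First three steps of each log, copied from the output above.
mac_log = "\n".join(["0", WARNING, "1", WARNING, "2", WARNING])
linux_log = "\n".join(["0", WARNING, "1", "2"])

print("Mac:", steps_with_warning(mac_log))
print("Linux:", steps_with_warning(linux_log))
```

On the Mac every step is flagged, whereas on Linux only step 0 is.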

@Ig-dolci
Contributor

I ran the same test with test.sh on an M2 Mac. Here are the results:

10x10 cold cache
real 0m7.179s
user 0m4.232s
sys 0m1.043s

10x10 hot cache
real 0m2.460s
user 0m1.421s
sys 0m0.542s

1000x1000 hot cache
real 0m38.433s
user 0m35.422s
sys 0m2.266s

10x10 cold cache
no_cache_profile

10x10 hot cache
hot_cache_profile

1000x1000 hot cache
big_hot_cache_profile

I saw the tsfc:WARNING only once.

@ksagiyam
Contributor

ksagiyam commented Jul 27, 2023

My Intel Mac, macOS Monterey 12.4 (fresh install):

10x10 cold cache
0.06257073749110047

real	0m21.021s
user	0m9.369s
sys	0m4.045s
10x10 hot cache
0.06257073749110047

real	0m9.233s
user	0m4.369s
sys	0m2.088s
1000x1000 hot cache
7.078429874707133e-06

real	1m36.637s
user	1m31.152s
sys	0m3.195s

10x10 no cache:
no_cache_profile
10x10 hot cache:
hot_cache_profile
1000x1000 hot cache:
big_hot_cache_profile

Hyperelasticity example:

tsfc warnings at every step

firedrake-status:

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |master                        |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |master                        |0ec02b2d8 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

@ksagiyam
Contributor

On my Linux machine (fresh install):

10x10 no cache:
no_cache_profile
10x10 hot cache:
hot_cache_profile
1000x1000 hot cache:
big_hot_cache_profile

Hyperelasticity example:

tsfc warnings at every step

firedrake-status:

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |master                        |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |master                        |0ec02b2d8 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

@ksagiyam
Contributor

I see the tsfc warning at every step on both my Mac and my Linux machine. To me this looks more like an issue with the latest Firedrake than a macOS vs. Linux difference.

Can everyone please put the output of firedrake-status below your test result?

@ryan-david-murphy
Author

@ksagiyam I have updated my post with this output

@Ig-dolci
Contributor

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|COFFEE              |master                        |70c1e66   |False     |
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |master                        |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |master                        |0ec02b2d8 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

@ksagiyam
Contributor

Testing on my Linux machine indicates that the PR on Constant, #2927, somehow broke the caching (Firedrake + PyOP2 + tsfc).

I used the above hyperelasticity problem as an example.

Right before #2927:

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|COFFEE              |master                        |70c1e66   |False     |
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |HEAD                          |edae2884  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |HEAD                          |be82caf4e |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |HEAD                          |ef39f72   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

Cold cache:

tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
WARNING:tsfc:Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
2
3

Hot cache:

0
1
2
3

Right after #2927:

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|COFFEE              |master                        |70c1e66   |False     |
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |HEAD                          |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |HEAD                          |34f930dd9 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |HEAD                          |83dd8aa   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

Cold cache:

0
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
2
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
3
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)

Hot cache:

0
1
2
3

@ryan-david-murphy
Author

Hey @ksagiyam, did you work out how to fix this?

@connorjward
Contributor

Sorry, I have been on holiday for the past two weeks, so I haven't seen this. I think that this is a known performance problem with the recent changes to how we use Constants. Could you check whether using Firedrake branch connorjward/fix-constant-numbering and UFL branch connorjward/counted-mixin makes these go away? I already have associated PRs (Firedrake, UFL) for getting these fixes in.
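To illustrate why numbering can matter for caching, here is a toy model. It is not Firedrake or UFL code, and the thread does not spell out the actual mechanism; every name in it (`ToyConstant`, `key_with_global_counts`, `renumbered_key`, `count_compilations`) is hypothetical. It assumes that a compiled-kernel cache is keyed on something derived from the constants in a form, and shows that a key built from globally increasing numbers misses on every rebuild, while a key built from locally renumbered constants does not.

```python
import itertools

_counter = itertools.count()


class ToyConstant:
    """Stand-in for a symbolic constant that receives a fresh global number."""

    def __init__(self, value):
        self.value = value
        self.count = next(_counter)


def key_with_global_counts(constants):
    # The key embeds the global numbers, so it changes on every rebuild.
    return tuple(c.count for c in constants)


def renumbered_key(constants):
    # Renumber by order of first appearance: stable across rebuilds.
    renumber = {}
    for c in constants:
        renumber.setdefault(c.count, len(renumber))
    return tuple(renumber[c.count] for c in constants)


def count_compilations(make_key, steps=4):
    """Rebuild the same 'form' each step and count the cache misses."""
    cache = {}
    compilations = 0
    for _ in range(steps):
        # New constant objects are created on every step, mirroring the
        # loop in the hyperelasticity script above.
        constants = [ToyConstant(3.8), ToyConstant(5.8)]
        key = make_key(constants)
        if key not in cache:
            cache[key] = "compiled kernel"
            compilations += 1
    return compilations


print(count_compilations(key_with_global_counts))  # 4: a miss on every step
print(count_compilations(renumbered_key))          # 1: only the first step
```

In this toy model the first variant matches the "warning at every step" symptom and the second matches the "warning only at step 0" behaviour, but that correspondence is an analogy, not a description of the real fix.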

@JDBetteridge JDBetteridge removed their assignment Aug 2, 2023
@ksagiyam
Contributor

ksagiyam commented Aug 9, 2023

Yes, those branches at least fix the problem stated above.

Cold cache:

0
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
2
3

Hot cache:

0
1
2
3

@connorjward
Contributor

Closing this issue as I believe it is fixed by #3011. Please reopen it if this is not the case.

@ryan-david-murphy
Author

Thanks @connorjward, I have just updated my installation and the performance is much improved.
