CI crashing out with exit code 137 #2824

Open
JDBetteridge opened this issue Mar 16, 2023 · 9 comments

Comments

@JDBetteridge
Member

We are running out of memory on the CI. I have now profiled the test suite and traced the problem to the test test_firedrake_helmholtz_scalar_convergence_on_hex in tests/regression/test_helmholtz.py, introduced in commit e68b9f63644fce64fd23c85031261e29aae2dd88.

This issue should be treated as fairly urgent: currently it is not possible to run the tests locally unless you have a pretty beefy machine (at least 64GB RAM!), and even then you are in danger of running out of memory if you use xdist to parallelise pytest.

I attach two plots showing the memory profile for the test suite (run without xdist).

[Plots: full_test_suite, test_helmholtz]
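
For context on the exit code in the title: shells conventionally report a process killed by signal N as exit code 128 + N, so 137 corresponds to signal 9 (SIGKILL). That is the signal the Linux out-of-memory killer sends, which fits the diagnosis above, although 137 on its own does not prove an OOM kill. A minimal sketch of the decoding (the helper name is hypothetical, not part of Firedrake or pytest):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit code using the 128 + N signal convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number is not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```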

@ksagiyam
Contributor

This test is large because it is a convergence test on a hex mesh, and the hex mesh must contain all possible facet orientations. I might have to create a reasonable mesh by hand.

@JDBetteridge
Member Author

We should also have some policy on acceptable test sizes. Off the top of my head a good starting point would be:

  • Test duration <1 minute
  • MPI ranks <=4
  • Total memory <4GB

Runner hardware is currently 48 physical cores and 64GB RAM. Four GitHub runners share this hardware, and tests are run using pytest xdist with -n 12 (currently -n 8 to try to mitigate this issue).
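
As a rough back-of-the-envelope check on those numbers, the sketch below works out how much RAM each xdist worker gets on average. It assumes (this is not stated above) that all four runners are busy at the same time and that memory is split evenly between workers; the function name is made up for illustration.

```python
TOTAL_RAM_GB = 64   # runner hardware, from the figures above
RUNNERS = 4         # GitHub runners sharing the machine


def ram_per_worker_gb(workers_per_runner: int) -> float:
    """Average RAM per xdist worker.

    Assumption: every runner is active at once and memory is shared evenly.
    """
    return TOTAL_RAM_GB / (RUNNERS * workers_per_runner)


for n in (12, 8):
    total_workers = RUNNERS * n
    print(f"-n {n}: {total_workers} workers, "
          f"{ram_per_worker_gb(n):.2f} GB per worker")
```

Under those assumptions a worker averages well under the proposed 4GB per-test limit, so the limit only holds if few tests approach it at the same moment.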

Possible exceptions: the number of ranks could be greater when testing communicator functionality or Ensemble. Even in these cases a test should break only one of these limits.

We should also concretise these limits on the wiki.
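
To make the proposal concrete, here is a minimal sketch of how such a policy could be checked. Everything in it (the class, function names, and the `is_exception` flag) is hypothetical; the thresholds are the starting-point values suggested above, and the "only break one limit" rule is applied to any single limit rather than to ranks specifically.

```python
from dataclasses import dataclass

# Starting-point limits proposed in this comment.
MAX_DURATION_S = 60
MAX_RANKS = 4
MAX_MEMORY_GB = 4


@dataclass
class Footprint:
    """Measured resource usage of one test (hypothetical record type)."""
    duration_s: float
    ranks: int
    memory_gb: float


def exceeded_limits(t: Footprint) -> list:
    """Return the names of the limits this test breaks."""
    broken = []
    if t.duration_s >= MAX_DURATION_S:
        broken.append("duration")
    if t.ranks > MAX_RANKS:
        broken.append("ranks")
    if t.memory_gb >= MAX_MEMORY_GB:
        broken.append("memory")
    return broken


def is_acceptable(t: Footprint, is_exception: bool = False) -> bool:
    """Ordinary tests break no limit; agreed exceptions may break exactly one."""
    broken = exceeded_limits(t)
    if is_exception:
        return len(broken) <= 1
    return not broken


# An 8-rank communicator test is fine only when flagged as an exception.
print(is_acceptable(Footprint(30.0, 8, 1.0)))
print(is_acceptable(Footprint(30.0, 8, 1.0), is_exception=True))
```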

@JDBetteridge
Member Author

@ksagiyam I think constructing the mesh by hand to reduce the size would be a good idea. Could you split the problem up and have a set of meshes which together cover all possible orientations, rather than one big mesh containing all orientations?

@ksagiyam
Contributor

That could be a good option, actually.

@dham
Member

dham commented Mar 16, 2023

Does this need to be a convergence test at all? I presume this is basically an orientations test. If you based the test on data in a polynomial space of degree no higher than that of the elements, the operations should be exact up to machine precision, and you could instead check for near-zero error.
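
The exactness argument can be illustrated in one dimension with plain Python. This is only an analogy, not Firedrake code and not the actual hex-mesh test: a degree-2 Lagrange "element" with three nodes reproduces a quadratic to round-off, while a cubic leaves a clearly visible error. All names below are made up for the sketch.

```python
def lagrange_interpolate(nodes, values, x):
    """Evaluate the Lagrange interpolant through (nodes, values) at x."""
    total = 0.0
    for i, (xi, vi) in enumerate(zip(nodes, values)):
        basis = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += vi * basis
    return total


def max_interpolation_error(f, nodes, samples=101):
    """Largest pointwise error of the interpolant of f on [nodes[0], nodes[-1]]."""
    values = [f(x) for x in nodes]
    lo, hi = nodes[0], nodes[-1]
    points = [lo + (hi - lo) * k / (samples - 1) for k in range(samples)]
    return max(abs(lagrange_interpolate(nodes, values, x) - f(x))
               for x in points)


nodes = [0.0, 0.5, 1.0]  # three nodes, i.e. a degree-2 element on [0, 1]


def quadratic(x):
    return 3 * x * x - 2 * x + 1  # lies inside the degree-2 space


def cubic(x):
    return x ** 3  # lies outside the degree-2 space


print(max_interpolation_error(quadratic, nodes))  # round-off level
print(max_interpolation_error(cubic, nodes))      # clearly nonzero
```

A test of this kind needs only one small mesh per case, since it checks for near-zero error instead of measuring a rate across refinements.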

@wence-
Contributor

wence- commented Mar 16, 2023

Or compute some cohomology, which is topological but, I presume, would be sensitive to incorrect orientations.

@ksagiyam
Contributor

Ok, sounds good. For now let me just quickly do polynomial interpolation tests to fix CI.

@connorjward
Contributor

@JDBetteridge can this be closed?

@JDBetteridge
Member Author

Can we leave it open until we add something to the wiki?
