
Conversation

@erikvansebille

This PR benchmarks a "real-life" simulation on the curvilinear 1/12-degree global NEMO data as distributed by MOi. There is an option for either a surface run (where only the surface fields are used, so less I/O) or a full 3D run. Benchmarks are discussed in the comments below.

@erikvansebille

erikvansebille commented Aug 25, 2025

Now that @fluidnumerics-joe's Parcels-code/Parcels#2158 has significantly sped up the curvilinear index search, it's time to focus on the dask data loading again. Below is an analysis of the surface flow experiments.

It turns out that calling dask.compute() in the interpolator is extremely slow; see the performance graph below (all these runs were done on our Lorenz cluster).
[Figure: performance graph for the surface flow experiments (screenshot, 2025-08-26)]

As always, black is v3-JIT and grey is v3-Scipy. Red is the original v4-dev with the Morton encoding; it is really very slow...
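
To make concrete why per-sample dask.compute() calls hurt so much, here is a minimal, self-contained sketch (not the actual Parcels interpolator) comparing per-point .compute() calls against loading a timestep once and indexing it in memory. The array shape and point count are arbitrary stand-ins.

```python
import time

import dask.array as da
import numpy as np

# Stand-in for one lazily-chunked field variable with dimensions (time, y, x)
field = da.random.random((10, 2000, 2000), chunks=(1, 256, 256))
rng = np.random.default_rng(42)
points = rng.integers(0, 2000, size=(1000, 2))

# Per-point compute: every call rebuilds and executes a task graph
start = time.perf_counter()
vals_lazy = [float(field[0, j, i].compute()) for j, i in points]
print(f"per-point .compute(): {time.perf_counter() - start:.2f} s")

# Load the timestep once, then index the in-memory numpy array
start = time.perf_counter()
surface = field[0].compute()
vals_eager = [float(surface[j, i]) for j, i in points]
print(f"load once, then index: {time.perf_counter() - start:.2f} s")
```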

We can considerably speed up v4-dev by calling ds.load() on the entire Field. If we do that within the benchmarking script, we get the dashed yellow line, which gives a ~30 s loading time the first time and is blazingly fast afterwards. But of course this is a bit of a cheat, because normally we wouldn't be able to load all data into memory. So I also implemented a field._load_timesteps() function again in Parcels-code/Parcels#2161, which is the green curve. That is fairly fast, roughly a factor of 5 slower than JIT.
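
For readers following along, here is a hedged sketch of the two eager-loading strategies above, written with plain xarray; the real field._load_timesteps() lives in Parcels-code/Parcels#2161, and the file pattern below is a placeholder rather than the actual MOi layout.

```python
import xarray as xr

# Hypothetical path to the MOi NEMO surface files
ds = xr.open_mfdataset("moi_psy4/*U.nc")

# Strategy 1: load the entire Field into memory up front.
# ~30 s the first time in the benchmark above, blazingly fast afterwards,
# but only feasible when the full dataset fits in RAM.
ds_in_memory = ds.load()

# Strategy 2: eagerly load one timestep at a time, so only the timesteps
# needed for the current time-interpolation window are held in memory.
def load_timestep(ds: xr.Dataset, tindex: int) -> xr.Dataset:
    """Load a single timestep eagerly; the caller can drop earlier ones."""
    return ds.isel(time_counter=tindex).load()
```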

So on the one hand this is great news (we have a version of Parcels v4 that is within an order of magnitude of the JIT performance!), but it comes at the cost of having to load in the full fields. It must be possible to use dask more smartly to load in only those parts of the fields needed for the interpolation; the question is: how...

@erikvansebille

Here's a new version of the benchmark analysis for the 2D surface MOi flow fields.

[Figure: benchmark analysis for the 2D surface MOi flow fields (screenshot, 2025-08-29)]

In this plot, there's an extra line for a simulation where I used the chunks parameter of xr.open_mfdataset(..., chunks={"time_counter": 1, "depth": 2, "y": chunk, "x": chunk}). The cyan line shows the performance for chunk=128. As can be seen, it's slower than load_timesteps() (the green line), but much faster than v4-dev without chunking (the red line), especially for small numbers of particles.
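
For reference, the chunked open looks like the sketch below; the chunk dict is taken verbatim from the description above, while the file pattern is just a placeholder.

```python
import xarray as xr

chunk = 128  # the sweet spot found in the analysis below
ds = xr.open_mfdataset(
    "moi_psy4/*.nc",  # hypothetical path to the MOi NEMO files
    chunks={"time_counter": 1, "depth": 2, "y": chunk, "x": chunk},
)
```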

I did a further analysis of the runtime as a function of chunk, keeping the number of particles fixed at 10k:
[Figure: runtime as a function of chunk size, 10k particles (screenshot, 2025-08-29)]

From this analysis, it's clear that in this setup v4-dev is very slow for both very small and very large chunk values, and that the optimum is somewhere around chunk=128.
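
The sweep itself can be scripted along these lines; run_benchmark() is only a placeholder for the actual benchmark script in this PR, so this is scaffolding for the chunk sweep rather than the real code.

```python
import time

import xarray as xr

def run_benchmark(ds: xr.Dataset, npart: int = 10_000) -> None:
    """Placeholder: advect npart particles on the surface flow (see the PR's benchmark script)."""
    ...

for chunk in [32, 64, 128, 256, 512, 1024]:
    ds = xr.open_mfdataset(
        "moi_psy4/*.nc",  # hypothetical path
        chunks={"time_counter": 1, "depth": 2, "y": chunk, "x": chunk},
    )
    start = time.perf_counter()
    run_benchmark(ds)
    print(f"chunk={chunk}: runtime {time.perf_counter() - start:.1f} s")
```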

Finally, I also ran a simulation with the full 3-dimensional MOi data:
[Figure: benchmark for the full 3D MOi simulation (screenshot, 2025-08-29)]

What's incredible here is that v4-dev with load_timesteps() is actually faster than v3-JIT, up to 100k particles! Chunking with chunk=128 now has roughly the same performance as v3-JIT.

So the good news is that we can make v4-dev considerably faster by choosing an appropriate chunk size when opening the xarray dataset; but that puts a lot of the responsibility for performance on users, outside the control of Parcels itself. Perhaps in the coming months we can find a way to optimise the chunking automatically, under the hood?

The alternative is still to use load_timesteps() (#2161), but that has disadvantages of its own (forcing linear time interpolation; a large FieldSet memory footprint even when particles occupy only a small part of the domain; ...).

To be continued...

erikvansebille marked this pull request as ready for review October 1, 2025 11:57
erikvansebille merged commit 9e6b09e into main Oct 1, 2025