
Conversation

@Manangka
Collaborator

@Manangka Manangka commented Oct 13, 2025

Fixes #1696

Description

This PR implements some of the key insights @Huite found while optimizing the splitting of models, although there are some differences between his implementation and this PR. The optimization consists of two parts.

The first part has to do with the way models and packages are split. As @Huite pointed out, the current implementation leads to a package being loaded and unloaded several times, which causes a huge performance penalty because the package needs to be read from disk each time.
This has been solved by inverting the order of looping. In the original implementation there was a loop over the models: for each partition a deepcopy was made of the original model, and afterwards we looped over each model and accessed each package in the model to determine whether it belongs to the same domain as the model. This is expensive.
This PR inverts that loop. First a shallow copy is made of all the models. Then we loop over each package of the original model and determine whether it belongs to one of the submodels. This way we only need to open a package once.

The second part is the way the SSM and BUY packages are updated after the splitting has happened. There was a piece of code that looped over all the flow and transport models to determine which models share the same domain (i.e. are in the same partition). This is an expensive operation because the grid data has to be loaded into memory over and over.
In this PR this has been solved by moving the update of the SSM and BUY packages to the place where that information is already known, i.e. the model splitter. This way we can avoid the expensive loading of the grid data.
By moving it to the model splitter we also avoid having to force the load of the grid data manually.

With these changes the splitting of the model @Huite provided me now takes around 5m30s.

Checklist

  • Links to correct issue
  • Update changelog, if changes affect users
  • PR title starts with Issue #nr, e.g. Issue #737
  • Unit tests were added
  • If feature added: Added/extended example
  • If feature added: Added feature to API documentation
  • If pixi.lock was changed: Ran pixi run generate-sbom and committed changes

@Huite
Contributor

Huite commented Oct 14, 2025

I'm probably a bit too early FWIW, but I'll comment anyway; I think I recognize the predicate variable approach I suggested in a comment.

The activity count function in my draft PR has this:

    # Use ellipsis to reduce over ALL dimensions except label dims
    dims_to_aggregate = [dim for dim in sample.dims if dim not in label_dims]
    counts = sample.notnull().sum(dim=dims_to_aggregate).groupby(labels).sum()
    return {label: int(n) for label, n in enumerate(counts.data)}

This is more robust in terms of IO overhead, since if the data is lazy, we require a single load of the data to do the groupby, whereas doing a clip/isel kind of thing makes it necessary to load for each partition created.

Of course, the IO per partition is inescapable if you start writing the models, hence my spatial chunking (after reordering via space filling curve) prior to split & write.

Even so, I think the groupby on the unpartitioned data is worthwhile since it will lead to less IO regardless of whether the data is spatially chunked.
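In toy form, the idea looks roughly like this (made-up data, not the draft PR's code — a single notnull reduction plus groupby over the unpartitioned grid, instead of one isel per partition):

    import numpy as np
    import xarray as xr

    # Toy 2D grid where inactive cells are NaN; in practice this could be dask-backed.
    data = xr.DataArray(
        np.where(np.random.rand(4, 6) > 0.3, 1.0, np.nan), dims=("y", "x")
    )
    # Partition labels on the same spatial dims: left half -> 0, right half -> 1.
    labels = xr.DataArray(
        np.repeat(np.array([[0, 0, 0, 1, 1, 1]]), 4, axis=0), dims=("y", "x"), name="label"
    )

    # One pass over the unpartitioned data: count non-null cells per partition label.
    counts = data.notnull().groupby(labels).sum()
    print({int(label): int(n) for label, n in zip(counts["label"].data, counts.data)})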

@Manangka
Collaborator Author

Hi @Huite, I tackled the problem in a slightly different way than you did. Or at least I think I'm doing the same but in a different order. As you correctly assessed, one of the biggest problems with the model splitter is that it loads and unloads the same package data for each partition.

This is happening due to the way the current model splitter is designed. It makes a deepcopy of the original model, including all the packages, and then goes over each partitioned model and assesses whether the packages in that partitioned model actually belong to the domain of that model. (You of course already know this, but I'm just stating it for future reference.)

If I read your code correctly, you solved this by creating a boundary polygon for each partition first. Then, when looping over the models and assessing whether a package belongs to that model, you use the polygon test. This works great because you've checked beforehand whether a package belongs to a partition or not.

I solved the issue in a different way. Basically your approach and the original code do a double loop with the model as the outer loop and the packages as the inner loop. I turn this around. I first create all the partitioned models, but without packages. Then I start looping over the packages (outer loop). The package is loaded and stays in memory. Then I loop over the models and check whether the package belongs to the model. The latter is done by comparing a known domain value of the package to the active_domain of the model partition. Doing it this way I ensure that the package data is only loaded once.
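Roughly, in self-contained toy form (dict stand-ins and made-up grids, not the actual imod classes; `where(..., drop=True)` stands in for clip_by_grid):

    import numpy as np
    import xarray as xr

    rng = np.random.default_rng(0)
    # Toy "model": two package grids where NaN means the package is absent in a cell.
    model = {
        "riv": xr.DataArray(np.where(rng.random((4, 6)) > 0.5, 1.0, np.nan), dims=("y", "x")),
        "drn": xr.DataArray(np.where(rng.random((4, 6)) > 0.5, 1.0, np.nan), dims=("y", "x")),
    }
    # Two partitions: left half and right half of the grid.
    left = xr.zeros_like(model["riv"], dtype=bool)
    left[:, :3] = True
    active_domains = [left, ~left]

    partitioned_models = [{} for _ in active_domains]  # partitions start without packages

    for name, package in model.items():            # outer loop: each package handled once
        active_package_domain = package.notnull()  # known domain variable of the package
        for domain, submodel in zip(active_domains, partitioned_models):
            if bool((active_package_domain & domain).any()):       # cheap overlap test
                submodel[name] = package.where(domain, drop=True)  # stand-in for clip_by_grid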

With this approach the split takes between 4m30s and 5m30s for the big model that you shared.

What do you think of this approach?
If you have time, can you run it on your machine and check whether you also see the same gain in performance?

@Huite
Contributor

Huite commented Oct 15, 2025

Alright! The most important part to me is:

The latter is done by comparing a known domain value of the package to the active_domain of the model partition.

This sounds like a good approach!

Doing it this way I ensure that the package data is only loaded once.

I'm not entirely sure this is the case, though?

The key lines are:

                active_package_domain = package[
                    self._pkg_id_to_var_mapping[pkg_id]
                ].notnull()

Unless we force a .compute() here, I think the active_package_domain could be a dask backed xarray DataArray. If that's the case, the following part is where the compute is forced:

               has_overlap = (
                    active_package_domain & partition_info.active_domain.astype(bool)
                ).any()  # type: ignore

Since it's the any() reduction which definitely forces a computation.

I think there are two minor but possibly impactful improvements: reduce the active_package_domain to spatial dimensions only immediately in _get_package_domain, i.e. get rid of time; and, since this single spatial grid should easily fit into memory, call a .compute() on it without worry, so that we incur IO only once.
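In sketch form, on toy lazy data (dimension names and shapes are illustrative only):

    import dask.array as da
    import numpy as np
    import xarray as xr

    # Toy lazy (dask-backed) package variable with a time dimension.
    lazy = xr.DataArray(
        da.from_array(np.where(np.random.rand(3, 4, 6) > 0.5, 1.0, np.nan), chunks=(1, 4, 6)),
        dims=("time", "y", "x"),
    )
    # Reduce to the spatial dims and compute once: the resulting 2D mask is tiny,
    # so all subsequent per-partition overlap tests run entirely in memory.
    active_package_domain = lazy.notnull().any(dim="time").compute()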

@Manangka Manangka changed the title Optimize modelsplitter Issue #1696 Optimize modelsplitter Oct 17, 2025
@Manangka Manangka marked this pull request as ready for review October 17, 2025 15:37
This was referenced Oct 17, 2025
@FransRoelofsen FransRoelofsen moved this to 🏗 In Progress in iMOD Suite Oct 20, 2025
@JoerivanEngelen JoerivanEngelen moved this from 🏗 In Progress to 🧐 In Review in iMOD Suite Oct 27, 2025
Contributor

@JoerivanEngelen JoerivanEngelen left a comment


Looks good, I have a few comments. Thanks for going through the effort of creating this clean solution and adding a separate ModelSplitter class for it. Depending on capacity we'll see whether you or I will pick up my comments.

pixi.toml Outdated
pytest-cases = ">=3.9.1"
pytest-cov = "*"
pytest-dotenv = "*"
pytest-profiling = "*"
Contributor


I don't really see this being used?

Comment on lines 164 to 166
if isinstance(package, IAgnosticPackage):
    if sliced_package["index"].size == 0:
        sliced_package = None
Contributor


I think this can be simplified to:

Suggested change

    if isinstance(package, IAgnosticPackage):
        if sliced_package["index"].size == 0:
            sliced_package = None
    if isinstance(package, IAgnosticPackage) and sliced_package.is_empty:
        sliced_package = None

In this way, you are not referring to the "index" variable, which might or might not be present. I think the check for IAgnosticPackage prevents calling the AllNoDataScheme on large, chunked grids, which .is_empty does.


return partitioned_models

def update_packages(self) -> None:
Contributor


I find this method name a bit generic, how about:

Suggested change
def update_packages(self) -> None:
def update_packages_after_split(self) -> None:

Perhaps CoPilot has a briefer suggestion

Contributor


I resorted to "update_dependent_packages" in the end

continue

# Slice and add the package to the partitioned model
sliced_package = clip_by_grid(package, partition_info.active_domain)
Contributor


I miss the remove_expanded_auxiliary_variables_from_dataset(package) preceding this, which was there before. This is used for multi-species simulations.

Contributor


Forget what I said, I see a split with multispecies is tested in test_split_flow_and_transport_model, so this aux expansion isn't necessary here.

@JoerivanEngelen
Contributor

I think there are two minor but possibly impactful improvements: reduce the active_package_domain to spatial dimensions only immediately in _get_package_domain, i.e. get rid of time; and, since this single spatial grid should easily fit into memory, call a .compute() on it without worry, so that we incur IO only once.

FYI: We can call the trim_time_dimension function to get rid of the time dimension.

Contributor

@JoerivanEngelen JoerivanEngelen left a comment


One more comment

Comment on lines 262 to 263
if "layer" in ds.dims:
dims_to_be_removed.append("layer")
Contributor


I don't think layer can be dropped here: What if there is no data present in layer 1, but there is in layer 2? What does xarray do when you then drop_vars on layer? I think we either have to aggregate by summing all notnull across the layer dim, or just leave the layer dim in here and let xarray broadcast.
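A tiny sketch of the aggregation option (toy data; using an any-reduction, which gives the same boolean activity as summing notnull):

    import numpy as np
    import xarray as xr

    # Toy package variable: no data in layer 1, data everywhere in layer 2.
    var = xr.DataArray(
        np.stack([np.full((2, 3), np.nan), np.ones((2, 3))]), dims=("layer", "y", "x")
    )
    # Aggregating notnull over "layer" keeps the layer-2 activity; simply dropping
    # the layer dimension could discard it.
    active = var.notnull().any(dim="layer")   # or .sum(dim="layer") > 0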

Comment on lines 293 to 301
if isinstance(package, IAgnosticPackage):
    pass
# No checks are done for these packages
elif pkg_id in self._pkg_id_skip_active_domain_check:
    pass
else:
    has_overlap = (
        active_package_domain & partition_info.active_domain.astype(bool)
    ).any()  # type: ignore
Contributor


Adding "pass" is a code smell. I refactored this to a separate method _is_package_to_skip, as the same logic is also used in _get_package_domain.



@JoerivanEngelen JoerivanEngelen merged commit 41263ba into master Oct 29, 2025
@JoerivanEngelen JoerivanEngelen deleted the model_splitter_optimization branch October 29, 2025 11:08
@github-project-automation github-project-automation bot moved this from 🧐 In Review to ✅ Done in iMOD Suite Oct 29, 2025

@JoerivanEngelen
Contributor

I've added functionality to ignore time, and it reduced the splitting operation to 1.5 minutes.

@JoerivanEngelen JoerivanEngelen restored the model_splitter_optimization branch October 29, 2025 14:47
@JoerivanEngelen JoerivanEngelen deleted the model_splitter_optimization branch October 29, 2025 14:58
JoerivanEngelen added a commit that referenced this pull request Oct 30, 2025
Fixes #1698


# Description
This is part 2 of fixing the performance issues with large models.
In part 1 (#1693) the model splitter was optimized. In this PR the
focus is on writing the partitioned model.

As @Huite pointed out in #1686 the performance bottleneck had to do with
the fact that the same package had to be loaded from file multiple times
while only a part of the file is actually needed.

After digging around for a while I discovered that this had to do with
how we open the dataset:
`dataset = xr.open_dataset(path, **kwargs)`

In the line above we don't specify anything chunk-related. As a result,
when you access the dataset the entire file has to be loaded from disk.
By simply adding `chunks="auto"` this is no longer the case and a huge
performance gain is achieved.
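Concretely, the open call from the quoted line becomes (the surrounding kwargs stay unchanged):

    import xarray as xr

    # Lazy open: with chunks="auto" the variables are dask-backed, so only the
    # slices that are actually accessed are read from disk.
    dataset = xr.open_dataset(path, chunks="auto", **kwargs)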

There are some other changes related to setting chunking to auto. Some
parts of the code don't expect to receive dask arrays. For instance,
you cannot use .item() on a dask array; instead I now use .values[()].
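For illustration, the pattern on a toy dask-backed 0-d DataArray:

    import dask.array as da
    import numpy as np
    import xarray as xr

    scalar = xr.DataArray(da.from_array(np.array(3.0)))  # lazy 0-d array
    value = scalar.values[()]  # .values forces the compute, [()] extracts the scalar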

I was also getting some errors when the to_netcdf method was called on
the package. All of them had something to do with wrong/unsupported
datatypes. In this PR you will find that an encoding is added for
float16 types, and that in some packages the from_file method has been
updated to ensure that the loaded type is converted to a supported type.
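A rough sketch of the dtype handling (illustrative variable and file names, not the PR's exact code; netCDF has no float16 type, so such variables are written as float32):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"conc": (("y", "x"), np.ones((2, 3), dtype=np.float16))})
    # Write float16 variables as float32, since netCDF does not support float16.
    encoding = {
        name: {"dtype": "float32"}
        for name, var in ds.data_vars.items()
        if var.dtype == np.float16
    }
    ds.to_netcdf("example_package.nc", encoding=encoding)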

An unrelated but performance-wise significant change has been applied
to the `_get_transport_models_per_flow_model` method. This method is
used to match GWF models to GWT models so that GWF-GWT exchanges can be
created. It was doing a full comparison of domains, which is expensive.
There is also a method available that does the comparison at the domain
level. By switching to this method the matching algorithm becomes
almost instantaneous.

**NOTE**
This PR has issue #1699 as its base. The base needs to be altered to
master once that PR is in.

**NOTE**
This PR also improves the `dump` method

**NOTE**
Some timings:

<img width="833" height="739" alt="image"
src="https://github.com/user-attachments/assets/974c841c-0413-4433-8486-1abe98dc0715"
/>

<img width="843" height="215" alt="image"
src="https://github.com/user-attachments/assets/c7082975-af35-4143-a6f9-860557b3eb09"
/>

<img width="842" height="705" alt="image"
src="https://github.com/user-attachments/assets/383bf1a6-f028-4cb4-aa72-48ab95e84e3d"
/>



<!---
Before requesting review, please go through this checklist:
-->

- [x] Links to correct issue
- [ ] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [ ] Unit tests were added
- [ ] **If feature added**: Added/extended example
- [ ] **If feature added**: Added feature to API documentation
- [ ] **If pixi.lock was changed**: Ran `pixi run generate-sbom` and
committed changes

---------

Co-authored-by: JoerivanEngelen <joerivanengelen@hotmail.com>

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

Improve performance Model split

4 participants