@heplesser
Contributor

@heplesser heplesser commented Feb 11, 2024

This fixes #3099. MPI deadlocks could occur if only some MPI ranks performed get() or set() operations on SynapseCollections. These operations are local, so, while not elegant, they should work even if executed on only some ranks and not on others, e.g., when a rank has no connections.
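To illustrate the failure mode, here is a minimal sketch that does not use NEST or MPI at all: ranks are simulated as threads, and a `threading.Barrier` stands in for a collective MPI call that completes only when every rank takes part. The function name `run_ranks` and the timeout are assumptions made for this illustration only.

```python
import threading


def run_ranks(num_ranks, calling_ranks, timeout=0.5):
    """Simulate ranks as threads; only `calling_ranks` enter the collective.

    Returns a dict mapping rank -> True if the collective completed and
    False if it timed out (the stand-in for an MPI hang).
    """
    # The barrier releases only once ALL num_ranks participants have arrived.
    barrier = threading.Barrier(num_ranks)
    completed = {}

    def rank_body(rank):
        if rank not in calling_ranks:
            # E.g. a rank without local connections that skips get()/set().
            return
        try:
            barrier.wait(timeout=timeout)
            completed[rank] = True
        except threading.BrokenBarrierError:
            # Raised on timeout, or when another waiter timed out first.
            completed[rank] = False

    threads = [threading.Thread(target=rank_body, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


if __name__ == "__main__":
    print(run_ranks(4, {0, 1, 2, 3}))  # every rank participates
    print(run_ranks(4, {0, 1}))        # ranks 2 and 3 never arrive
```

With real MPI there is no timeout, so the ranks that entered the collective would simply wait forever.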

The hang occurred because the get()/set() methods called GetKernelStatus() to check the number of connections in order to decide whether a SynapseCollection was still valid. This was misguided to begin with, since each rank knows only its own local number of connections. The check now uses the network size instead, whose global value is known locally on every rank. For a proposed proper solution, see #3100.

A further problem was that GetKernelStatus() called ConnectionManager::update_delay_extrema_(), which unconditionally performed an MPI exchange of min_delay and max_delay. This PR changes this so that delay extrema are MPI-exchanged only if their values could have changed remotely. This also eliminates the MPI communication hidden in every single access to nest.<some property> and might thus lead to faster execution of simulation scripts.
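The idea of exchanging delay extrema only when needed can be sketched with a simple "dirty flag". This is a toy model in plain Python, not the NEST implementation: `FakeComm`, `DelayExtrema`, `note_connections_created()` and the flag name are all hypothetical, and the condition NEST actually uses to decide that values "could have changed remotely" is not shown in this PR description.

```python
class FakeComm:
    """Hypothetical stand-in for an MPI communicator that counts collective calls."""

    def __init__(self, remote_min, remote_max):
        self.remote_min = remote_min
        self.remote_max = remote_max
        self.collective_calls = 0

    def exchange_extrema(self, local_min, local_max):
        # In real MPI this would be a collective that all ranks must enter.
        self.collective_calls += 1
        return min(local_min, self.remote_min), max(local_max, self.remote_max)


class DelayExtrema:
    """Toy model: communicate only if remote values may have changed."""

    def __init__(self, comm, min_delay, max_delay):
        self.comm = comm
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.may_have_changed_remotely = False  # assumed flag name

    def note_connections_created(self):
        # Assumption: called from an operation that runs on all ranks,
        # so the later exchange is entered by every rank as well.
        self.may_have_changed_remotely = True

    def update(self):
        if self.may_have_changed_remotely:
            self.min_delay, self.max_delay = self.comm.exchange_extrema(
                self.min_delay, self.max_delay
            )
            self.may_have_changed_remotely = False
        # Otherwise this is a purely local read: safe on a single rank.
        return self.min_delay, self.max_delay


if __name__ == "__main__":
    comm = FakeComm(remote_min=0.5, remote_max=3.0)
    extrema = DelayExtrema(comm, min_delay=1.0, max_delay=2.0)
    extrema.update()                  # no communication
    extrema.note_connections_created()
    extrema.update()                  # one exchange
    extrema.update()                  # no further communication
    print(comm.collective_calls)
```

The point of the design is that a status query that only reads cached values never blocks on other ranks, so it can safely be issued by a subset of ranks.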

The test simply checks that the run does not time out.
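A timeout-based check of this kind might look roughly as follows. This is only a sketch using the Python standard library: the helper name `completes_within` is hypothetical, and the commands below are placeholders for whatever MPI launch command the real test uses, which is not given here.

```python
import subprocess
import sys


def completes_within(cmd, timeout_s):
    """Return True if `cmd` exits with status 0 within `timeout_s` seconds."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising; treat as a hang.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder commands; a real test would launch the script under MPI.
    quick = [sys.executable, "-c", "pass"]
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(completes_within(quick, timeout_s=30))
    print(completes_within(hanging, timeout_s=1))
```

A test like this cannot say why a run hung, only that it did, which matches the modest claim made for the test above.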

@heplesser heplesser added the labels "T: Bug" (Wrong statements in the code or documentation), "S: High" (Should be handled next), and "I: No breaking change" (Previously written code will work as before, no one should note anything changing (aside the fix)) on Feb 11, 2024
Member

@nicolossus nicolossus left a comment

@heplesser LGTM. I have some minor comments/clarifications, but I'll go ahead and approve.

Co-authored-by: Nicolai Haug <39106781+nicolossus@users.noreply.github.com>
@heplesser heplesser removed the request for review from terhorstd February 14, 2024 20:19
@heplesser heplesser merged commit 8e85268 into nest:master Feb 14, 2024
@heplesser heplesser deleted the fix-3099 branch April 24, 2024 14:10
Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

NEST hangs when setting connection weight in parallel simulations

3 participants