Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel run with one proc w/o partition file #721

Merged
merged 20 commits into from
Mar 19, 2024

Conversation

stcui007
Copy link
Contributor

@stcui007 stcui007 commented Feb 2, 2024

This PR address issue #703 in ngen framework, i.e., to run a MPI job using one processor without providing a partition file. For more than one processor, you must provide a partition file. If you insist on providing the partition file in the case of one processor, you are still ok.

Additions

  • Define a dummy class PartitionOne for to present a whole-hydrofabric partition for a single process, and associated test

Removals

None

Changes

  • Use PartitionOne when running on a single MPI process, rather than expecting a partition file argument on the command line

Testing

Passed all local test except RoutingPyBindTest.TestRoutingPyBind
Passing all automated testing

Screenshots

Notes

Command line example to run a job without/with partition file:

mpirun -n 1 ./cmake_build_mpi/ngen data/catchment_data.geojson '' data/nexus_data.geojson '' data/example_bmi_multi_realization_config.json

mpirun -n 3 ./cmake_build_mpi/ngen data/catchment_data.geojson '' data/nexus_data.geojson '' data/example_bmi_multi_realization_config.json ./test_partition_3.json

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

@stcui007 stcui007 marked this pull request as ready for review February 7, 2024 14:46
@program-- program-- linked an issue Feb 10, 2024 that may be closed by this pull request
@stcui007
Copy link
Contributor Author

stcui007 commented Feb 23, 2024 via email

NGen::realizations_catchment
NGen::mdarray
NGen::mdframe
NGen::logging
NGen::ngen_bmi
testbmicppmodel
REQUIRES
NGEN_WITH_SQLITE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about making the entire test_unit build dependent on having SQLite support enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the test case could say

         #if !defined(NGEN_WITH_SQLITE3)
            GTEST_SKIP() << "Compiled without SQLite support"
         #endif

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is with the geopackage is involved in the test code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't reviewed this, but IIRC from an earlier discussion, you could take out the GeoPackage reading, and just use GeoJSON, since it isn't needed to test the functionality of this PR.

Comment on lines 123 to 129
if( duplicates.size() > 0 ){
for( auto& id: duplicates){
}
exit(1);
}

ASSERT_TRUE(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if( duplicates.size() > 0 ){
for( auto& id: duplicates){
}
exit(1);
}
ASSERT_TRUE(true);
for( auto& id: duplicates){
// Add printing code as needed
}
ASSERT_EQ(duplicates.size(), 0);

Unit tests should never be calling exit() directly, since that kills the whole test process, and doesn't properly report the failed test result.

The loop over duplicates doesn't need to be conditional, since if there are none, it does nothing.

Same applies to 1b below.

@PhilMiller
Copy link
Contributor

I pushed fixes for some of the smaller things I noted in my review. I wanted to call my concerns about the test code to your attention.

@stcui007
Copy link
Contributor Author

stcui007 commented Feb 26, 2024 via email

@stcui007
Copy link
Contributor Author

Ignore my previous question if you don't have any more to add. Just read your comments on the test codes.

@stcui007
Copy link
Contributor Author

stcui007 commented Feb 26, 2024 via email

@stcui007
Copy link
Contributor Author

stcui007 commented Feb 26, 2024 via email

#ifndef PARTITION_ONE_H
#define PARTITION_ONE_H

#ifdef NGEN_MPI_ACTIVE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guard for NGEN_MPI_ACTIVE isn't necessary here. MPI's not actually referenced or used by any of this code. Removing the guard would simplify test code by removing the weird #define NGEN_MPI_ACTIVE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The presence of NGEN_MPI_ACTIVE does make the code look a bit cumbersome. However, Partition_One is specifically designed for this case. The source codes are under this condition. On the other hand, I assume if the test code works w/o MPI. it is probably ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NGEN_MPI_ACTIVE is needed as Partition_One.hpp needs it. Without it, the test code won't compile. The Parttion_Test.cpp using Partition_Parser.hpp works in a similar fashion. So this is kept in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in this class Partition_One actually uses MPI, so we don't need to guard the class definition for MPI availability.

If we don't guard the class, then we don't need to do an inappropriate #define in the test.

@stcui007
Copy link
Contributor Author

stcui007 commented Mar 11, 2024 via email

@PhilMiller
Copy link
Contributor

Code looks good. I'm going to wait until we've got #767 merged to rebase and merge this.

@PhilMiller PhilMiller merged commit f91e2ea into NOAA-OWP:master Mar 19, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify parallel run on single processor
3 participants