Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow setting ESMF_Finalize endflag in ESMPy and ESMC/I interfaces #234

Merged
merged 17 commits into from
Mar 27, 2024

Conversation

program--
Copy link
Contributor

@program-- program-- commented Mar 20, 2024

This PR adds the capability to set the endflag parameter in ESMP_Finalize to prevent MPI from being finalized at ESMF finalization.

Support issue for this PR is here: esmf-support#427

The code path followed to support this:

ESMP_Finalize
-> ESMC_Finalize
-> ESMCI_Finalize
-> f_esmf_frameworkfinalize_
-> ESMF_Finalize

Additions

  • Adds the ESMC_End_Flag enumerator to ESMC_Util.h, and similarly esmpy.api.constants.EndAction.
  • Adds ESMC_FinalizeWithFlag(ESMC_End_Flag endFlag) to preserve API compatibility with code that calls ESMC_Finalize.
  • Adds endFlag parameter to (with default of ESMC_END_NORMAL):
    • emspy.Manager constructor
    • ESMC_FinalizeWithFlag
    • ESMCI_Finalize
    • f_esmf_frameworkfinalize_ (uses type type(ESMF_End_Flag))

TODO

The only todo is to validate that unit tests pass, and that this parameter works as intended.

@oehmke oehmke requested a review from billsacks March 20, 2024 20:47
@billsacks billsacks requested a review from oehmke March 20, 2024 22:32
Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for working on this and submitting a PR! I appreciate the time you took to dig down through the layers and make the necessary changes at each layer. I also appreciate your attention to detail in this PR, such as adding documentation in relevant places.

I have two inline comments with things that look like they should be changed a bit. My focus was on the Python pieces, with only a cursory look at the other layers. I'll ask @oehmke to look at the other layers since he's more familiar with that.

src/Superstructure/ESMFMod/interface/ESMF_Init_C.F90 Outdated Show resolved Hide resolved
src/addon/esmpy/src/esmpy/api/esmpymanager.py Outdated Show resolved Hide resolved
@billsacks
Copy link
Member

The only todo is to validate that unit tests pass, and that this parameter works as intended.

Please let us know when you have validated that this parameter works as intended - especially that it indeed solves your issue. Thanks again!

use ESMF_CompMod
use ESMF_InitMod
use ESMF_UtilTypesMod
use, intrinsic :: iso_c_binding
Copy link
Contributor Author

@program-- program-- Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized a mistake I made -- this implementation is F90, so I don't believe that I can use iso_c_binding here (if I remember correctly, it's only F03 and up). May need to pass in an integer instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to use iso_c_binding. Despite the file extension being F90, we support this and other F03 features. (I see we use iso_c_binding throughout ESMF, though I'll let @oehmke comment on whether there's a different typical/preferred approach.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're slowly moving to iso_c_binding as things are changed or new code is added, so I agree that using it is a good approach.

@program--
Copy link
Contributor Author

program-- commented Mar 21, 2024

@billsacks @oehmke I've refactored this design slightly to instead propagate ESMC_End_Flag at the Python/C/C++ layers (as suggested by @PhilMiller), instead of a boolean, so this should preserve the details found at the Fortran layer. Still haven't gotten a chance to fully test this out, but ESMF compiles correctly on my end with these changes, at least.

Sorry, silly change, but we usually put the rc=rc at the end... :-)
This method can only be called once per execution,
and must be preceded by one and only one call to
ESMP_Initialize(). \n
released, and, if 'keepMpi' is False, all MPI states
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be changed to say something about endFlag equaling Normal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was in reference to an older commit, but I revised the docstring again to be explicit about MPI cleanup only when endFlag is set to NORMAL (and not necessarily when it's set to something other than KEEP_MPI). Fixed in 0f7b9b4.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks!

Copy link
Contributor

@oehmke oehmke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Justin,

First, I just want to echo what Bill said, thanks for all of your careful work on this! I think that overall this looks great except for the few small changes that I mentioned. We are still talking a little about the one interface, but I think that other than that it looks good to
me. Let us know how it goes with your testing.

Thanks,

  • Bob

@program--
Copy link
Contributor Author

program-- commented Mar 21, 2024

@oehmke Awesome, glad to hear! I'll finish up those small changes today. I've also tested with a small toy program so far that ensures the C API works as expected (not exactly a unit test, but close?)

C Example Program
#include <stdio.h>
#include <assert.h>
#include <string.h>

int state_ = 0;

extern int PMPI_Finalize();

// intercept MPI calls to output
int MPI_Finalize() {
    printf("\nMPI_Finalize called\n");
    PMPI_Finalize();
    state_ = 1;
    return 0;
}

int MPI_Abort(int comm, int errorcode) {
    printf("\nMPI_Abort called\n");
    state_ = 10;
    return 0;
}

#include <mpi.h>
#include "ESMC.h"

int main(int argc, const char* argv[]) {

    if (argc < 2) {
        printf("Expected one of: keep_mpi, abort, normal\n");
        return 0;
    }

    MPI_Init(NULL, NULL);

    ESMC_End_Flag end = ESMC_END_NORMAL;

    if (strcmp(argv[1], "keep_mpi") == 0) { 
        end = ESMC_END_KEEPMPI;
    } else if (strcmp(argv[1], "abort") == 0) {
        end = ESMC_END_ABORT;
    } // doesn't actually check for normal

    int rc;
    ESMC_Initialize(&rc, ESMC_ArgLast);
    ESMC_FinalizeWithFlag(end);

    switch(end) {
        case ESMC_END_KEEPMPI:
            printf("Called with ESMC_END_KEEPMPI\n");
            assert(state_ == 0);
            MPI_Finalize();
            break;
        case ESMC_END_NORMAL:
            printf("Called with ESMC_END_NORMAL\n");
            assert(state_ == 1);
            break;
        case ESMC_END_ABORT:
            printf("Called with ESMC_END_ABORT\n");
            assert(state_ == 10);
            break;
    }

    return 0;
}

In this case, I built with clang, gfortran, and (intel) mpich.

Compiling this to test_esmf_endflag provides the following outputs:

# normal
./test_esmf_endflag normal
#> MPI_Finalize called
#> Called with ESMC_END_NORMAL

# keep_mpi
./test_esmf_endflag keep_mpi
#> Called with ESMC_END_KEEPMPI
#> MPI_Finalized called

# abort
./test_esmf_endflag abort
#> MPI_Abort called
#> Called with ESMC_END_ABORT

I will verify with Python as well, but I expect I'll see the same results 🙂 Thanks again for the review and suggestions!

EDIT: still trying to get Python to work, having some issues on my end that I think are related to mpi4py or how I'm compiling... going to set this PR as ready for review in the meantime

to be explicit about MPI cleanup only when endFlag is set
to NORMAL, not when its not set to KEEP_MPI.
@program-- program-- marked this pull request as ready for review March 21, 2024 21:46
Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great. I noticed one change where I wonder if the difference is intentional or an accident of some back-and-forth changes. Other than that, this looks good to me, but I'll also wait for a final review from @oehmke , since he's much more familiar with what's needed at the various layers than I am.

src/Superstructure/ESMFMod/include/ESMCI_Init.h Outdated Show resolved Hide resolved
Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me now - thank you! @program-- I wanted to also thank you for showing the test program you used for this. And @PhilMiller thank you for your contributions to this review and discussion as well!

I ran the full ESMF and ESMPy testing on my Mac, and this looks good.

I'll wait on final approval from @oehmke before merging.

@PhilMiller
Copy link

Thank you both for working with us on this. It will help us a lot in integrating ESMF functionality in the NWM NextGen system.

Can I ask when you think this will make it into a release, once it's merged?

@oehmke
Copy link
Contributor

oehmke commented Mar 25, 2024 via email

@PhilMiller
Copy link

Ok, that sounds good, thank you. I think the beta tag would potentially be useful.

Copy link
Contributor

@oehmke oehmke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a final look at this and everything looks good to me. I'll probably merge this tomorrow(3/26) unless anyone has any final objections. Thanks again!

@jduckerOWP
Copy link

@oehmke can you confirm to the community here the earliest version of ESMF that this PR has been merged to? Thank you all very much for your efforts on this front, it's been a great help to the NOAA-OWP community.

@oehmke
Copy link
Contributor

oehmke commented Jun 12, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants