Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move CLMBuildNamelist.pm perl code into the python buildnml #585

Open
billsacks opened this issue Dec 3, 2018 · 23 comments
Open

Move CLMBuildNamelist.pm perl code into the python buildnml #585

billsacks opened this issue Dec 3, 2018 · 23 comments
Assignees
Labels
blocker another issue/PR depends on this one code health improving internal code structure to make easier to maintain (sustainability) size: large Large project that will take a few weeks or more

Comments

@billsacks
Copy link
Member

We want to rewrite CLMBuildNamelist.pm in python. This is partly necessary in order to support our long-term goal of having parameters read from a netcdf file rather than a fortran namelist. But it's also important in order to take advantage of new features in the cime python namelist generation utilities.

@billsacks billsacks added the code health improving internal code structure to make easier to maintain (sustainability) label Dec 3, 2018
@billsacks
Copy link
Member Author

I'm a fan of doing as much as possible incrementally, rather than via a giant big bang rewrite. To this end: I wonder if it would be possible to incrementally migrate the perl to python. I'm imagining:

  • The top-level build-namelist script would call the perl CLMBuildNamelist.pm and then the python CLMBuildNamelist.py. Each would create its own subset of the namelist. Then the top-level script would concatenate the two together. (For this to work easily, I think we'd need to migrate entire namelist groups all at once.)

  • As we incrementally move things from the perl to the python, we would also move the relevant variables out of the old namelist_definition / namelist_defaults files into a new file (which would follow the new convention of having the definition and defaults together in a single file). In the end, we could delete the now-empty old files, leaving just the new file.

In addition to letting us make the changes and test them incrementally, this strategy would also make it possible for new namelist groups to be added directly in the python even before the entire conversion is done, thus making for less overall work involved with the conversion.

@billsacks billsacks self-assigned this Dec 4, 2018
@billsacks
Copy link
Member Author

@ekluzek this might be a good thing for us to pair program, at least for portions of it. So I'm also assigning myself.

@ekluzek
Copy link
Collaborator

ekluzek commented Dec 4, 2018

This approach does make sense to me as well, as a way to make the conversion incremental, which is probably the only way we can see it happen. We originally talked about doing this months ago, and we haven't really been able to look at it (at least I haven't).

@billsacks
Copy link
Member Author

Thinking more about the incremental strategy I laid out: One potential problem is this: It would be really good if we can leverage the NLCOMPs in the test suite to determine if anything is unexpected (while most namelist differences would cause baseline comparison failures, some, like differences in hist field lists, would not). One possible way around this would be: after concatenating the perl and python namelists, sort the namelist groups alphabetically. That way, the resulting merged list will be independent of which pieces are done in the perl vs. the python. (After we are done the conversion, we could get rid of this sort, so that we go back to the old order.)

@billsacks billsacks removed their assignment Jan 9, 2019
@billsacks
Copy link
Member Author

I was thinking a bit about the build-namelist unit tester (build-namelist_test.pl), and how we can make this easier to work with when we move to python. I think that part of what makes the current script hard to work with is that it's mostly one long function with a bunch of different tests. This makes it hard to tell which parts of the script apply to all tests, which apply to some tests, and which just apply to a single test.

I find it easier to work with tests when each test is self-contained (e.g., in its own function, as is the recommended practice for unit tests). There can (and should) still be shared functions that are called by multiple individual tests, but you should be able to read through a single (short) test function and see everything that's done to set up that test. ("Oh, this test uses the standard arguments and then adds the one argument --foo.")

@billsacks
Copy link
Member Author

I was thinking a bit about how to proceed here, given that (1) @ekluzek 's plate keeps getting filled up with short-term needs, which doesn't let him get to long-term projects like this, and (2) we're waiting to extract many parameters until this pythonization is done (and in the meantime, the parameter netcdf file is in a non-ideal state in terms of both usability and maintainability).

A strategy I thought of, tied in with the idea of doing this incrementally (having both a perl and python namelist side-by-side for a while is): What if we start by tackling the parameters that are currently in the netcdf file? So the first step would be to make the netcdf input file generated on the fly, allowing users to set values there via the user_nl mechanism. To do this, we would need to do the following:

  • Add python infrastructure in cime that allows generating a netcdf file rather than a Fortran namelist

  • Introduce a new xml file that gives default values of these parameters, in the new style (where the definition and defaults are combined in a single file)

  • Add some python code in ctsm that leverages this new cime infrastructure, generating a netcdf file with all of the parameters given in the new xml file, plus a separate namelist file that for now just contains the file name of the netcdf parameter file (eventually, once we have an all-python build-namelist, this will be one entry in the single namelist file). (Hopefully the ctsm-specific python code will be minimal for these numeric parameters: most or all required information should be in the xml file. Once we start extracting flags for which we need consistency checks, special handling, etc., we'll need more logic in the python, but for these numeric parameters, I don't imagine we'll need much special handling.)

  • Ensure that both the perl and python build-namelists are called. If possible, the python should be called as a subroutine, not as a subprocess. So cime's preview_namelists call would call ctsm's python build-namelist, which would (a) do the python-based namelist / netcdf generation, and (b) invoke the legacy perl-based namelist generation as a subprocess.

  • Change the Fortran code a bit to read the parameter file name out of this new (temporary) namelist file. (Other than that, few if any changes should be needed in the Fortran code.)

Once the existing parameter file has been migrated over in this way, we can start extracting new parameters using the new python-based infrastructure.

@billsacks
Copy link
Member Author

From today's CSEG meeting:

Alper's presentation: Staging MOM6 runtime parameter input files for CESM cases

One interesting point from this is that he uses yaml rather than xml to store namelist defaults (currently, the developer runs a script to convert this yaml to json; this won't be needed once cime allows yaml). A huge benefit of what Alper has done is that he supports arbitrary python expressions as conditionals (which he then evaluates).

We should have some more discussions about whether we'd like to migrate to something like this vs. keeping with the current xml-based namelist generation. Mariana says that she and Jim aren't very happy with the current implementation (namelist.py and nmlgen.py), so she could see migrating to something new / better.

The above benefit could be very helpful for us in CTSM's namelist generation, too. We plan to have some discussion in CSEG about which way we want to go long-term - yaml or xml - and then we should make CTSM's python-based namelist generation consistent with that decision.

If we do stick with xml, I wonder if we can come up with a way to accomplish the same arbitrary-python-expression functionality in xml. I'm thinking we could have an attribute with a special name like "python_condition"; when this is encountered, it is evaluated, and we determine if it evaluates to true. From looking briefly at cime's entry_id.py:_get_value_match function, I think we could accomplish this via modifying or writing a completely different version of that function (though I'm not sure off-hand how to feed in information on all of the appropriate variables that should provide context for the evaluation).

@billsacks
Copy link
Member Author

Blocks #454

@billsacks billsacks added the size: large Large project that will take a few weeks or more label Mar 9, 2020
@billsacks
Copy link
Member Author

One thing I'd like us to be careful about: We have a number of namelist variables whose defaults depend on the values of other namelist variables - e.g., fire parameters whose defaults depend on the fire method being used. It would be easy to accidentally have these parsed in the wrong order; we should have some error checking in place to ensure that doesn't happen.

Alternatively, we could have a rule that namelist defaults should only depend on xml variables, not on other namelist variables. This would require changing some namelist variables into xml variables. I used to argue that we should do this for conceptual clarity (for both users and developers), but I'm starting to feel like (1) that may involve too significant of a change, and (2) it could be harder to maintain with lilac, since lilac doesn't use the same xml variable mechanism.

A compromise could be to move these namelist variables to a special namelist group, like "high_level_settings". Those would be parsed first, and we could have a rule, somehow enforced by the code, that other variables' defaults can only depend on namelist variables in that group.

@wwieder
Copy link
Contributor

wwieder commented Sep 30, 2020 via email

@billsacks
Copy link
Member Author

Is this also worth discussing in the context of the PPE? Maybe in the short term we don't need to worry about this, but the dual nature of namelist vs. parameter file parameters is confusing for users (including me).

One of the important reasons for rewriting build-namelist in python is to support our plan for generating the netcdf params file on the fly, and moving all numeric parameters to the params file. Under this plan, all numeric parameters will be modifiable in two ways: (1) using user_nl_clm, or (2) after the params file is generated, using tools to manipulate the netcdf file and pointing to your new file.

Does that address your question?

@wwieder
Copy link
Contributor

wwieder commented Sep 30, 2020 via email

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 30, 2020

In terms of the PPE work. I think this project is too big to get done before simulations need to be started. And once you have them started I don't think you want to change mid-stream to a different mechanism. But, yes one of the end goals of this project is to remove the divide between namelist and params file settings for users. so it should be easier to work with.

@jedwards4b
Copy link
Contributor

Has there been any activity on this task?
errput: File::Glob::glob() will disappear in perl 5.30. Use File::Glob::bsd_glob() instead. at /work/02503/edwardsj/cesm2_x.ihesp/components/clm/bld/CLMBuildNamelist.pm line 4193.

@billsacks
Copy link
Member Author

No, there hasn't :-( I'll open an issue for glob vs. bsd_glob - thanks for bringing that to our attention.

@ekluzek
Copy link
Collaborator

ekluzek commented Dec 7, 2021

@jedwards4b I updated #1550 to point out that this issue was fixed previously. I think the version in /work/02503/edwardsj/cesm2_x.ihesp/components/clm must be an older clm before this was fixed. The fix came in Feb/2020.

Note, there are some uses of glob in cime under utils/perl5lib though.

@ekluzek ekluzek added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Jan 9, 2023
@ekluzek
Copy link
Collaborator

ekluzek commented Jan 9, 2023

Some recent developments make me want to bring this up again. Two things to note is param-dev and recent work on SLIM. Alper's param-dev work has gone into the latest CAM, which means it's available in cime and we could take advantage of it as well. In SLIM I have a PR that replaces the perl based build-namelist for SLIM with a simple python based one. That work would make it easier for me to do this replacement.

@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Mar 2, 2023
@ekluzek
Copy link
Collaborator

ekluzek commented Sep 13, 2023

#2126 and some work I and @adrifoster was doing in the perl build-namelist "unit-tester" (it's not really a unit-tester, as it's not testing methods inside the perl, it's really a system tester of build-namelist) got me thinking about this again.

I was originally thinking that we should create a python script that just replicates the behavior of the perl build-namelist and then the perl "unit-tester" could still be used. However, I'm realizing that the perl "unit-tester" should also be replaced. And we probably don't want the python version to act the same way as the perl version.

But, an incremental way to go about this would be to do the following:

  1. Move the buildnml into a module that can be unit and system tested under the cime_config directory (this would be similar to what I did in SLIM)
  2. Have the python buildnml be able to be tested with the mechanisms under the python directory (this again is what I did in SLIM). This means we also have it tested with pylint (we already have it tested with black)
  3. New functionality (such as in Have the FATES parameter file created at runtime in the buildnml #2126) would be added into the buildnml python portion rather than the perl build-namelist part.
  4. Some targeted functionality from the perl could be moved over into the python buildnml part. (we would examine what could be done here without making the division between the perl and python parts too confusing)
  5. Move more parts incrementally? Or should it be the next option?
  6. Move everything in one big go?

It's steps 5 and 6 that I'm not sure about. Because doing it incrementally might maximize confusion on how to handle namelist work. But, doing it in one big go, is what prevents us from actually getting to this.

@adrifoster
Copy link
Collaborator

adrifoster commented Sep 13, 2023 via email

@billsacks
Copy link
Member Author

I feel we should do this incrementally; otherwise I fear it will never happen, and this is a big enough effort that trying to do it all at once will lead to the pain of trying to rewrite a moving target. I realize there will be some short-term confusion, but I think it's worth it. See #585 (comment) and other notes above referencing "incremental".

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 13, 2023

@adrifoster that is pretty much what @bilsacks proposed above in his comment (that now he responded to). I'm not sure what you mean about "individual tests"? Are you saying the ability to ask for which specific tests to run -- similar to what we do in create_test? I was really thinking of doing this the same way that we do our unit and system tests in the python directory. But, for the buildnml system tests I think you are right we should develop a system where you specify XML settings for the system tests, so you could cook up any kind of test that is desired. Maybe we then do something like testmods to bundle different types of system tests? I think I like that. That was definitely a problem with the perl testing it really needed a way to have potential tests done in a general way, or have a list of tests that you could do....

I thought I recalled that @billsacks decided later that would be bad to do as it would maximize confusion on how to handle namelist things. But, from his response above it sounds like he agrees since he's right that it won't happen unless we do it incrementally.

Although maybe that's a feature rather than a bug? Because if we get far enough in the "we have this both in python and perl world" -- the pain of being there might incentivize us to move solely into the python space? And we would pay the cost of getting it done in one big effort?

@adrifoster
Copy link
Collaborator

@adrifoster that is pretty much what @bilsacks proposed above in his comment (that now he responded to). I'm not sure what you mean about "individual tests"?

I guess I am thinking about yesterday when we had to run the whole test script over and over again (which takes a while) just to investigate what was going on with two very small tests. So if there was a way you could only call those tests that would be really helpful. Even if it was in some kind of interactive use of python?

@ekluzek
Copy link
Collaborator

ekluzek commented Apr 30, 2024

This should likely be done based on ParamGen which would help with issues: #26, #569 and #454

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker another issue/PR depends on this one code health improving internal code structure to make easier to maintain (sustainability) size: large Large project that will take a few weeks or more
Development

No branches or pull requests

6 participants