
Restructure ozone luna #1370

Merged (50 commits) on Aug 6, 2021

Conversation

@sunnivin (Contributor) commented May 9, 2021

Description of changes

Generalization of the scientific ozone-luna exploration done by @ziu1986 described in #1232. New calling structure implemented as suggested by @billsacks in #1276.

Specific notes

Contributors other than yourself, if any: @ziu1986, @danicalombardozzi, @billsacks

CTSM Issues Fixed (include github issue #): #1232

Are answers expected to change (and if so in what way)?
Yes. The history fields TLAI, GPP, NPP, carbon/nitrogen content, stomatal conductance, and photosynthesis are expected to drop by 10-20% in accordance with earlier work.

Any User Interface Changes (namelist or namelist defaults changes)?
Yes. In the file bld/namelist_files/namelist_definition_ctsm.xml the namelist variable use_ozone has been replaced by ozone_method, with valid options unset, stress_lombardozzi2015, and stress_falk. In addition, a new namelist variable o3_ppbv has been added.
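As an illustration of the new option set, here is a minimal validation sketch in Python (the actual check lives in CTSM's build-namelist machinery; the function name here is hypothetical):

```python
# Hypothetical sketch of validating the new ozone_method namelist
# variable described above; the real check is done by CTSM's
# build-namelist / Fortran control code, not by this function.
VALID_OZONE_METHODS = {"unset", "stress_lombardozzi2015", "stress_falk"}

def check_ozone_method(ozone_method: str) -> str:
    """Return the method unchanged if valid, otherwise raise an error."""
    if ozone_method not in VALID_OZONE_METHODS:
        raise ValueError(
            f"ozone_method = '{ozone_method}' is invalid; "
            f"valid options: {sorted(VALID_OZONE_METHODS)}"
        )
    return ozone_method
```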

Testing performed, if any:
The new implementation reproduces the scientific results obtained in #1232 for the single point in Brazil (--resolution 1x1_brazil) with the compset 2000_DATM%GSWP3v1_CLM50%BGC_SICE_SOCN_SROF_SGLC_SWAV.

…ozone == .true. and use_luna == .true.

o3coefjmax is initialized to 1, so a case check in Nitrogen_investments is not necessary. Still, this gives poor readability.
- Forgotten argument in function call for Nitrogen_investments
use_ozone --> ozone_method in non-obvious places
Corrected this%ozone_method --> this%stress_method
Note to self: Why is this needed here?
stress_method_xx --> stress_xx
Explanation of how o3coefjmax is different in the call for
Nitrogen_investments in LunaMod.F90
Default values for ozone_method with too many spaces
@sunnivin sunnivin requested a review from billsacks May 9, 2021 20:09
@sunnivin (Contributor Author)

I'm wondering if it would make sense to update the tests for o3 a bit?

To cover a test with o3_veg_stress_method=stress_falk, would it make sense to restructure the tests in the folder CTSMROOT/cime_config/testdefs/testmods_dirs/clm/x, x=[o3,irrig_o3_reduceOutput]?

To me it seems natural to follow the structure of, for instance, CTSMROOT/cime_config/testdefs/testmods_dirs/clm/clmMEC-test?

I'm thinking that we could name the new test folders like
CTSMROOT/cime_config/testdefs/testmods_dirs/clm/x_y, x=[o3,irrig_o3_reduceOutput], y=[lombardozzi2015, stress_falk]?

@sunnivin (Contributor Author)

Do you prefer to rebase or merge?

My development branch is now quite far behind ctsm's main. Do you prefer that I do a rebase or merge?

Furthermore my commit history is not exactly pretty at the current stage. I can take the time to clean up a bit if you like?

@ekluzek (Collaborator) commented Jun 23, 2021

@sunnivin I've personally usually done merges rather than rebase on PR branches. The end result should get you to the same point. The main reason I would see to prefer one over the other is if one has fewer conflicts than the other and as such is easier to do. So I would leave it up to you to decide. @billsacks is there a reason to prefer one over the other?

I would say "having a messy commit history" is a feature rather than a bug. I'd rather see the smaller commits, and have the history of what people did. It tends to be helpful to have smaller commits out there (for example when using git bisect), or just examining when things changed. Mostly that tends to be helpful for when I'm looking at my own commits. But, I have no problem with seeing others as well. If we had a huge developer base it's possible that smaller commits would be a problem, but I've never seen it that way.
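As a side note, the reason small commits pay off with `git bisect` is the binary search it performs over history; that search can be sketched in a few lines of Python (an illustrative model, assuming a linear history and a deterministic test):

```python
def bisect(commits, is_bad):
    """Return the index of the first 'bad' commit in a linear history,
    mimicking what `git bisect` does: binary search, assuming every
    commit before the first bad one is good and every commit after
    it is bad. Smaller commits mean the culprit found this way is a
    smaller, easier-to-read diff."""
    lo, hi = 0, len(commits) - 1  # commits[hi] is known to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return lo
```

With N commits this takes only about log2(N) test runs, which is why a long history of small commits costs little to search.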

@billsacks (Member)

Regarding rebase vs. merge: I generally prefer merges for updating a branch. The main times I use rebase is if I recently started some work then realized I should have started from a different point, or if I'm completely moving the branch history from one place to another. The reason I prefer merge is that it preserves the history of what happens rather than rewriting history. This is important for a few reasons, such as: (1) if you documented testing that was done on some hash; (2) if you have already opened a PR and people have reviewed it (I, for one, often make note of which commits I have already reviewed, so if the author rebases their branch it can throw me off); (3) if we need to go back and figure out what was done, e.g., to diagnose a problem that might have come in with the update to master, then using rebase can make this hard or impossible in some scenarios.

Regarding messy commit history: I don't care about this and feel it usually isn't worth the time to go back and clean up. Definitely don't rewrite history of anything you have already pushed to a PR: as with rebasing, that can really mess with the reviewers (at least me). I will sometimes clean up commits that I just made and haven't pushed anywhere yet – mainly in situations where I realized I should have made some other small change to a commit that I recently made, in which case I'll amend the commit or (if I have already made an intervening commit) will do a small interactive rebase to reorder and squash a few commits before pushing my branch.

I'll reply to your other questions soon.

@billsacks (Member) left a comment

Thanks for these changes. They mostly look good, but see some very minor comments below (4 out of 5 of them are just the same spelling mistake repeated in a few places).

Review threads (now resolved):
  • bld/namelist_files/namelist_defaults_ctsm.xml
  • bld/namelist_files/namelist_definition_ctsm.xml
  • src/main/clm_varctl.F90
  • src/main/controlMod.F90 (two threads)
@billsacks (Member) commented Jun 25, 2021

I'm wondering if it would make sense to update the tests for o3 a bit?

To cover a test with o3_veg_stress_method=stress_falk, would it make sense to restructure the tests in the folder CTSMROOT/cime_config/testdefs/testmods_dirs/clm/x, x=[o3,irrig_o3_reduceOutput]?

To me it seems natural to follow the structure of, for instance, CTSMROOT/cime_config/testdefs/testmods_dirs/clm/clmMEC-test?

I'm thinking that we could name the new test folders like
CTSMROOT/cime_config/testdefs/testmods_dirs/clm/x_y, x=[o3,irrig_o3_reduceOutput], y=[lombardozzi2015, stress_falk]?

I remember going back and forth last time we talked as to whether it was worth changing or adding a test to cover this new code. But now I'll say yes:

  • Let's change one test to cover this new code

I think it's sufficient to have a total of two ozone tests, but let's change one of the existing tests to the Falk method. So we could have testmods named o3lombardozzi2015 (and change the ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50Sp.izumi_nag.clm-o3 test to use that testmod) and irrig_o3falk_reduceOutput (and change the ERP_P72x2_Ly3.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel.clm-irrig_o3_reduceOutput test to use that testmod) (or the other way around).

  • When doing final testing, though, we should recreate testmod directories corresponding to what's in place now and run tests equivalent to what are currently in the test suite. This is important so that we can do baseline comparisons of both of these tests.

  • In addition, as we discussed a few weeks ago, I'd suggest running one test (as a one-time thing) that turns on the Falk method and is a global ERP_PNx2_D test (for some value of N = number of MPI tasks). It's helpful to run a test like this covering new code.

Finally, I had two other items on my list from when we talked last.

  • You were going to add an error check: can only use the Falk ozone method if LUNA is on

  • You were going to add an end-of-day check as noted in #1370 (comment) (though that isn't absolutely necessary if you're out of time)
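The error check in the first item could look roughly like this (a Python sketch of the consistency check; the actual implementation belongs in CTSM's Fortran control code, and the names here are illustrative):

```python
def check_ozone_luna_consistency(ozone_method: str, use_luna: bool) -> None:
    """Illustrative consistency check: the Falk ozone stress method
    depends on LUNA, so reject the combination of stress_falk with
    LUNA turned off."""
    if ozone_method == "stress_falk" and not use_luna:
        raise RuntimeError(
            "ozone_method = 'stress_falk' requires use_luna = .true."
        )
```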

@sunnivin (Contributor Author)

I think it's sufficient to have a total of two ozone tests, but let's change one of the existing tests to the Falk method. So we could have testmods named o3lombardozzi2015 (and change the ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50Sp.izumi_nag.clm-o3 test to use that testmod) and irrig_o3falk_reduceOutput (and change the ERP_P72x2_Ly3.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel.clm-irrig_o3_reduceOutput test to use that testmod) (or the other way around).

This is clear!

@sunnivin (Contributor Author) commented Jun 25, 2021

  • When doing final testing, though, we should recreate testmod directories corresponding to what's in place now and run tests equivalent to what are currently in the test suite. This is important so that we can do baseline comparisons of both of these tests.
  • In addition, as we discussed a few weeks ago, I'd suggest running one test (as a one-time thing) that turns on the Falk method and is a global ERP_PNx2_D test (for some value of N = number of MPI tasks). It's helpful to run a test like this covering new code.

I think what you mean here is that I can update the file cime_config/testdefs/testlist_clm.xml with the changed names of the test directories, but when the final testing is done for the PR we have to temporarily restore the old names so the baseline comparisons work as before?

It is indeed a good idea to run the global test case for o3_veg_stress_method=stress_falk. I can do that on our cluster when I have merged in the latest version of the current master branch.

@billsacks (Member)

I think that what you mean here is that I can update the file cime_config/testdefs/testlist_clm.xml with the changed name of the testdirectories, but when the final testing is done for the PR we have to revert the names back so the basemod comparisons are like before?

Yes. For final testing we want to run both the new tests and the two original tests. We can do that by temporarily putting back in place testmod directories with the old names (which both use the Lombardozzi method) (no need to commit them to the repository) then running tests using those test mods. I can help run those if what I'm saying isn't clear – it's easier to do than to explain :-)

@billsacks (Member) left a comment

Looks great now - thank you, @sunnivin ! I'm happy to do final testing on this, but I'm thinking I'll do it at a time when I can screen share with @ziu1986 and anyone else who's interested, to walk them through my process.

@billsacks (Member)

Initial testing went pretty smoothly, thanks to all of the work put in by @sunnivin and @ziu1986 . There was one subtle restart issue that just impacted the new ozone diagnostic (history) fields: these were only being set in the exposed veg filter, which means that their values were indeterminate in patches that had been exposed earlier in the simulation but were now snow-covered. I have fixed that by explicitly resetting values to 1 every time step in the noexposedveg filter. See billsacks@afad470

This ties in with a question that came up in our discussion a few months ago: should these non-exposed veg patches contribute to grid cell averages? I have expanded on the comment in the ozone initHistory routine with some thoughts on this that we'll want to return to later.
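A minimal Python sketch of the fix described above (a stand-in for the actual Fortran filter loops; the function and variable names here are illustrative, not CTSM's):

```python
def update_o3_diagnostics(o3coef, exposedveg_filter, noexposedveg_filter,
                          compute_coef):
    """Illustrative version of the restart fix: the diagnostic ozone
    stress coefficients are computed over the exposed-veg filter, and
    explicitly reset to 1 every time step over the non-exposed-veg
    filter, so that patches that were exposed earlier but are now
    snow-covered never carry stale (indeterminate) values."""
    for p in exposedveg_filter:
        o3coef[p] = compute_coef(p)
    for p in noexposedveg_filter:
        o3coef[p] = 1.0  # reset: no ozone stress where veg is not exposed
    return o3coef
```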

I am now rerunning full testing.

@billsacks (Member)

One other minor issue was that pgi tests changed answers with ozone off (other compilers did not). I have verified that this comes from the extra term in LUNA: multiplying by 1 must lead to roundoff-level changes. I have verified that, if I introduce this diff off of master:

billsacks@011c79f

it is bit-for-bit with this branch.

@ziu1986 commented Aug 6, 2021

Initial testing went pretty smoothly, thanks to all of the work put in by @sunnivin and @ziu1986 . There was one subtle restart issue that just impacted the new ozone diagnostic (history) fields: these were only being set in the exposed veg filter, which means that their values were indeterminate in patches that had been exposed earlier in the simulation but were now snow-covered. I have fixed that by explicitly resetting values to 1 every time step in the noexposedveg filter. See billsacks@afad470

I am now rerunning full testing.

Thanks a lot @billsacks and @sunnivin for making this PR come true!

This ties in with a question that came up in our discussion a few months ago: should these non-exposed veg patches contribute to grid cell averages? I have expanded on the comment in the ozone initHistory routine with some thoughts on this that we'll want to return to later.

I remember that I came across this issue some years ago when adapting code in another model. Back then (if I remember correctly) I decided on dropping non-expo fields from the average.

I have mixed feelings about it now, because it depends on the use case of the field:
a.) Coupling to the atmosphere (which I suppose still happens at grid level) expects the "real" grid average including the non-exposed patches; otherwise ozone would be biased.
b.) Diagnostics of plant exposure would benefit from excluding non-exposed patches, since including them would likely lead to a bias in ozone at grid level.
The direction of the bias depends on the non-exposed patch, of course. Usually, ozone is lower over water bodies and higher over urban canopies or ice/snow-covered surfaces.

To cut it short: we might need both eventually.

@ziu1986 commented Aug 6, 2021

One other minor issue was that pgi tests changed answers with ozone off (other compilers did not). I have verified that this comes from the extra term in LUNA: multiplying by 1 must lead to roundoff-level changes. I have verified that, if I introduce this diff off of master:

billsacks@011c79f

it is bit-for-bit with this branch.

I'm curious about the bfb test. I suspect that the float variable which is multiplied is not exactly 1 numerically, as you said @billsacks. How do you usually handle this?

@billsacks (Member)

I'm curious about the bfb test. I suspect that the float variable which is multiplied is not exactly 1 numerically, as you said @billsacks. How do you usually handle this?

It's exactly 1 (as you can see in billsacks@011c79f – and noting that 1 can be represented exactly in floating point), but the issue (I'm pretty sure) is that introducing this extra term leads the compiler to do something different in terms of order of operations, which is mathematically equivalent (assuming no major compiler bugs) but results in different answers due to floating point math.
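Both facts can be demonstrated in a few lines of Python (standard IEEE-754 doubles):

```python
# Multiplying by exactly 1.0 is bit-for-bit exact for any finite double:
x = 0.123456789
assert x * 1.0 == x

# But regrouping a mathematically equivalent expression can change the
# result at roundoff level -- floating-point addition is not associative:
a, b, c = 0.1, 0.2, 0.3
assert (a + b) + c != a + (b + c)
```

So an extra `* o3coef` term can be harmless in itself yet still change answers, simply by nudging the compiler into a different (mathematically equivalent) order of operations.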

I usually handle this similarly to what I did here: coming up with a hypothesis about what is causing the answer changes, then making a minimal set of changes off of master to get the same answer-changing behavior and verifying that the full branch is bit-for-bit with this one-off branch. To state this differently, my goal (which is sometimes easier and sometimes harder to achieve) is the following: If master is point A and the full branch is point C, then I want a point B such that

  • C is bit-for-bit with B
  • B can be seen from a quick review to be effectively the same as A – i.e., it is obvious from examining the diffs in B that there are no real answer changes other than what might arise from floating point roundoff issues.

Then I can put those two pieces of information together to be confident that C is effectively the same as A. (Or sometimes I'll do it the other way around, so that B is a small one-off from C, and B is bit-for-bit with A.)

I then try to document what I did in the ChangeLog so that I or others can review / audit this verification later if needed.

@ziu1986 commented Aug 6, 2021

I'm curious about the bfb test. I suspect that the float variable which is multiplied is not exactly 1 numerically, as you said @billsacks. How do you usually handle this?

It's exactly 1 (as you can see in billsacks@011c79f – and noting that 1 can be represented exactly in floating point),

Of course! My bad. Thanks for reminding me 👍

but the issue (I'm pretty sure) is that introducing this extra term leads the compiler to do something different in terms of order of operations, which is mathematically equivalent (assuming no major compiler bugs) but results in different answers due to floating point math.

That's what I actually had in mind. I just cannot express it as eloquently (and properly) as you.

I usually handle this similarly to what I did here: coming up with a hypothesis about what is causing the answer changes, then making a minimal set of changes off of master to get the same answer-changing behavior and verifying that the full branch is bit-for-bit with this one-off branch. To state this differently, my goal (which is sometimes easier and sometimes harder to achieve) is the following: If master is point A and the full branch is point C, then I want a point B such that

* C is bit-for-bit with B

* B can be seen from a quick review to be effectively the same as A – i.e., it is obvious from examining the diffs in B that there are no real answer changes other than what might arise from floating point roundoff issues.

Then I can put those two pieces of information together to be confident that C is effectively the same as A. (Or sometimes I'll do it the other way around, so that B is a small one-off from C, and B is bit-for-bit with A.)

I then try to document what I did in the ChangeLog so that I or others can review / audit this verification later if needed.

Thank you for sharing your insight! 💡 So, it is not much different from how I usually test results.
It probably takes a good deal of experience to quickly locate these issues.

@billsacks billsacks merged commit b7db1e3 into ESCOMP:master Aug 6, 2021