-
Notifications
You must be signed in to change notification settings - Fork 353
Ocean/develop: add CF-compliant _FillValue attribute #533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Although this is a PR to |
@mgduda, thanks for the feedback.
Indeed, we would like these changes to affect other cores in E3SM, so the changes to the registry parser would almost certainly make their way to
Could you elaborate or point us to documentation about this? To me, they seem entirely redundant and I'm not aware of NetCDF supporting anything other than |
@wenshanw, thanks very much for making this pull request. As you can see, this will be a good place to get some discussion going about the changes you are working on an make sure we avoid any unintended consequences. |
This work is a first step at addressing #437 |
This looks like a simple enough change. If the testing shows that the output files have a CF-compliant fill value, then the changes should be fine. |
// Uncomment to add _FillValue to match missing_value | ||
// fortprintf(fd, " call mpas_add_att(%s(%d) %% attLists(const_index) %% attList, '_FillValue', %s)\n", pointer_name, time_lev, missing_value); | ||
fortprintf(fd, " call mpas_add_att(%s(%d) %% attLists(const_index) %% attList, '_FillValue', %s)\n", pointer_name, time_lev, missing_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a framework file, we need to merge this into the develop
branch instead, and then I will merge develop
into the LANL branches: ocean/develop
, seaice/develop
, landice/develop
. @wenshanw, don't worry about that right now. This PR is convenient for testing, and I can help with the different branches after testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I was thinking that would be the right approach, too, and that @wenshanw shouldn't have to worry about that. We need to do more testing anyway.
@@ -61,7 +61,7 @@ | |||
|
|||
<var_struct name="state" time_levs="2"> | |||
<var_struct name="tracers" time_levs="2"> | |||
<var_array name="activeTracers" dimensions="nVertLevels nCells Time" type="real" packages="activeTracersPKG" > | |||
<var_array name="activeTracers" dimensions="nVertLevels nCells Time" type="real" packages="activeTracersPKG" missing_value="FILLVAL"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we need to add missing_value="FILLVAL"
to every var_array
and var
(not within a var_array
) in the registry? It would be nice to set missing_value="FILLVAL"
as the default so no changes need to be made to the Registry.
@mgduda would a default setting of the missing_value="FILLVAL"
cause problems for the atmospheric core?
An alternative is to have a global variable or attribute at the top of the registry file so each core can specify the missing_value for all variables with a single line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@czender seemed to think that we would want to manually mark variables with _FillValue
and that the default should be that variable don't have a fill value. It isn't good practice to mark variable that are valid everywhere as having a _FillValue
, for example. But I'll let @czender further clarify.
I don't know what missing_value
is for so I won't comment on that until I get clarity from @mgduda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct. Variables that are valid everywhere (e.g., most of EAM) will never use a _FillValue
and therefore should not a the _FillValue
attribute. The only reason why is efficiency: _FillValue
slows-down code. In-line code and generic analysis tools must check every single value of a variable with _FillValue
before operating on that value. Lots of if
's. When variables do not have _FillValue
we can assume all values are valid then the analysis of the variable needs no if
's. That said, it is perfect legal to have a _FillValue
attribute that is never used. It is just more efficient not to.
To be clear, all this change does so far is add the attribute to NetCDF output with the correct value. Changes (likely many of them) still need to be made in the code so the missing value within data arrays actually gets filled in with the proper value. We have been exceedingly bad about this in MPAS development so far. So this will be a big pain for @wenshanw to fix, with my help and hopefully help from others from the various MPAS teams. |
We thought it was a good idea to break the process into smaller PRs that do no harm, rather than a mega PR that affects a large number of files at once. |
@wenshanw and @xylar , thank you for your work on this! This is long overdue and will be useful. |
@matthewhoffman, the work is on an E3SM confluence page. That may end up being a bit of an unfortunate place for the discussion only because @mgduda and others not in E3SM can't access it. At the same time, this is work being supported by the E3SM infrastructure team so that does see like an appropriate place to keep track of the work. Suggestions? |
I think we need a developer's telecon anyway to discuss #472, so I would be very eager to have one scheduled. I agree, this would be another issue for the agenda. |
Indeed there is a slight difference between |
It may be that my recollection is fuzzy. I had been thinking that our intent back c. 2016 was that For reference, commit 88bff9c seems to have been the original commit that added both |
Adding However, I'm concerned that there may be a performance penalty to pre-writing all variables with |
I'm definitely in favor of the suggestion by @xylar to break development into small PRs that can be easily reviewed rather than creating one large PR spanning several parts of the code. As @matthewhoffman said it can be really helpful to have an outline of the smaller steps that are planned to accomplish the larger objective. |
@mgduda, it sounds like the difference between In terms of implementation, it is hard for me to see how first filling the array with |
@xylar Perhaps it's true from the perspective of certain post-processing tools that there's no distinction between
For a specific example, consider computing the geopotential height of the 700 hPa surface for a limited-area domain: below high mountains, the field is ill-defined, and outside of the domain, the field is neither computed nor written to an output file. I agree that a post-processing tool that simply aimed, e.g., to average all valid values wouldn't care about the distinction between these statements; but it doesn't mean that other tools, or a model developer, would see no distinction. Regardless, as I mentioned, it may have been that we decided to go with |
@mgduda, thanks for clarifying further. For the current project, we are only interested in Having A quick grep for
I can't pretend that I have a very thorough understand either NetCDF Fortran output or MPAS's wrapper around it. Are you saying that if we set the If that's not what you're saying, could you try to clarify further? |
I can shed some light here. First, know that when CAM/EAM and CLM/ELM define a |
@@ -2031,6 +2031,7 @@ | |||
<var_struct name="state" time_levs="2"> | |||
<var name="normalVelocity" type="real" dimensions="nVertLevels nEdges Time" units="m s^{-1}" | |||
description="horizonal velocity, normal component to an edge" | |||
missing_value="FILLVAL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wenshanw, this spacing needs to be done with tabs rather than spaces to be compatible with the rest of the XML in MPAS. (I know it's hard to see the difference in a lot of editors.)
@wenshanw, @czender and @mark-petersen, could we come up with a check-list of things that need to be done before this pull request (PR) is ready to merge? I mostly want to make sure we define what the scope of this PR and keep it simple.
|
@xylar @wenshanw @mark-petersen If I understood correctly, one of the items that came out of our discussion on 4 May was the desire to use the |
Modified registry XML files and registry parser to add the _FillValue attribute to both state and tracer variables.