Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remake PCT_PFT raw datasets to NOT force landunit area to 100%, and some other fixes needed to PCT_PFT raw datasets #1556

Closed
billsacks opened this issue Nov 17, 2021 · 21 comments · Fixed by #2372
Assignees
Labels
bug something is working incorrectly science Enhancement to or bug impacting science
Milestone

Comments

@billsacks
Copy link
Member

billsacks commented Nov 17, 2021

This is connected with #1555 (and I realized this issue while investigating the mksurfdata_map code related to #1555 ) but this is a somewhat separate issue.

Currently, my understanding is that the PCT_PFT raw datasets are constructed so that the total landunit area of natural veg plus crop adds up to 100%: in source grid cells where the original data specify less than 100% cover of vegetation, PCT_PFT values are increased so that the raw data coming in to mksurfdata_map will specify 100% cover of PCT_NATVEG + PCT_CROP.

But as far as I can tell from an initial read through of mksurfdata_map, there is no reason why this needs to be the case, and I realized that there is a problem with this: In this call:

call gridmap_areaave_scs(tgridmap, pct_nat_pft_i(:,m), &
pct_nat_pft_o(:,m), nodata=0._r8, &
src_wt=pctnatveg_i*0.01_r8*tdomain%mask, &
dst_wt=pctnatveg_o*0.01_r8, frac_dst=frac_dst)

I'm pretty sure that the source (and dest) weighting imply that PCT_NATVEG (i.e., the landunit area) is used as a weighting factor in remapping PCT_NAT_PFT from source to dest. That seems like the right thing to do, but the correction of the raw datasets to force vegetated areas to 100% makes this weighting factor not work the way it should.

Consider a destination grid cell that has two source grid cells, with equal coverage:

  • Source 1: truly 100% natural veg, with 100% c3 grass
  • Source 2: truly 1% natural veg, 99% lake, where the natural veg portion is 100% bare ground

The correct result would be a grid cell with 50.5% natural veg, 49.5% lake, where the natural veg is almost entirely c3 grass (with just a tiny bit of bare ground). However, because the PCT_PFT raw dataset is adjusted to give 100% area in Source 2, the current result in the code would be 50% c3 grass and 50% bare ground.

To get correct PCT_NAT_PFT mappings, I believe the raw datasets should be constructed without forcing the total veg area to add to 100%.

See the bold note at the bottom of #1555 (comment) for something to check in the course of doing this. I believe it's safe to make the changes in this issue before or after #1555, though if I had to choose an order, I would do #1555 before updating the raw datasets, because there are less likely to be issues with assuming that PCT_NATVEG + PCT_CROP = 100% once we do #1555.


Edit: here is the bold note that I was referring to, just so it's all in one place:

When making these changes, we should double-check that there isn't any code that assumes that PCT_NATVEG + PCT_CROP adds up to 100% (or forces that to be the case). I didn't see anything like this in my initial, quick read-through, but we should double-check that. One specific thing to look for is: will code work correctly if the input PCT_NATVEG = PCT_CROP = 0%? (Or would such grid cells incorrectly be considered ocean, for example?)

@billsacks billsacks added bug something is working incorrectly tag: bug - impacts science next this should get some attention in the next week or two. Normally each Thursday SE meeting. labels Nov 17, 2021
@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Nov 18, 2021
@billsacks billsacks added this to the ctsm5.2.0 milestone Nov 18, 2021
@lawrencepj1
Copy link

Hi @billsacks

Yes this may be the case. The forcing to 100% Crop + PCT_NATVEG to fill in lakes, glaciers, wetland and urban on the raw data was historical going back through all versions used with mksurf until at least CLM4. I am traveling back from AGU today but can have a more detailed look on Monday.

best
Peter

@billsacks
Copy link
Member Author

As discussed in #1716 we will want the landunit areas to be specified as percent of the total grid cell area (NOT percent of the land area in that grid cell).

@wwieder
Copy link
Contributor

wwieder commented Aug 25, 2022

@lawrencepj1 can we address this now that there is a CTSM 5.2 branch and decide if / how to address this issue?

@lawrencepj1
Copy link

@wwieder and @billsacks Yes this is simple to do in the CTSM 5.2 raw data. We can simply scale all the land unit component percentages by landfrac. This will require the mksurfdata tool be changed so the requirement for the land units to add up to 100% will need to be removed. The other option is for the remaining difference in land units to be allocated to percent wetland in the raw data. Either way this would be a simple fix.

@billsacks
Copy link
Member Author

@lawrencepj1 landfrac may come into this somewhat, but I think the main issue is the scaling based on special landunit (non-vegetated) areas in each source grid cell. We should probably talk to make sure we're on the same page about this before you make any changes.

@wwieder
Copy link
Contributor

wwieder commented Aug 25, 2022

maybe a quick meeting would be helpful to discuss? I'll set up a 30 min slot immediately after next weeks SE meeting. Currently invited @dlawrenncar @billsacks and @lawrencepj1 to join. Should others be involved?

@lawrencepj1
Copy link

lawrencepj1 commented Aug 25, 2022 via email

@ekluzek
Copy link
Collaborator

ekluzek commented Aug 25, 2022

Invite @slevisconsulting as well.

@wwieder

This comment was marked as resolved.

@ekluzek

This comment was marked as resolved.

@slevis-lmwg
Copy link
Contributor

From an email from Peter:
The new historical data for 1850 - 2015 with the new absolute values is in the directory:
/glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918

Peter tested with an old version of mksurfdata_map. I will test with my copy of mksurfdata_esmf.

@slevis-lmwg
Copy link
Contributor

@wwieder and @billsacks Yes this is simple to do in the CTSM 5.2 raw data. We can simply scale all the land unit component percentages by landfrac. This will require the mksurfdata tool be changed so the requirement for the land units to add up to 100% will need to be removed. The other option is for the remaining difference in land units to be allocated to percent wetland in the raw data. Either way this would be a simple fix.

I have compared fsurdat files generated with the new and old raw data. I see diffs; however, this may be because I did not modify the code as you suggested @lawrencepj1 . I will look into that.

@slevis-lmwg
Copy link
Contributor

From meeting with @lawrencepj1 this morning:
Peter approved the diffs between fsurdat files that I generated with mksurfdata_esmf using the new and old raw data that he provided. Peter clarified that this specific change to the raw data was not the one that would require a code change to mksurfdata.

@slevis-lmwg
Copy link
Contributor

The new historical data for 1850 - 2015 with the new absolute values is in the directory: /glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918

@lawrencepj1 two things...

  1. For ctsm5.2 we will need updated files (like the 1850-2015 that you provided) for all the "transient" time periods, right? Unless we're waiting for other changes to the files associated with this or other github issues. Otherwise, we need files for 850-1849 and all the SSP/RCPs.
  2. To get all the new files rimported to the repository, we will need to change permissions on all these files. (But let's worry about that when all the files are ready.)

@billsacks
Copy link
Member Author

@lawrencepj1 - if I'm interpreting the data correctly, I think we still need another change to the raw datasets so that areas are specified as percent of the total gridcell area rather than percent of the land area (for consistency with our other landunit datasets). I looked at the PCT fields in mksrf_landuse_ctsm52_nrhistLUH2_1850.c220918.nc and noticed that the sum of (pct_natveg + pct_crop + pct_glacier + pct_lake + pct_urban + pct_wetland) is essentially 100% in all grid cells inside the landmask. What I would expect is that this sum is instead equal to landfrac everywhere.

This is connected to the comment you made a few months ago (#1556 (comment) ), and I apologize that I never really addressed that comment directly. Indeed, I think that fixing this means either adding a new scaling of all landunit areas by landfrac, or else removing some existing scaling by landfrac that forces the total areas to 100% in all grid cells.

@billsacks billsacks changed the title Remake PCT_PFT raw datasets to NOT force landunit area to 100% Remake PCT_PFT raw datasets to NOT force landunit area to 100%, and some other fixes needed to PCT_PFT raw datasets Nov 16, 2022
@billsacks
Copy link
Member Author

@lawrencepj1 (also @slevisconsulting and @ekluzek ) - In looking into some differences arising from changing mksurfdata to use LANDFRAC rather than the mask from the vegtype dataset, I noticed what seems to be another issue with this dataset that I feel should be fixed (unless you can explain why this is correct): From looking at the file /glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918/mksrf_landuse_ctsm52_nrhistLUH2_1850.c220918.nc, there are 2483 grid cells with LANDFRAC exactly 0 but LANDMASK = 1. Some of these grid cells with LANDFRAC=0 have PCT_NATVEG and/or PCT_CROP > 0. This seems inconsistent and possibly problematic for the correctness of the resulting surface datasets: it seems to me that any point with LANDMASK = 1 or with PCT_NATVEG or PCT_CROP > 0 should have LANDFRAC > 0 – and, conversely, any point with LANDFRAC = 0 should have LANDMASK = PCT_NATVEG = PCT_CROP = 0 as well.

I have been assuming that the mesh file used with this dataset has a mask that is consistent with the LANDMASK on the PFT raw data files (there may be some other problems of inconsistency if that's not the case). If so, then any adjustment to LANDMASK on the raw datasets would also require a new mesh file.

@lawrencepj1
Copy link

@billsacks Yes you are right this is an error. Looking at what you have here and then the code this is a truncation of very low LANDFRAC values in generating the raw data. I have added a check that if the LANDFRAC is truncated then the remaining PFT and CFT values are all reset to 0. I ran the new code for 1850 in the file:

/glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918/mksrf_landuse_ctsm52_nrhistLUH2_1850.newcheck.c220918.nc

with the changes between the original and newcheck files in the directory as:
change1850.nc

If you would like I can recreate all of the files in the directory with the new check included.

Thanks for catching this
Peter

@lawrencepj1
Copy link

@ekluzek and @slevisconsulting sorry I should have included you on the above comment as well

@billsacks
Copy link
Member Author

Thanks a lot @lawrencepj1 !. I checked some things in this new file and I agree that this solves the issue. Thank you!

Before remaking all new files: In addition to this change, can you also make the change in #1556 (comment) - or let me know if I'm misunderstanding something with that?

Then, yes, I think it would be good to remake all the files with this change and we should point to the new files by default.

In addition, I think we'll need to recreate the relevant mesh file so that its mask agrees with the new mask.

@lawrencepj1
Copy link

Thanks @billsacks I did update the code to do the scaling of the land units on the raw data to be the percentage of the grid cell with the new file in the directory as:

/glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918/mksrf_landuse_ctsm52_nrhistLUH2_1850.newfrac.c220918.nc

Differences between newcheck and newfrac are in the directory in the file
change1850frac.nc

There are some 10e-6 changes where landfrac == 1.0 which is the result of doing all the truncations as double precision.

You are also right that we will need a new mesh file so that the mask agrees. I think we will also need to have the LAI and soilcolor files on the new mask so they will need to be recreated.

The final issues I was going to raise is that the LUH2 data has time varying urban extent. Previously we had static urban in CLM5 so this was kept as constant as current day on the raw data regardless of the year being generated. Now that we have transient urban in CTSM52 it would make sense to explicitly include the LUH2 urban values in the crop and pft calculations even if the LUH2 urban values aren't being used in the CTSM52 surface data. Using the transient LUH2 means that as we remove urban going back in time we can explicitly provide the crop and pft values that would have been there prior to the urban expansion. It also means that we can prescribe which crop and pft values would be replaced by potential future urban under an SSP rather than letting mksurfdata average them.

I have code that uses the transient LUH2 urban extent already working and the raw data produced works with mksurfdata, having minimal impact on the historical period. I would suggest that this new element would be good to include in the CTSM52 raw data going forward. I would be interested to hear what you thought about this idea. I passed it by @dlawrenncar a while ago and he seemed okay with the idea however we didn't explore it in too much detail.

Cheers
Peter

@billsacks
Copy link
Member Author

@lawrencepj1 awesome thanks a lot for making this change! I did a quick check myself and it looks like I would expect.

Regarding accounting for transient urban: Yes, I think this makes sense to me, too. Thanks a lot for thinking of it!

@samsrabin samsrabin added the science Enhancement to or bug impacting science label Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something is working incorrectly science Enhancement to or bug impacting science
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

6 participants