@slevis-lmwg
Contributor

@slevis-lmwg slevis-lmwg commented Jun 4, 2025

Description of changes

mksurfdata_esmf interpolates abm (ag. fire peak month) from the raw dataset to fsurdat with a "dominant value" algorithm. However, from what I can tell, the dominant value gets selected only from valid month values (1 through 12), which leads to valid month values appearing in areas of limited crop coverage and to crop fires being overestimated by approximately 10–20 Mha/year in these regions.

I propose a very simple change in the code. I will not have a chance to try it out before derecho returns to service. My current concern is that abm = 13 is a netcdf Fillvalue in the raw dataset. If that causes trouble, I may need to replace the file's Fillvalue.

Specific notes

Contributors other than yourself, if any:
@lifang0209 @samsrabin

CTSM Issues Fixed (include github issue #):
Resolves #3188
Resolves #2663

Are answers expected to change (and if so in what way)?
Changes fsurdat variable abm

Any User Interface Changes (namelist or namelist defaults changes)?
No

Does this create a need to change or add documentation? Did you do so?
No

Testing TODO:

  • Generate and compare f19 (and ne30?) fsurdat files with and without the code modifications

@lifang0209

@slevis-lmwg Hi Sam, you mentioned that "abm = 13 is a netcdf Fillvalue in the raw dataset." I don't recall setting 13 as a FillValue in the abm raw data. If you are sure it is, then simply removing the FillValue setting (or assigning another FillValue, as you suggested, if a FillValue must be specified when making surface data) could solve the dominant value selection issue. For example, if more than half of the 0.5-deg cells within a coarse grid cell (e.g. 2 deg) have an abm value of 13, then the dominant method should give abm = 13 for that 2-deg grid cell.

Member

@samsrabin samsrabin left a comment

This unfortunately doesn't solve the problem. Fang's comment here means that a value of 13 actually needs to be ignored during interpolation, because it indicates the gridcell in question has no crop fires in the observations.

Additionally, there's this that I pointed out:

Because "month" is a modulo variable, I don't think anything other than dominant (mode) or nearest-neighbor should be used here. Imagine two gridcells, one with December and the other with February. A naive mean or median interpolation would give July as the result, when in reality it should be January.
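The December/February example can be checked with a short sketch. The function names below are hypothetical and are not part of mksurfdata_esmf; the circular mean is included only to show why January is the sensible answer, not as a proposed regridding method.

```python
import math
from collections import Counter


def naive_mean_month(months):
    """Arithmetic mean of month numbers, ignoring that months wrap around."""
    return round(sum(months) / len(months))


def circular_mean_month(months):
    """Mean of months treated as points on a 12-step circle (illustration only)."""
    angles = [2 * math.pi * (m - 1) / 12 for m in months]
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    angle = math.atan2(y, x) % (2 * math.pi)
    return round(angle * 12 / (2 * math.pi)) % 12 + 1


def dominant_month(months):
    """Most frequent month (mode); ties resolve to the first value encountered."""
    return Counter(months).most_common(1)[0][0]


print(naive_mean_month([12, 2]))     # 7 (July)
print(circular_mean_month([12, 2]))  # 1 (January)
print(dominant_month([12, 12, 2]))   # 12 (December)
```

The mode never produces a month that is absent from the inputs, which is why it is safe for a modulo variable.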

@samsrabin samsrabin added the test: aux_clm Pass aux_clm suite before merging label Jun 4, 2025
@samsrabin
Member

@lifang0209 I guess your proposed handling of 13 makes sense too—if more than half of the contributing gridcells have no crop fire, then the interpolated gridcell shouldn't either. It would be better, though, if we could scale the fraction of cropland that gets burned based on fraction of contributing gridcells with crop fire.

@samsrabin
Member

Ohh, sorry @slevis-lmwg, I misunderstood—it looks like there is already code to get the dominant month, and you're trying to fix it! But then the issue title—'mksurfdata_esmf needs to interpolate "abm" with dominant instead of average'—is confusing.

@samsrabin samsrabin dismissed their stale review June 4, 2025 18:01

I misunderstood!

Member

@samsrabin samsrabin left a comment

It seems like this does address the issue. However, it'd be good to have unit testing of this function to make sure it's behaving properly. I will plan to at least start the file structure needed for unit testing, but after @slevis-lmwg adds comments and improves variable names throughout the subroutine so that I know what's what.

Member

@samsrabin samsrabin left a comment

Thanks, Sam. I've added a request along with some TODOs where I think commenting or refactoring will be helpful.

@slevis-lmwg slevis-lmwg changed the base branch from master to alpha-ctsm5.4.CMIP7 June 5, 2025 21:49
@slevis-lmwg slevis-lmwg changed the base branch from alpha-ctsm5.4.CMIP7 to master June 5, 2025 21:59
slevis-lmwg added a commit to slevis-lmwg/ctsm that referenced this pull request Jun 5, 2025
@slevis-lmwg
Contributor Author

Testing on derecho
PASS python -u and -s
PASS black and lint

@slevis-lmwg slevis-lmwg marked this pull request as ready for review June 18, 2025 18:10
@slevis-lmwg slevis-lmwg merged commit 7fc50e4 into ESCOMP:alpha-ctsm5.4.CMIP7 Jun 18, 2025
5 checks passed
@slevis-lmwg slevis-lmwg deleted the abm_dominant_in_mksurfdata_esmf branch June 18, 2025 18:13
@slevis-lmwg
Contributor Author

I made a new branch tag with this merge:
alpha-ctsm5.4.CMIP7.03.ctsm5.3.055

@lifang0209

@slevis-lmwg Could you tell me the path on derecho of the new 1.9x2.5 surface data file generated using your revised code? I’d like to check it and ensure everything looks correct.

@slevis-lmwg
Contributor Author

@lifang0209
derecho and casper seem down right now, but I made this new file while working on this PR:
/glade/campaign/cesm/cesmdata/cseg/inputdata/lnd/clm2/surfdata_esmf/ctsm5.4.0/surfdata_1.9x2.5_hist_2000_16pfts_c250617.nc
I also made new fsurdat files for all resolutions in /glade/derecho/scratch/slevis/temp/ctsm5.4/.../tools/mksurfdata_esmf (I don't remember the exact path but it's something like that).

@lifang0209

@slevis-lmwg Thanks, Sam! I checked your 1.9x2.5 NC file, and the results look good. I can now remove the cropland coverage ≥10% condition from [PR #3204].
Since the valid ABM areas (1–12) in your file are smaller than those with cropland ≥10%, I need to recalibrate cropfire_a1. Do you have a 1850 1.9x2.5 78-PFTs surface dataset? I’d like to run a case to recalibrate it using the inverse method.

@slevis-lmwg
Contributor Author

[...] Do you have a 1850 1.9x2.5 78-PFTs surface dataset? I’d like to run a case to recalibrate it using the inverse method.

This should be the file that you need: /glade/derecho/scratch/slevis/temp_work/ctsm5.4/mk_ctsm54_datasets/tools/mksurfdata_esmf/surfdata_1.9x2.5_hist_1850_78pfts_c250618.nc

@lifang0209

@slevis-lmwg @samsrabin Using the dominant abm is definitely more reasonable than the earlier approach of averaging abm in regridding.
However, there is still an issue. On a 1.9x2.5 grid, your method results in an invalid abm (e.g., 13 or 14) in grid cells where cropland coverage is < 50%, leading to some grid cells being unreasonably classified as having no crop fires.
A more reasonable approach when regridding abm from the 0.5 deg raw data to coarser grids would be to assign the dominant valid abm value (1, 2, ..., 12) in a grid cell if cropland coverage exceeds 10% (or if the proportion of valid abm values in the grid cell is > 10%), and to assign an invalid abm otherwise.

@samsrabin
Member

@lifang0209 Can you share the script you're using to test this? That will help us solve it once and for all.

@slevis-lmwg
Contributor Author

@lifang0209 let me clarify what the abm regridding algorithm did before and does now and let's try to come up with a solution.

  1. Before, we determined dominant abm from values 1 to 12 (min_valid to max_valid). If we found no dominant value, then we set abm to unsetmon = 13. This led to an underestimation of 13, because we used it only when no dominant existed.
  2. Now, we use the same algorithm with the following modifications:
  • We set abm = 14 in the raw data where LANDMASK = 0 (ocean) in the raw data. We do this in the code, keeping the file unchanged (other than removing the _FillValue attribute from value 13).
  • We set min_valid = 1, max_valid = 13, unsetmon = 14. If we find no dominant value from 1 to 13, we set abm to 14.
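The before/after behavior described in this list can be sketched as follows. This is a minimal illustration that assumes unweighted counts over the contributing raw cells; the actual mksurfdata_esmf code may weight by area or overlap fraction. The function `dominant_abm` is hypothetical; only the parameter names `min_valid`, `max_valid`, and `unsetmon` come from the description above.

```python
from collections import Counter


def dominant_abm(raw_values, min_valid, max_valid, unsetmon):
    """Most frequent value within [min_valid, max_valid], else unsetmon.

    Assumption: every contributing raw cell counts equally.
    """
    counts = Counter(v for v in raw_values if min_valid <= v <= max_valid)
    if not counts:
        return unsetmon
    return counts.most_common(1)[0][0]


# A coarse cell covering six "no crop fire" cells (13) and two July cells (7).
cells = [13, 13, 13, 13, 13, 13, 7, 7]

before = dominant_abm(cells, min_valid=1, max_valid=12, unsetmon=13)
after = dominant_abm(cells, min_valid=1, max_valid=13, unsetmon=14)
ocean = dominant_abm([14, 14, 14], min_valid=1, max_valid=13, unsetmon=14)

print(before)  # 7: 13 is excluded, so the two July cells win
print(after)   # 13: 13 now competes and dominates
print(ocean)   # 14: no value in 1..13 is present
```

With these settings, a small number of valid months can no longer claim a coarse cell that is mostly 13.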

I have two requests that I can think of:

  • I do not want to modify this algorithm unless we find a mistake in it.
  • I do not want to introduce tuning assumptions to mksurfdata_esmf.

@slevis-lmwg
Contributor Author

However, I would be open to using an updated raw dataset. Currently the raw dataset has values from 1 to 13, and the code uses 14 over ocean grid cells. You could update the file to set non-croplands (including oceans) to 14. Then the dominant algorithm would find fewer 13s than it does now, so values from 1 to 12 would increase again.

@lifang0209

@slevis-lmwg Changing non-cropland from 13 to 14 in the raw data does not resolve the issue.

fig4
Fig. 1 2001-2020 CTSM5.4 fractional coverage of cropland, and GFED5 observed crop fires at 1.9x2.5 grids.

As shown in Fig. 1, areas with cropland exceeding 50% at coarse grid resolution are small, and crop fires generally occur in grids where cropland coverage exceeds 10%.

fig3
Fig. 2. distribution of valid abm (peak month of crop fires, 1 to 12) in the CTSM5.4 surface file at 1.9x2.5 grids using the old (average) and new (dominant value) methods, compared with abm raw data at 0.5 deg grids.

The issue primarily arises in regions with low cropland coverage, and its impact is not small.
The old method produced too many valid abm values in these regions, and therefore around 30% more global crop burned area, because it derives abm by averaging the valid abm values in the coarse grids (13 is set to FillValue).
In contrast, the new method produces too few valid abm values in these regions because invalid abm values dominate, resulting in around 30% less global crop burned area.

To get a reasonable abm and avoid crop fire simulations that are sensitive to resolution, these are the methods I could think of:
(1) Assign the dominant valid abm value (1, 2, ..., 12) in a grid cell if cropland coverage exceeds 10% (or if the proportion of valid abm values in the grid cell is > 10%), and assign an invalid abm otherwise.
(2) Use the abm that is dominant among the valid abm values by setting 13 as the FillValue, and add an if statement in the fire module code to limit crop fires to grids with cropland coverage > 10%.
(3) Provide correct 1.9x2.5 abm data, and apply those values to the high-resolution grids within the coarse grid cells.
Option (3) may be simplest.
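For concreteness, option (1) could look roughly like the sketch below. It uses the "proportion of valid abm values" variant, assumes unweighted counts, and is not code from mksurfdata_esmf; the function name `abm_option1` and its defaults are hypothetical.

```python
from collections import Counter


def abm_option1(raw_values, threshold=0.10, no_fire=13):
    """Dominant valid month if valid values exceed `threshold` of the cell.

    Assumption: each contributing 0.5-deg cell counts equally.
    """
    valid = [v for v in raw_values if 1 <= v <= 12]
    if not raw_values or len(valid) / len(raw_values) <= threshold:
        return no_fire
    return Counter(valid).most_common(1)[0][0]


print(abm_option1([13] * 8 + [7, 7]))  # 7: 20% of cells are valid
print(abm_option1([13] * 19 + [7]))    # 13: only 5% of cells are valid
```

The 10% threshold is exactly the kind of tuning parameter that the earlier comment asks to keep out of mksurfdata_esmf, so this is shown only to make the proposal explicit.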

Please let me know your thoughts or if any part requires further clarification.

@lifang0209

In the 0.5 deg raw data, non-croplands (including oceans) are set to 13, so option (2) works if we set 13 as the FillValue in mksurfdata_esmf and add a conditional statement in the fire code. There is no need to add the setting of 14 for oceans in mksurfdata_esmf. Right?

@slevis-lmwg
Contributor Author

@lifang0209 I shared my calendar with you. I think we will resolve this more easily with a conversation because I (or both of us) may be misunderstanding. Feel free to check my availability and send me an invite.

@slevis-lmwg
Contributor Author

Also, in your abm plots please show value 13, because it represents "no crop fire" rather than "invalid value" from my understanding and, if so, it is important to see this value's distribution.

@slevis-lmwg
Contributor Author

@samsrabin has expressed interest in joining our meeting.
@lifang0209 please let me know if it's easier if I set up the meeting, in which case tell me some convenient times for you and whether you have access to google meet or prefer zoom.

@lifang0209

@slevis-lmwg Tue, July 1, 5:30–6:30 PM, or Wed, July 2, from 4–5 PM work for me. Which do you prefer? I'm flexible with either Zoom or Google Meet.

@slevis-lmwg
Contributor Author

I selected July 2nd, 4-5 pm Mountain Time, with Google Meet.

@lifang0209

@slevis-lmwg Hi Sam, the raw 0.5° abm plots (with the distribution of 13 shown in gray):
fig3

In the raw data, both the ocean and areas without crop fires on land are set to 13.
The new raw data (abm05-250701) I generated today has a smaller area with valid abm values (1 to 12) because (1) I set 13 for 0.5° grids where cropland coverage in the 1.9°x2.5° grid is less than 10%, and (2) regions with cropland coverage ≥10% are smaller in CTSM5.4 (used for the new abm) than in CTSM5.2 (used for the old abm).

Now, with the new raw abm data (/glade/u/home/fangle/mksrf_abm_0.5x0.5_simyr2000.c250701.nc), we can retain the high resolution of abm for 0.5° or 1° simulations, avoid regridding issues in low cropland coverage regions, and resolve crop fire simulation sensitivity to resolution.

Remaining tasks:
(1) Make minor modifications in your mksurfdata_esmf file using the new raw abm data: remove the 14 setting and set 13 as FillValue.
(2) I recalibrate cropfire_a1 based on the 1850 1.9x2.5 78-PFTs surface dataset you generate from (1)

This is the best and simplest solution I could think of. Would it be better if we complete the remaining tasks before our meeting?

@slevis-lmwg
Contributor Author

Thank you @lifang0209
The first remaining task reverts the abm regridding algorithm to the original, which does not seem to me to be what we want. I will refrain from working on this until we meet and try to clarify everything.

@lifang0209

It's not reverting to the original—dominant value is used rather than the average.
OK, let's meet tomorrow.

@slevis-lmwg
Contributor Author

From meeting with @lifang0209 @samsrabin:
@slevis-lmwg to move this conversation to a new issue.

Labels

test: aux_clm (Pass aux_clm suite before merging)
test: mksurfdata (Test mksurfdata_esmf before merging)

3 participants