
Conversation

@cbegeman
Contributor

Summary

Objectives:

  • Cache additional MPAS-Analysis files needed to extend time series plots from existing analysis when the source files are not present in the output path (typically because they have been long-term archived)
  • Remove an MPAS-Analysis file that may be large (it has an nCells dimension) and is not needed to extend time series plots, once MPAS-Analysis has completed

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

Small Change

  • To merge, I will use "Squash and merge". That is, this change should be a single commit.
  • Logic: I have visually inspected the entire pull request myself.
  • Pre-commit checks: All the pre-commit checks have passed.

Comment on lines +305 to +308
for file in "${files[@]}"
do
cp ${identifier}/timeseries/${file} cache/timeseries/${file}
done
Contributor Author


@jonbob This is the set of lines that will need to be run once manually to cache the files that weren't cached with the released version of MPAS-Analysis

Note: files is defined above as files=( "mpasIndexOcean.nc" "mpasTimeSeriesOcean.nc" "seaIceAreaVolNH.nc" "seaIceAreaVolSH.nc" )
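For reference, a minimal sketch of that one-time manual step, assuming the directory layout used by the generated script (${identifier}/timeseries/ as the source and cache/timeseries/ as the destination):

files=( "mpasIndexOcean.nc" "mpasTimeSeriesOcean.nc" "seaIceAreaVolNH.nc" "seaIceAreaVolSH.nc" )
mkdir -p cache/timeseries
for file in "${files[@]}"
do
  # Copy each previously computed time-series file into the cache so later runs can extend it
  cp ${identifier}/timeseries/${file} cache/timeseries/${file}
done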

Collaborator


@cbegeman I think this line might be the trouble that caused the cp: cannot stat 'cache/mpasIndexOcean.nc': No such file or directory issues in my second-run test. The cached output files are saved in cache/timeseries; however, when the second run reads files from the cache, it looks for them directly under cache/ instead.
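A minimal sketch of the mismatch, reusing the loop quoted above (the direction of the restore step is an assumption based on the error message, not taken from the script):

# Caching step writes into the subdirectory:
cp ${identifier}/timeseries/${file} cache/timeseries/${file}
# ...but the restore step on the next run reads from the cache root, hence the stat error:
cp cache/${file} ${identifier}/timeseries/${file}
# Pointing the restore at the same subdirectory would resolve the mismatch:
cp cache/timeseries/${file} ${identifier}/timeseries/${file}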

@chengzhuzhang
Collaborator

@cbegeman thank you for working on this update. It looks like the mpas-analysis update has been merged (MPAS-Dev/MPAS-Analysis#1126) and @xylar has an activation script to use this latest version. To be able to test this with the HR spin-up run through zppy, we will need to finalize this branch as well. Is this branch ready for review?

@xylar
Contributor

xylar commented Dec 1, 2025

and @xylar has an activation script to use this latest version

Not quite, but I will tomorrow...

@chengzhuzhang
Collaborator

and @xylar has an activation script to use this latest version

Not quite, but I will tomorrow...

No hurry, thank you @xylar.

@cbegeman cbegeman marked this pull request as ready for review December 1, 2025 22:51
@cbegeman
Contributor Author

cbegeman commented Dec 1, 2025

@chengzhuzhang and @xylar Thanks for your help! Yes, it is ready for review.

@forsyth2
Collaborator

forsyth2 commented Dec 1, 2025

@xylar Let me know once you have the activation commands and then I can run a subset of the zppy test suite on the latest mpas_analysis specifically.

@xylar
Contributor

xylar commented Dec 2, 2025

Okay, the environment and activation script are in place:

source /lcrc/soft/climate/e3sm-unified/load_hr_mpas_analysis.sh

@chengzhuzhang
Collaborator

Thank you @cbegeman and @xylar.
I'm looking for a test case that can mimic the use case expected in the ongoing v3HR spin-up run. @wlin7 or @cbegeman, do you have a good test in mind?

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

Testing steps -- `mpas_analysis` subset
cd ~/ez/zppy
git status
# On branch add-legacy-cfgs
# nothing to commit, working tree clean
git remote add cbegeman git@github.com:cbegeman/zppy.git
git remote -v
# cbegeman	git@github.com:cbegeman/zppy.git (fetch)
# cbegeman	git@github.com:cbegeman/zppy.git (push)

git fetch cbegeman fix-mpas-analysis-cache
git checkout -b fix-mpas-analysis-cache cbegeman/fix-mpas-analysis-cache
git log
# Good, last commits:
# Add additional files to the mpas-analysis cache for time series analysis
# Bump to 3.1.0 (#759)

# Edit tests/integration/utils.py
# TEST_SPECIFICS:
# TEST_SPECIFICS: Dict[str, Any] = {
#     # These are custom environment_commands for specific tasks.
#     # Never set these to "", because they will print the line
#     # `environment_commands = ""` for the corresponding task,
#     # thus overriding the value set higher up in the cfg.
#     # That is, there will be no environment set.
#     # (`environment_commands = ""` only redirects to Unified
#     # if specified under the [default] task)
#     "diags_environment_commands": "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh",
#     "global_time_series_environment_commands": "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh",
#     "pcmdi_diags_environment_commands": "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh",
#     # This is the environment setup for other tasks.
#     # Leave as "" to use the latest Unified environment.
#     "environment_commands": "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh",
#     "cfgs_to_run": [
#         "weekly_bundles",
#         "weekly_comprehensive_v2",
#         "weekly_comprehensive_v3",
#         "weekly_legacy_3.0.0_bundles",
#         "weekly_legacy_3.0.0_comprehensive_v2",
#         "weekly_legacy_3.0.0_comprehensive_v3",
#     ],
#     "tasks_to_run": [
#         # "e3sm_diags",
#         "mpas_analysis",
#         # "global_time_series",
#         # "ilamb",
#         # "pcmdi_diags",
#     ],
#     "unique_id": "test-fix-mpas-analysis-cache",
# }

git diff
# Diff looks good

lcrc_conda # Activate conda
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n test-fix-mpas-analysis-cache
conda activate test-fix-mpas-analysis-cache
pre-commit run --all-files
python -m pip install .

pytest tests/test_*.py
# 35 passed in 0.40s
python tests/integration/utils.py
# Everything is using Unified.
# We still need to add the mpas-analysis specific environment!

git add -A
git commit -m "Changes for testing"

# Add mpas_analysis_environment_commands
git diff
# Diff looks good
python tests/integration/utils.py
# Good, now we have:
# mpas_analysis_environment_commands=source /lcrc/soft/climate/e3sm-unified/load_hr_mpas_analysis.sh
pre-commit run --all-files # Iterate until passing
git add -A
python tests/integration/utils.py # Rerun
git status # Confirm there were no changes based on the rerun
git commit -m "Add mpas_analysis_environment_commands"

# Now, we have one more testing change to make.
# The bundles test cfgs don't test for mpas_analysis, so we can remove them:
# "cfgs_to_run": [
#     # "weekly_bundles",
#     "weekly_comprehensive_v2",
#     "weekly_comprehensive_v3",
#     # "weekly_legacy_3.0.0_bundles",
#     "weekly_legacy_3.0.0_comprehensive_v2",
#     "weekly_legacy_3.0.0_comprehensive_v3",
# ],
python tests/integration/utils.py
git diff
# Diff looks good

zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.0.0_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.0.0_comprehensive_v3_chrysalis.cfg
sq | wc -l # Tue 12/02 10:47 => 82 - header row = 81 jobs
# For reference:
alias sq
# alias sq='sqa -u ac.forsyth2'
alias sqa
# alias sqa='squeue -o "%8u %.7a %.4D %.9P %7i %.2t %.10r %.10M %.10l %j" --sort=P,-t,-p'

sq | wc -l # Tue 12/02 11:17 => 12 - header row = 11 jobs
sq
# All jobs have state: pending for reason: dependency.
# So, nothing more is going to run.

# Review finished runs
### v2  ###
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
# global_time_series_1980-1990.status:WAITING 1028901
# mpas_analysis_ts_1980-1984_climo_1980-1984.status:ERROR (1)
# mpas_analysis_ts_1980-1990_climo_1985-1990.status:WAITING 1028900
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
# global_time_series_1980-1990.status:WAITING 1028948
# mpas_analysis_ts_1980-1984_climo_1980-1984.status:ERROR (1)
# mpas_analysis_ts_1980-1990_climo_1985-1990.status:WAITING 1028947
### v3 ###
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# global_time_series_classic_pdf_both_1985-1995.status:WAITING 1028927
# global_time_series_classic_pdf_original_1985-1995.status:WAITING 1028928
# global_time_series_viewer_both_1985-1995.status:WAITING 1028924
# global_time_series_viewer_original_1985-1995.status:WAITING 1028925
# mpas_analysis_ts_1985-1989_climo_1985-1989.status:ERROR (1)
# mpas_analysis_ts_1985-1995_climo_1990-1995.status:WAITING 1028923
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# global_time_series_1985-1995.status:WAITING 1028965
# mpas_analysis_ts_1985-1989_climo_1985-1989.status:ERROR (1)
# mpas_analysis_ts_1985-1995_climo_1990-1995.status:WAITING 1028964

@cbegeman, @xylar -- I encountered some errors while testing. Output paths below:

# Let's review the errors
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts
emacs mpas_analysis_ts_1980-1984_climo_1980-1984.o1028899
# TypeError: cannot pickle 'lxml.etree._ElementTree' object
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts
emacs mpas_analysis_ts_1980-1984_climo_1980-1984.o1028946
# TypeError: cannot pickle 'lxml.etree._ElementTree' object
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts
emacs mpas_analysis_ts_1985-1989_climo_1985-1989.o1028922 
# TypeError: cannot pickle 'lxml.etree._ElementTree' object
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts
emacs mpas_analysis_ts_1985-1989_climo_1985-1989.o1028963
# TypeError: cannot pickle 'lxml.etree._ElementTree' object

Common theme seems to be:

cp: cannot stat 'cache/mpasIndexOcean.nc': No such file or directory
cp: cannot stat 'cache/mpasTimeSeriesOcean.nc': No such file or directory
cp: cannot stat 'cache/seaIceAreaVolNH.nc': No such file or directory
cp: cannot stat 'cache/seaIceAreaVolSH.nc': No such file or directory
...
# TypeError: cannot pickle 'lxml.etree._ElementTree' object

@cbegeman
Contributor Author

cbegeman commented Dec 2, 2025

@forsyth2 Thanks for testing!

Is it clear to any of you whether the absence of the cache files at the first call to mpas_analysis is responsible for the TypeError? From the mpas_analysis log it looks like the script moves past the cp calls.

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

Note: I'll be merging #763 to update the test suite accordingly (tests needed to be updated to use a custom environment for mpas_analysis).

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

Here's what Claude says when I give it the bash file mpas_analysis_ts_1985-1989_climo_1985-1989.bash and log mpas_analysis_ts_1985-1989_climo_1985-1989.o1028963 in /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts:


The error occurs because the MPAS analysis code is trying to pickle (serialize) an lxml.etree._ElementTree object for parallel processing, but this type of object cannot be pickled in Python's multiprocessing framework.

Looking at the stack trace, the issue is:

TypeError: cannot pickle 'lxml.etree._ElementTree' object
when serializing dict item 'xmlfile'
when serializing mpas_analysis.shared.io.namelist_streams_interface.StreamsFile state

The problem happens when the code tries to start parallel tasks (line shows parallelTaskCount = 6 in the config). The StreamsFile object contains an unpicklable lxml.etree._ElementTree object that needs to be passed between processes.

Potential solutions:

  1. Set parallel task count to 1 (run serially):
    Change this line in the generated config:

    parallelTaskCount = 1
  2. This is likely a bug in the MPAS-Analysis code itself - specifically in how the StreamsFile class stores the parsed XML. The xmlfile attribute should either:

    • Not be stored as an instance variable
    • Be recreated in each subprocess rather than pickled
    • Use a custom __getstate__/__setstate__ to handle pickling
  3. Check your MPAS-Analysis version - this might be fixed in newer versions. The path shows you're using Python 3.14, which is quite recent, and there may be compatibility issues with the multiprocessing changes in newer Python versions.

Quick fix: Modify the script to set parallelTaskCount = 1 in the configuration section:

[execute]
parallelTaskCount = 1

This will make the analysis run serially, avoiding the pickling issue, though it will be slower.

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

whether the cache files being absent at the first call to mpas_analysis is responsible for the TypeError?

Further commentary from Claude:


No, the missing cache files are not responsible for the TypeError. These are two separate issues:

1. Missing Cache Files (Warning, not an error)

cp: cannot stat 'cache/mpasIndexOcean.nc': No such file or directory
cp: cannot stat 'cache/mpasTimeSeriesOcean.nc': No such file or directory
cp: cannot stat 'cache/seaIceAreaVolNH.nc': No such file or directory
cp: cannot stat 'cache/seaIceAreaVolSH.nc': No such file or directory

These warnings occur because this appears to be the first run of the analysis, so the cache directory doesn't have these files yet. The script handles this gracefully - it tries to copy cached files if they exist (to speed up processing), but continues normally if they don't. The files will be created during this run and cached for future runs (as shown at the end of the script with the "CACHE OUTPUT FILES" section).

2. TypeError (The actual crash)

The TypeError: cannot pickle 'lxml.etree._ElementTree' object happens much later, after:

  • The directory structure is set up successfully
  • The config file is created successfully
  • MPAS-Analysis starts running
  • It begins preprocessing and generating mapping weights
  • It attempts to start parallel tasks using multiprocessing

The error occurs when Python's multiprocessing tries to serialize the StreamsFile object to pass it to worker processes, but finds an unpicklable lxml.etree._ElementTree object inside it.

The missing cache files have no connection to the pickling error. The pickling error is a code/compatibility issue with how MPAS-Analysis handles XML parsing in parallel contexts.

@cbegeman
Contributor Author

cbegeman commented Dec 2, 2025

@forsyth2 Thanks, I suspected they were unrelated. The pickle error may need to be resolved in the MPAS-Analysis environment but @xylar would know better.

@cbegeman
Contributor Author

cbegeman commented Dec 2, 2025

@forsyth2 Regardless, I can push a commit to only copy those files if they exist.
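A minimal sketch of what that guard could look like, assuming the same files array and directory layout as the generated script (the destination path here is illustrative):

for file in "${files[@]}"
do
  # Only restore a file from the cache if a previous run actually produced it
  if [ -f cache/timeseries/${file} ]; then
    cp cache/timeseries/${file} ${identifier}/timeseries/${file}
  fi
done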

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

The pickle error may need to be resolved in the MPAS-Analysis environment but @xylar would know better.

Yes, that's what it looks like to me, at least.

I can push a commit to only copy those files if they exist.

Thanks, I guess not strictly necessary but it would help clean up the logs.

@xylar
Contributor

xylar commented Dec 2, 2025

The pickle error may need to be resolved in the MPAS-Analysis environment but @xylar would know better.

I don't think so. I don't think MPAS-Analysis should be doing any pickling.

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

I don't think so. I don't think MPAS-Analysis should be doing any pickling.

Then, what would be doing the pickling? Based on the following, it looks like the error must be occurring inside the mpas_analysis call.

> cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts/
> cat mpas_analysis_ts_1985-1989_climo_1985-1989.status 
ERROR (1)

> grep -B 2 "ERROR (1)" mpas_analysis_ts_1985-1989_climo_1985-1989.bash
mpas_analysis ${purge} --verbose ${extra_config} cfg/mpas_analysis_${identifier}.cfg
if [ $? != 0 ]; then
  echo 'ERROR (1)' > /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts/mpas_analysis_ts_1985-1989_climo_1985-1989.status

@xylar
Contributor

xylar commented Dec 2, 2025

Looking at the error messages that @forsyth2 pointed to, it does look like it's happening in multiprocessing in the MPAS-Analysis run. It is not an error I have ever seen before and I don't understand where it's coming from.

However, I suspect that it is being triggered by an unsafe use of ESMF in @forsyth2's test run. When we use ESMF from conda (i.e. with any environment other than E3SM-Unified), we shouldn't be calling it with srun. So MPAS-Analysis must be configured to create mapping files in serial instead. Ideally, MPAS-Analysis shouldn't be creating mapping files at all, so it would be worth finding out why this is happening. All required mapping files for any standard E3SM grids are supposed to be available as part of the diagnostics data.

@xylar
Contributor

xylar commented Dec 2, 2025

Then, what would be doing the pickling?

It turns out that multiprocessing is doing the pickling, though it isn't clear to me what is causing the error, what might have changed, and why my own testing didn't reveal this.

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

I suspect that it is being triggered by an unsafe use of ESMF in @forsyth2's test run.

@xylar I can share more of the testing setup. All the steps I ran are in the expandable section in #760 (comment). Each directory also has a provenance cfg of the form provenance.*.cfg:

/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts/provenance.20251202_184701_746975.cfg
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts/provenance.20251202_184723_757735.cfg
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts/provenance.20251202_184714_219918.cfg
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts/provenance.20251202_184731_828236.cfg

Maybe something isn't set up correctly in the cfgs?

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

When we use ESMF from conda (i.e. with any environment other than E3SM-Unified), we shouldn't be calling it with srun.

Historically, we have tested mpas_analysis in the Unified environment, which could explain why we haven't seen this error before. In fact, this is apparently the first time we're testing a dev environment for mpas_analysis, as seen by the need for the environment setup of #763.

That would mean this issue is independent of this PR and is in fact an issue with how we test mpas_analysis.

@xylar
Contributor

xylar commented Dec 2, 2025

It looks like you all need to be setting mapMpiTasks = 1 for these runs. And it makes sense that you're getting mapping files created, because these runs use the rather old EC30to60E2r2 mesh. So nothing is wrong after all in that sense.
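For reference, a minimal sketch of the corresponding MPAS-Analysis override, assuming it goes into the config file that the generated script already passes to mpas_analysis (whether zppy should expose this as its own cfg option is a separate question):

[execute]
# Generate mapping files in serial so a conda-provided ESMF is not launched with srun
mapMpiTasks = 1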

@forsyth2
Collaborator

forsyth2 commented Dec 2, 2025

It looks like you all need to be setting mapMpiTasks = 1 for these runs.

Should that be true in general for standard zppy testing, or specifically for when we're testing a dev environment?

@xylar
Contributor

xylar commented Dec 2, 2025

Historically, we have tested mpas_analysis in the Unified environment, which could explain why we haven't seen this error before. In fact, this is apparently the first time we're testing a dev environment for mpas_analysis, as seen by the need for the environment setup of #763.

That would mean this issue is independent of this PR and is in fact an issue with how we test mpas_analysis.

Yes, it's possible that the pickle error is a side effect of an ESMF failure that you wouldn't have seen before. My hope is that running ESMF in serial might cause it not to return an error, which I think may be triggering the pickling issue.

But it's also possible (maybe even likely) that the pickling problem is happening after the ESMF call finishes successfully. I will have to investigate further tomorrow.

@xylar
Contributor

xylar commented Dec 4, 2025

That's all fine. The hr_mpas_analysis environment is also just a dev environment for MPAS-Analysis. I was just thinking it might save someone the headache of making a dev environment if we just combined the 2 but I'm happy to leave the zppy side of things to @wlin7 or whoever is running things.

@cbegeman
Contributor Author

cbegeman commented Dec 4, 2025

@forsyth2 I'm on board with everything you're saying. Sorry for the confusion

@forsyth2 forsyth2 mentioned this pull request Dec 8, 2025
@chengzhuzhang
Collaborator

Here I'm documenting my steps to test this PR with the updated mpas-analysis task:

  1. Activate zppy_dev env based on this branch: source /home/ac.zhang40/y/etc/profile.d/conda.sh; conda activate zppy_dev
  2. First run, which generates the cache for 1850-1854; the input directory includes the 1850-1854 ocn and sea-ice files:
    zppy -c post.v3.LR.historical_0051_mpas_cache_run1.cfg
[default]
input = /lcrc/group/e3sm2/ac.zhang40/E3SMv3_testing/v3.LR.historical_0051/run1  #including 1850-1854
input_subdir = archive/atm/hist
output = /lcrc/group/e3sm2/ac.zhang40/E3SMv3_testings/v3.LR.historical_0051_try6
case = v3.LR.historical_0051
www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.zhang40/tests/E3SMv3_testings_mpas_cache_try6
partition = compute
environment_commands = "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh"

[mpas_analysis]
active = True
environment_commands="source /lcrc/soft/climate/e3sm-unified/load_hr_mpas_analysis.sh"
walltime = "4:00:00"
parallelTaskCount = 6
anomalyRefYear = 1850
# First run:
ts_years = "1850-1854", 
enso_years = "1850-1854", 
climo_years = "1850-1854", 
mesh = "IcoswISC30E3r5"
shortTermArchive = True

  3. Second run, which generates 1850-1859; the time-series input directory includes the 1850 and 1855-1859 ocn and sea-ice files:
    zppy -c post.v3.LR.historical_0051_mpas_cache_run2.cfg
[default]
input = /lcrc/group/e3sm2/ac.zhang40/E3SMv3_testing/v3.LR.historical_0051/run2 # including 1850, and 1855-1859
input_subdir = archive/atm/hist
output = /lcrc/group/e3sm2/ac.zhang40/E3SMv3_testings/v3.LR.historical_0051_try6
case = v3.LR.historical_0051
www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.zhang40/tests/E3SMv3_testings_mpas_cache_try6
partition = compute
environment_commands = "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh"


[mpas_analysis]
active = True
environment_commands="source /lcrc/soft/climate/e3sm-unified/load_hr_mpas_analysis.sh"
walltime = "4:00:00"
parallelTaskCount = 6
anomalyRefYear = 1850
# Remove time-series history files [1851-1854] from ocn and ice, keep the first/anomalyRefYear, and conduct second run with following.
ts_years = "1850-1859", 
enso_years = "1850-1859", 
climo_years = "1855-1859", 
mesh = "IcoswISC30E3r5"
shortTermArchive = True

@chengzhuzhang chengzhuzhang requested a review from xylar December 10, 2025 20:06
Collaborator

@chengzhuzhang chengzhuzhang left a comment


The code changes look reasonable to me; I will update after re-testing.
@xylar it would be great if you could do a code review. Thanks.

Collaborator

@chengzhuzhang chengzhuzhang left a comment


@wlin7 and @xylar, with the new commit, running mpas-analysis through zppy created reasonable time series. Please review:
1. The testing setup: #760 (comment)
2. The results here
3. The output directory: /lcrc/group/e3sm2/ac.zhang40/E3SMv3_testings/v3.LR.historical_0051_try7

Please review whether this PR provides the expected behavior, and note any implications for other dependencies. One thing I checked is that the global-time-series plot uses files from cache/timeseries; as long as this directory structure remains, the downstream plotting for global time series won't be impacted.

However, there is one important item to note regarding the deletion of MPAS history files.
Among the three global mean time-series plots, which track changes in ocean heat content, sea level, and MOC: the MOC plot uses a file from the mpas-analysis cache, but the other two are computed from MPAS history files (see the code here)

Contributor

@xylar xylar left a comment


One line I think probably should be removed.

@chengzhuzhang, can your comments be resolved at this point?

Comment on lines +313 to +314
# Remove one particularly large file which does not need to be cached
rm ${identifier}/timeseries/mpasTimeSeriesSeaIce.nc
Contributor


This seems to just be deleting a time-series file from the analysis directory, and not to be related to caching at all. Is this perhaps a relic of an earlier approach to copying cache files that is no longer needed? I don't think it would be a great idea to delete output from the mpas-analysis output directory, even if it is large, unless we decide on a systematic way of cleaning things up that doesn't lose important data.

Contributor Author


Removing large files from MPAS-Analysis that are not needed was requested by HR (specifically @wlin7). Do you think that MPAS-Analysis would be a better place to do this clean-up? I'm not particularly tied to including this in this PR but maybe some more context would be helpful:

This file seems to be somewhat of an outlier among MPAS-Analysis output files in that it has dimensions of Time (equal to the length of the requested time series) and nCells (equal to the size of the whole mesh, which seems particularly egregious given that only a portion of those cells would contain sea ice). Correct me if there are other files like this. It's not clear to me that this would be "losing important data", because it just copies information from history files that would (ideally) be archived, as opposed to derived analysis variables.

ncinfo timeseries/mpasTimeSeriesSeaIce.nc
...
    dimensions(sizes): Time(72), StrLen(64), nCells(465044)
    variables(dimensions): |S1 xtime_startMonthly(Time, StrLen), |S1 xtime_endMonthly(Time, StrLen), float32 timeMonthly_avg_iceAreaCell(Time, nCells), float32 timeMonthly_avg_iceVolumeCell(Time, nCells), float32 timeMonthly_avg_snowVolumeCell(Time, nCells)
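For scale, the on-disk size can be confirmed before deciding whether to remove it (same path as in the generated script):

du -h ${identifier}/timeseries/mpasTimeSeriesSeaIce.nc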

Contributor


Yes, I don't think zppy should be deleting MPAS-Analysis files. This needs to be fixed in MPAS-Analysis instead. Sorry for the extra work, though!

Contributor


Yes, there is a corresponding file for the ocean, as far as I know.

Contributor


Let's chat about this on Slack.

Contributor

@xylar xylar Dec 11, 2025


After having a conversation on Slack, I've come around to this approach.

I think MPAS-Analysis would ideally delete this file itself. I think MPAS-Analysis would also ideally extract only the variables that it plots directly from mpasTimeSeriesOcean.nc into other, smaller time series files that could be cached instead. But this was not how it was designed and making these changes in a robust way will likely take more time than we have before HR needs a working solution.

So I am fine with this approach as a less ideal but working alternative.

@xylar
Contributor

xylar commented Dec 11, 2025

However, there is one important item to note regarding the deletion of MPAS history files.
Among the three global mean time-series plots, which track changes in ocean heat content, sea level, and MOC: the MOC plot uses a file from the mpas-analysis cache, but the other two are computed from MPAS history files (see the code here)

@chengzhuzhang, I didn't quite follow what the solution is to this problem. Do you think the MPAS-Ocean history files for ocean heat content and sea level need to be retained in order for global time series analysis to succeed? Or is the suggestion to skip those variables? Or something else? It seems like it will be a problem if history files have to be retained for those plots, and also if those plots have to be skipped.

Contributor

@xylar xylar left a comment


I'm approving after walking back my requested change from before.

@chengzhuzhang
Collaborator

chengzhuzhang commented Dec 11, 2025

@chengzhuzhang, I didn't quite follow what the solution is to this problem. Do you think the MPAS-Ocean history files for ocean heat content and sea level need to be retained in order for global time series analysis to succeed? Or is the suggestion to skip those variables? Or something else? It seems like it will be a problem if history files have to be retained for those plots, and also if those plots have to be skipped.

Thank you for the review, @xylar.
Yes, with the current implementation of zppy/zppy-interface, the MPAS-Ocean history files for ocean heat content and sea level need to be retained; we won't have automated global mean time series generated without the history files kept on disk. To attempt an alternative, in another zppy PR I have a commit to save the global mean time-series nc files for OHC and volume to the output /post directory. With this change it is possible to do an offline calculation over the whole time period after the history files are deleted. Folks who are running the data management and analysis need to be aware, @wlin7.
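As a rough illustration of that offline step, the saved per-segment global-mean files could be concatenated with NCO once the history files are gone; the file names below are hypothetical, not the actual output of that PR:

# Hypothetical file names, for illustration only
ncrcat post/analysis/glb_ohc_1850-1854.nc post/analysis/glb_ohc_1855-1859.nc glb_ohc_1850-1859.nc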

@forsyth2
Collaborator

I don't think zppy should be deleting MPAS-Analysis files. This needs to be fixed in MPAS-Analysis instead.

this was not how it was designed and making these changes in a robust way will likely take more time than we have before HR needs a working solution.
So I am fine with this approach as a less ideal but working alternative.

@xylar Thanks for explaining the reasoning for the design decisions.

@cbegeman Thank you for your work on this PR!

@chengzhuzhang Do we just want a final review from @wlin7 before merging this? (I'll also want to merge the testing update #763).

@chengzhuzhang
Collaborator

@chengzhuzhang Do we just want a final review from @wlin7 before merging this? (I'll also want to merge the testing update #763).

Yes.

@wlin7

wlin7 commented Dec 17, 2025

Thank you @cbegeman @xylar @chengzhuzhang for getting this enhanced caching mechanism to work. I did a verification with a regular run of mpas-analysis for ts_1850-1859_climo_1855-1859 vs. this one with caching; the IICE comparison confirms the diagnostics are identical.

@cbegeman
Contributor Author

@wlin7 and @chengzhuzhang Thank you both for your review and testing! I think this is good to merge when @wlin7 approves.

@chengzhuzhang
Collaborator

Thank you @wlin7 for reviewing and confirming!

@chengzhuzhang chengzhuzhang merged commit 55fd58b into E3SM-Project:main Dec 17, 2025
7 checks passed
@forsyth2
Collaborator

Thanks @cbegeman for working on this. Thanks @chengzhuzhang, @xylar, @wlin7 for reviewing/testing. I also just merged the associated test update (#763)

@cbegeman
Contributor Author

Great. Thanks, @forsyth2!

@forsyth2
Collaborator

@xylar For zppy's weekly main branch testing, now that this PR & MPAS-Dev/MPAS-Analysis#1126 have been merged, should I be setting

environment_commands=source /lcrc/soft/climate/e3sm-unified/load_hr_mpas_analysis.sh

for the mpas_analysis task? (That is, the Unified mpas_analysis would no longer work?)

@xylar
Contributor

xylar commented Dec 19, 2025

@forsyth2 , I don't think so. I think the HR branch is unofficial and shouldn't become the new default for testing.
