Add additional files to the mpas-analysis cache for time series analysis #760
Conversation
Force-pushed from 96b3cfe to 2084313.
```shell
for file in "${files[@]}"
do
    cp ${identifier}/timeseries/${file} cache/timeseries/${file}
done
```
@jonbob This is the set of lines that will need to be run once manually to cache the files that weren't cached with the released version of MPAS-Analysis.

Note: `files` is defined above as `files=( "mpasIndexOcean.nc" "mpasTimeSeriesOcean.nc" "seaIceAreaVolNH.nc" "seaIceAreaVolSH.nc" )`.
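A minimal sketch of that one-time manual backfill (assuming `${identifier}` points at the existing mpas-analysis output directory, and that `cache/timeseries` may need to be created first):

```shell
# One-time backfill of the cache with files the released MPAS-Analysis
# version did not cache. The mkdir -p is an added safeguard, not from the PR.
files=( "mpasIndexOcean.nc" "mpasTimeSeriesOcean.nc" "seaIceAreaVolNH.nc" "seaIceAreaVolSH.nc" )
mkdir -p cache/timeseries
for file in "${files[@]}"
do
    cp ${identifier}/timeseries/${file} cache/timeseries/${file}
done
```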
@cbegeman I think this line might be the trouble that caused the `cp: cannot stat 'cache/mpasIndexOcean.nc': No such file or directory` issues in my second test run. The cache output files are saved in `cache/timeseries`; however, when the second run reads files back from the cache, it tries to find them directly in `cache`.
@cbegeman thank you for working on this update. It looks like the mpas-analysis update has been merged (MPAS-Dev/MPAS-Analysis#1126) and @xylar has an activation script to use this latest version. To be able to test this with the HR spin-up run through zppy, we will need to finalize this branch as well. Is this branch ready for review?
Not quite, but I will tomorrow...
@chengzhuzhang and @xylar Thanks for your help! Yes, it is ready for review.
@xylar Let me know once you have the activation commands and then I can run a subset of the tests.
Okay, the environment and activation script are in place.
Testing steps -- `mpas_analysis` subset:

```shell
cd ~/ez/zppy
git status
# On branch add-legacy-cfgs
# nothing to commit, working tree clean
git remote add cbegeman git@github.com:cbegeman/zppy.git
git remote -v
# cbegeman git@github.com:cbegeman/zppy.git (fetch)
# cbegeman git@github.com:cbegeman/zppy.git (push)
git fetch cbegeman fix-mpas-analysis-cache
git checkout -b fix-mpas-analysis-cache cbegeman/fix-mpas-analysis-cache
git log
# Good, last commits:
# Add additional files to the mpas-analysis cache for time series analysis
# Bump to 3.1.0 (#759)
# Edit tests/integration/utils.py
# TEST_SPECIFICS:
# TEST_SPECIFICS: Dict[str, Any] = {
# # These are custom environment_commands for specific tasks.
# # Never set these to "", because they will print the line
# # `environment_commands = ""` for the corresponding task,
# # thus overriding the value set higher up in the cfg.
# # That is, there will be no environment set.
# # (`environment_commands = ""` only redirects to Unified
# # if specified under the [default] task)
# "diags_environment_commands": "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh",
# "global_time_series_environment_commands": "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh",
# "pcmdi_diags_environment_commands": "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh",
# # This is the environment setup for other tasks.
# # Leave as "" to use the latest Unified environment.
# "environment_commands": "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh",
# "cfgs_to_run": [
# "weekly_bundles",
# "weekly_comprehensive_v2",
# "weekly_comprehensive_v3",
# "weekly_legacy_3.0.0_bundles",
# "weekly_legacy_3.0.0_comprehensive_v2",
# "weekly_legacy_3.0.0_comprehensive_v3",
# ],
# "tasks_to_run": [
# # "e3sm_diags",
# "mpas_analysis",
# # "global_time_series",
# # "ilamb",
# # "pcmdi_diags",
# ],
# "unique_id": "test-fix-mpas-analysis-cache",
# }
git diff
# Diff looks good
lcrc_conda # Activate conda
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n test-fix-mpas-analysis-cache
conda activate test-fix-mpas-analysis-cache
pre-commit run --all-files
python -m pip install .
pytest tests/test_*.py
# 35 passed in 0.40s
python tests/integration/utils.py
# Everything is using Unified.
# We still need to add the mpas-analysis specific environment!
git add -A
git commit -m "Changes for testing"
# Add mpas_analysis_environment_commands
git diff
# Diff looks good
python tests/integration/utils.py
# Good, now we have:
# mpas_analysis_environment_commands=source /lcrc/soft/climate/e3sm-unified/load_hr_mpas_analysis.sh
pre-commit run --all-files # Iterate until passing
git add -A
python tests/integration/utils.py # Rerun
git status # Confirm there were no changes based on the rerun
git commit -m "Add mpas_analysis_environment_commands"
# Now, we have one more testing change to make.
# The bundles test cfgs don't test for mpas_analysis, so we can remove them:
# "cfgs_to_run": [
# # "weekly_bundles",
# "weekly_comprehensive_v2",
# "weekly_comprehensive_v3",
# # "weekly_legacy_3.0.0_bundles",
# "weekly_legacy_3.0.0_comprehensive_v2",
# "weekly_legacy_3.0.0_comprehensive_v3",
# ],
python tests/integration/utils.py
git diff
# Diff looks good
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.0.0_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.0.0_comprehensive_v3_chrysalis.cfg
sq | wc -l # Tue 12/02 10:47 => 82 - header row = 81 jobs
# For reference:
alias sq
# alias sq='sqa -u ac.forsyth2'
alias sqa
# alias sqa='squeue -o "%8u %.7a %.4D %.9P %7i %.2t %.10r %.10M %.10l %j" --sort=P,-t,-p'
sq | wc -l # Tue 12/02 11:17 => 12 - header row = 11 jobs
sq
# All jobs have state: pending for reason: dependency.
# So, nothing more is going to run.
# Review finished runs
### v2 ###
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
# global_time_series_1980-1990.status:WAITING 1028901
# mpas_analysis_ts_1980-1984_climo_1980-1984.status:ERROR (1)
# mpas_analysis_ts_1980-1990_climo_1985-1990.status:WAITING 1028900
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
# global_time_series_1980-1990.status:WAITING 1028948
# mpas_analysis_ts_1980-1984_climo_1980-1984.status:ERROR (1)
# mpas_analysis_ts_1980-1990_climo_1985-1990.status:WAITING 1028947
### v3 ###
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# global_time_series_classic_pdf_both_1985-1995.status:WAITING 1028927
# global_time_series_classic_pdf_original_1985-1995.status:WAITING 1028928
# global_time_series_viewer_both_1985-1995.status:WAITING 1028924
# global_time_series_viewer_original_1985-1995.status:WAITING 1028925
# mpas_analysis_ts_1985-1989_climo_1985-1989.status:ERROR (1)
# mpas_analysis_ts_1985-1995_climo_1990-1995.status:WAITING 1028923
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# global_time_series_1985-1995.status:WAITING 1028965
# mpas_analysis_ts_1985-1989_climo_1985-1989.status:ERROR (1)
# mpas_analysis_ts_1985-1995_climo_1990-1995.status:WAITING 1028964
```

@cbegeman, @xylar -- I encountered some errors while testing. Output paths below:

```shell
# Let's review the errors
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts
emacs mpas_analysis_ts_1980-1984_climo_1980-1984.o1028899
# TypeError: cannot pickle 'lxml.etree._ElementTree' object
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v2_output/test-fix-mpas-analysis-cache/v2.LR.historical_0201/post/scripts
emacs mpas_analysis_ts_1980-1984_climo_1980-1984.o1028946
# TypeError: cannot pickle 'lxml.etree._ElementTree' object
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts
emacs mpas_analysis_ts_1985-1989_climo_1985-1989.o1028922
# TypeError: cannot pickle 'lxml.etree._ElementTree' object
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/test-fix-mpas-analysis-cache/v3.LR.historical_0051/post/scripts
emacs mpas_analysis_ts_1985-1989_climo_1985-1989.o1028963
# TypeError: cannot pickle 'lxml.etree._ElementTree' object
```

Common theme seems to be the `TypeError: cannot pickle 'lxml.etree._ElementTree' object` error.
@forsyth2 Thanks for testing! Is it clear to any of you whether the cache files being absent at the first call to mpas_analysis is responsible for the `TypeError`?
Note: I'll be merging #763 to update the test suite accordingly (tests needed to be updated to use a custom environment for `mpas_analysis`).
Here's what Claude says when I give it the bash file:

The error occurs because the MPAS analysis code is trying to pickle (serialize) an `lxml.etree._ElementTree` object, which cannot be pickled. Looking at the stack trace, the problem happens when the code tries to start parallel tasks.

Potential solutions:

Quick fix: Modify the script to set

```ini
[execute]
parallelTaskCount = 1
```

This will make the analysis run serially, avoiding the pickling issue, though it will be slower.
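For reference, `parallelTaskCount` is an option in the `[execute]` section of the MPAS-Analysis configuration, and a local override config can be passed on the command line (the file name below is hypothetical):

```shell
# Run MPAS-Analysis with a local config that sets parallelTaskCount = 1
mpas_analysis my_overrides.cfg
```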
Further commentary from Claude: No, the missing cache files are not responsible for the `TypeError`.

1. Missing Cache Files (Warning, not an error)

```
cp: cannot stat 'cache/mpasIndexOcean.nc': No such file or directory
cp: cannot stat 'cache/mpasTimeSeriesOcean.nc': No such file or directory
cp: cannot stat 'cache/seaIceAreaVolNH.nc': No such file or directory
cp: cannot stat 'cache/seaIceAreaVolSH.nc': No such file or directory
```

These warnings occur because this appears to be the first run of the analysis, so the cache directory doesn't have these files yet. The script handles this gracefully: it tries to copy cached files if they exist (to speed up processing), but continues normally if they don't. The files will be created during this run and cached for future runs (as shown at the end of the script with the "CACHE OUTPUT FILES" section).

2. TypeError (The actual crash)

The `TypeError: cannot pickle 'lxml.etree._ElementTree' object` error occurs when Python's multiprocessing tries to serialize the `lxml.etree._ElementTree` object. The missing cache files have no connection to the pickling error. The pickling error is a code/compatibility issue with how MPAS-Analysis handles XML parsing in parallel contexts.
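For what it's worth, the unpicklable type can be reproduced standalone, independent of MPAS-Analysis (a minimal sketch; requires `lxml` in the active environment):

```shell
# Minimal repro: lxml element trees cannot be serialized by pickle.
python -c "
import pickle
from lxml import etree

tree = etree.ElementTree(etree.fromstring('<root/>'))
pickle.dumps(tree)  # TypeError: cannot pickle 'lxml.etree._ElementTree' object
"
```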
@forsyth2 Regardless, I can push a commit to only copy those files if they exist.
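That guard might look like this minimal sketch (hypothetical; shown here as the restore step, reading from `cache/timeseries`, where the files are actually written, rather than `cache/`):

```shell
# Copy cached time-series files back only if they exist, so a first run
# with an empty cache no longer emits "cannot stat" warnings.
for file in "${files[@]}"
do
    if [ -f "cache/timeseries/${file}" ]; then
        cp "cache/timeseries/${file}" "${identifier}/timeseries/${file}"
    fi
done
```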
Yes, that's what it looks like to me at least.

Thanks, I guess it's not strictly necessary, but it would help clean up the logs.
I don't think so. I don't think MPAS-Analysis should be doing any pickling.
Then, what would be doing the pickling? Based on the error output above, it looks like the error must be occurring inside the `mpas_analysis` call itself.
Looking at the error messages that @forsyth2 pointed to, it does look like it's happening in MPAS-Analysis. However, I suspect that it is being triggered by an unsafe use of ESMF in @forsyth2's test run. When we use ESMF from conda (i.e. with any environment other than E3SM-Unified), we shouldn't be calling it in parallel.
It turns out that ...
@xylar I can share more of the testing setup. All the steps I ran are in the expandable section in #760 (comment). Each directory also has a provenance cfg. Maybe something isn't set up correctly in the cfgs?
Historically, we have tested `mpas_analysis` with the E3SM-Unified environment. That would mean this issue is independent of this PR and is in fact an issue with how we test `mpas_analysis` with a custom environment.
It looks like you all need to be setting ...

Should that be true in general for standard ...?
Yes, it's possible that the pickle error is a side effect of an ESMF failure that you wouldn't have seen before. My hope is that running ESMF in serial might cause it not to return an error, which I think may be triggering the pickling issue. But it's also possible (maybe even likely) that the pickling problem is happening after the ESMF call finishes successfully. I will have to investigate further tomorrow.
That's all fine. The ...
@forsyth2 I'm on board with everything you're saying. Sorry for the confusion.
Here I'm documenting my steps to test this PR with the updated mpas-analysis task:
chengzhuzhang left a comment
The code change looks reasonable to me; I will update after re-testing.
@xylar, it would be great if you could do a code review. Thanks.
@wlin7 and @xylar, with the new commit, running mpas-analysis through zppy created reasonable time series. Please review:

1. The testing setup: #760 (comment)
2. The results here
3. The output directory: /lcrc/group/e3sm2/ac.zhang40/E3SMv3_testings/v3.LR.historical_0051_try7

Please review whether this PR provides the expected behavior, and also note the implications for other dependencies. One thing I checked: the global-time-series plots use files from `cache/timeseries`, so as long as this directory structure remains, the downstream global-time-series plotting won't be impacted.

However, there is one important item to note regarding the deletion of MPAS history files. Among the three global mean time-series plots (tracking changes in ocean heat content, sea level, and MOC), the MOC plot uses a file from the mpas-analysis cache; the other two, however, are computed from MPAS history files (see the code here).
xylar left a comment
One line I think probably should be removed.
@chengzhuzhang, can your comments be resolved at this point?
```shell
# Remove one particularly large file which does not need to be cached
rm ${identifier}/timeseries/mpasTimeSeriesSeaIce.nc
```
This seems to just be deleting a time-series file from the analysis directory, and not to be related to caching at all. Is this perhaps a relic of an earlier approach to copying cache files and something no longer needed? I would think it wouldn't be a great idea to delete output from the mpas-analysis output directory, even if it is large, unless we decide on a systematic way of cleaning things up that doesn't lose important data.
Removing large files from MPAS-Analysis that are not needed was requested by HR (specifically @wlin7). Do you think that MPAS-Analysis would be a better place to do this clean-up? I'm not particularly tied to including this in this PR, but maybe some more context would be helpful:
This file seems to be somewhat of an outlier among MPAS-Analysis output files in that it has dimensions of Time (equal to the length of the requested time series) and nCells (equal to the size of the whole mesh, which seems particularly egregious given that only a portion of those cells would contain sea ice). Correct me if there are other files like this. It's not clear to me that this would be "losing important data", because it's just copying information from history files that would (ideally) be archived, as opposed to derived analysis variables.
```
ncinfo timeseries/mpasTimeSeriesSeaIce.nc
...
dimensions(sizes): Time(72), StrLen(64), nCells(465044)
variables(dimensions): |S1 xtime_startMonthly(Time, StrLen), |S1 xtime_endMonthly(Time, StrLen), float32 timeMonthly_avg_iceAreaCell(Time, nCells), float32 timeMonthly_avg_iceVolumeCell(Time, nCells), float32 timeMonthly_avg_snowVolumeCell(Time, nCells)
```
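For a rough sense of scale (my arithmetic, not from the thread): each `float32` variable here holds 72 × 465044 × 4 bytes ≈ 134 MB, so the three cell-wise variables alone come to roughly 400 MB uncompressed, and the file grows linearly as the time series is extended.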
Yes, I don't think zppy should be deleting MPAS-Analysis files. This needs to be fixed in MPAS-Analysis instead. Sorry for the extra work, though!
Yes, there is a corresponding file for the ocean as far as I know.
Let's chat about this on Slack.
After having a conversation on Slack, I've come around to this approach.
I think MPAS-Analysis would ideally delete this file itself. I think MPAS-Analysis would also ideally extract only the variables that it plots directly from mpasTimeSeriesOcean.nc into other, smaller time series files that could be cached instead. But this was not how it was designed and making these changes in a robust way will likely take more time than we have before HR needs a working solution.
So I am fine with this approach as a less ideal but working alternative.
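As an illustration of the extraction idea above, a sketch using NCO (the variable names are placeholders, not the actual plotted variables):

```shell
# Hypothetical: subset only the directly-plotted variables into a smaller,
# cacheable time-series file instead of caching the full-size original.
ncks -v plottedVar1,plottedVar2 mpasTimeSeriesOcean.nc mpasTimeSeriesOceanSubset.nc
```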
@chengzhuzhang, I didn't quite follow what the solution is to this problem. Do you think the MPAS-Ocean history files for ocean heat content and sea level need to be retained in order for global time series analysis to succeed? Or is the suggestion to skip those variables? Or something else? It seems like it will be a problem if history files have to be retained for those plots, and also if those plots have to be skipped.
xylar left a comment
I'm approving after walking back my requested change from before.
Thank you for the review, @xylar.
@xylar Thanks for explaining the reasoning for the design decisions. @cbegeman Thank you for your work on this PR! @chengzhuzhang Do we just want a final review from @wlin7 before merging this? (I'll also want to merge the testing update #763.)
Yes.
Thank you @cbegeman @xylar @chengzhuzhang for getting this enhanced caching mechanism to work. I did a verification with a regular run of mpas-analysis for ts_1850-1859_climo_1855-1859 vs. this one with caching. The IICE comparison confirms the diagnostics are identical.
@wlin7 and @chengzhuzhang Thank you both for your review and testing! I think this is good to merge when @wlin7 approves.
Thank you @wlin7 for reviewing and confirming!
Thanks @cbegeman for working on this. Thanks @chengzhuzhang, @xylar, @wlin7 for reviewing/testing. I also just merged the associated test update (#763).
Great. Thanks, @forsyth2!
@xylar For the `mpas_analysis` tests, should the HR activation script become the new default for testing?
@forsyth2, I don't think so. I think the HR branch is unofficial and shouldn't become the new default for testing.
Summary
Objectives:

- Add additional files to the mpas-analysis cache for time series analysis
- Remove one particularly large file (with Time and `nCells` dimensions) once MPAS-Analysis has completed, since it is not needed to extend time series plots

Select one: This pull request is...
Small Change