gdasgldas task fails with Restart Tile Space Mismatch #622

@BrettHoover-NOAA

Description

Expected behavior
The gdasgldas task should complete successfully, and global-workflow should continue to cycle fv3gdas.

Current behavior
The gdasgldas task fails on the first cycle in which it is not skipped (the first 00z analysis period after enough data has been produced to trigger the task).

Machines affected
This error occurs on Orion.

To Reproduce
I am seeing this bug in a test of global-workflow being conducted on Orion in the following directories:
expid: /work/noaa/da/bhoover/para/bth_test
code: /work/noaa/da/bhoover/global-workflow
ROTDIR: /work/noaa/stmp/bhoover/ROTDIRS/bth_test
RUNDIR: /work/noaa/stmp/bhoover/RUNDIRS/bth_test

This run is initialized at 2020082200 and is designed to terminate two weeks later, at 2020090500.

Experiment setup:
/work/noaa/da/bhoover/global-workflow/ush/rocoto/setup_expt.py --pslot bth_test --configdir /work/noaa/da/bhoover/global-workflow/parm/config --idate 2020082200 --edate 2020090500 --comrot /work/noaa/stmp/bhoover/ROTDIRS --expdir /work/noaa/da/bhoover/para --resdet 384 --resens 192 --nens 80 --gfs_cyc 1

Workflow setup:
/work/noaa/da/bhoover/global-workflow/ush/rocoto/setup_workflow.py --expdir /work/noaa/da/bhoover/para/bth_test

Initial conditions:
/work/noaa/da/cthomas/ICS/2020082200/

The error is found in the gdasgldas task on 2020082600.
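
As a rough aid for placing the failing cycle relative to initialization, the sketch below does the date arithmetic on the two cycle strings given above. It assumes the usual 6-hourly GDAS cycling; the variable names are illustrative and not part of global-workflow.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H"  # YYYYMMDDHH cycle strings, as used in this issue

idate = datetime.strptime("2020082200", FMT)   # initialization cycle
failed = datetime.strptime("2020082600", FMT)  # cycle where gdasgldas failed

elapsed = failed - idate
hours = int(elapsed.total_seconds() // 3600)

# Assumption: one GDAS cycle every 6 hours.
cycles_elapsed = hours // 6

# 00z cycles after initialization, up to and including the failing one.
zero_z_cycles = [
    (idate + timedelta(days=d)).strftime(FMT)
    for d in range(1, elapsed.days + 1)
]

print(hours, cycles_elapsed)
print(zero_z_cycles)
```

Under these assumptions the failing cycle is the fourth 00z cycle after initialization; the sketch does not encode the actual trigger rule used by the workflow.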

Log file:
/work/noaa/stmp/bhoover/ROTDIRS/bth_test/logs/2020082600/gdasgldas.log

Context
I am a new Orion user and a member of the satellite DA group, and I am using this run only to familiarize myself with the process of carrying out an experiment. No code changes have been made for this run. I followed the directions on the wiki for cloning and building global-workflow and for setting up a cycled experiment:

https://github.com/NOAA-EMC/global-workflow/wiki/

I did not create the initial condition files; they were produced for me. The global-workflow repository was cloned on January 25, 2022 (d3028b9).

The task fails with the following error in the log-file:

0: NOAH Restart File Used: noah.rst
0: 1 1536 768 389408
0: Restart Tile Space Mismatch, Halting..
0: endrun is being called
0: application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

The dimension size of 389408 is suspicious, since earlier in the log a different dimension size is referenced, e.g.:

0: MSG: maketiles -- Size of Grid Dimension: 398658 ( 0 )
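
To make the two values easy to compare, here is a minimal sketch that pulls both numbers out of the log lines quoted above. It assumes that the last integer on the line following "NOAH Restart File Used" is the tile count stored in the restart file; the function names are hypothetical and not part of GLDAS.

```python
import re

# Log lines copied from the excerpts quoted in this issue.
LOG_EXCERPT = """\
0: MSG: maketiles -- Size of Grid Dimension: 398658 ( 0 )
0: NOAH Restart File Used: noah.rst
0: 1 1536 768 389408
0: Restart Tile Space Mismatch, Halting..
"""

def grid_tile_count(log_text):
    """Tile count reported by maketiles, or None if not found."""
    m = re.search(r"maketiles -- Size of Grid Dimension:\s*(\d+)", log_text)
    return int(m.group(1)) if m else None

def restart_tile_count(log_text):
    """Last integer on the line after 'NOAH Restart File Used'.

    Assumption: that line holds four integers and the fourth is the
    tile count recorded in the restart file.
    """
    m = re.search(
        r"NOAH Restart File Used:.*\n\s*\d+:\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)",
        log_text,
    )
    return int(m.group(4)) if m else None

grid = grid_tile_count(LOG_EXCERPT)
restart = restart_tile_count(LOG_EXCERPT)
print(f"maketiles: {grid}, restart: {restart}, difference: {grid - restart}")
```

With the quoted lines, the two counts differ by 9250 tiles.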

When I search for "389408" in the log file, it appears in only two places. One is the Restart Tile Space Mismatch error. The other is in the output of exec/gldas_rst, alongside the FAST_BYTESWAP report:

216.121 + /work/noaa/da/bhoover/global-workflow/exec/gldas_rst
216.121 + 1>& 1 2>& 2
FAST_BYTESWAP ALGORITHM HAS BEEN USED AND DATA ALIGNMENT IS CORRECT FOR 4 )
1536 768 4 9440776
2 tmp0_10cmdown GLDAS STC1
3 tmp10_40cmdown GLDAS STC2
4 tmp40_100cmdown GLDAS STC3
5 tmp100_200cmdown GLDAS STC4
6 soill0_10cmdown GLDAS SLC1
7 soill10_40cmdown GLDAS SLC2
8 soill40_100cmdown GLDAS SLC3
9 soill100_200cmdown GLDAS SLC4
10 soilw0_10cmdown GLDAS SMC1
11 soilw10_40cmdown GLDAS SMC2
12 soilw40_100cmdown GLDAS SMC3
13 soilw100_200cmdown GLDAS SMC4
15 landsfc
18 vtypesfc
71 tmpsfc GLDAS SKNT
72 weasdsfc GLDAS SWE
79 cnwatsfc GLDAS CMC
88 snodsfc GLDAS SNOD
1 1536 768 389408
216.602 + err=0

I believe that the error is related to the difference between these two tile counts: 389408 in the restart file versus 398658 reported by maketiles.
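
The log search described above can be reproduced with a small helper like the one below. This is only a sketch: `find_occurrences` is a hypothetical name, and the sample lines are the ones quoted in this issue rather than the full log.

```python
def find_occurrences(lines, needle):
    """Return (line_number, text) pairs for every line containing needle."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]

# Sample lines quoted in this issue, in the order they appear in the log.
sample = [
    "0: MSG: maketiles -- Size of Grid Dimension: 398658 ( 0 )\n",
    "1 1536 768 389408\n",
    "0: 1 1536 768 389408\n",
    "0: Restart Tile Space Mismatch, Halting..\n",
]

hits = find_occurrences(sample, "389408")
for number, text in hits:
    print(f"{number}: {text}")

# Against the real log file, the same helper accepts an open file handle:
# with open("gdasgldas.log", errors="replace") as fh:
#     hits = find_occurrences(fh, "389408")
```

Searching for "398658" in the same way shows where the other tile count is reported.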

Detailed Description
I have proposed no change or addition to the code for this run.

Additional Information
Earlier gdasgldas tasks in the run, from initialization to 2020082600, reported success, but all of them were skipped, either because the analysis was for a non-00z period or because the requisite number of cycles had not yet been completed to trigger the task. This run therefore has no gdasgldas task that actually ran to completion and that I could compare against the failed one. I have conferred with more experienced EMC users of fv3gdas, and the cause of the problem is not obvious.

Possible Implementation
I have no implementation plan to offer.

Labels: bug