FIXWGTS not found for C48 by gaussian_sfcanl.sh from job gdasanalcalc #1085

Closed
RussTreadon-NOAA opened this issue Oct 21, 2022 · 5 comments
Labels
bug Something isn't working

@RussTreadon-NOAA
Contributor

RussTreadon-NOAA commented Oct 21, 2022

Expected behavior
gdasanalcalc should run to completion for a CASE=C48 parallel.

Current behavior
gaussian_sfcanl.sh fails in gdasanalcalc because the file named by FIXWGTS=${FIXWGTS:-$FIXfv3/$CASE/fv3_SCRIP_${CASE}_GRIDSPEC_lon${LONB_SFC}_lat${LATB_SFC}.gaussian.neareststod.nc} does not exist.

In the C48 parallel, FIXWGTS expands to fix/orog/C48/fv3_SCRIP_C48_GRIDSPEC_lon192_lat96.gaussian.neareststod.nc. The fix/orog/C48 directory instead contains fv3_SCRIP_C48_GRIDSPEC_lon192_lat94.gaussian.neareststod.nc.

Machines affected
Observed on Hera. Likely affects all machines.

To Reproduce

  1. install g-w develop
  2. set up EXPDIR
  3. populate ROTDIR to execute gdasanalcalc
  4. rocotoboot gdasanalcalc
  5. check gdasanalcalc.log

Context
An example of the error is found in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/runs/s2s-test/testc48/COMROT/testc48/logs/2021032318/gdasanalcalc.log

Program gaussian_sfcanl fails with:

0:      PROGRAM GAUSSIAN_SFCANL HAS BEGUN. COMPILED 2018179.55     ORG: NP20
srun: error: h1c07: task 0: Exited with exit code 22
srun: launch/slurm: _step_signal: Terminating StepId=36908389.2
0:      STARTING DATE-TIME  OCT 21,2022  17:25:43.370  294  FRI   2459874
0:
0:
0:  - BEGIN EXECUTION
0:
0:  - READ SETUP NAMELIST
0:
0:  - READ INTERPOLATION WEIGHT FILE
0:
0:  ** FATAL ERROR: OPENING weights.nc: No such file or directory
0:  STOP.

weights.nc is a link to the nonexistent FIXWGTS file mentioned above.
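Because weights.nc is a symbolic link, the "No such file or directory" from the program comes from a dangling link: the link itself exists, but opening it follows it to a target that does not. A minimal sketch of a pre-flight check, assuming it would run in the job's working directory before gaussian_sfcanl is launched (is_dangling_link is a hypothetical helper, not part of gaussian_sfcanl.sh):

```shell
#!/usr/bin/env bash
# is_dangling_link PATH: succeed (status 0) if PATH is a symlink whose target
# does not exist. [ -L ] tests the link itself; [ -e ] follows the link.
is_dangling_link() {
  [ -L "$1" ] && [ ! -e "$1" ]
}

# Hypothetical guard: report the missing target instead of letting the
# program fail later with a bare "No such file or directory".
if is_dangling_link weights.nc; then
  echo "FATAL: weights.nc -> $(readlink weights.nc) does not exist" >&2
fi
```

Checking the link target up front names the missing fix file directly in the job log, which is the information needed to diagnose this issue.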

Detailed Description
gaussian_sfcanl.sh contains the following:

CASE=${CASE:-C768}
res=$(echo $CASE | cut -c2-)
LONB_CASE=$((res*4))
LATB_CASE=$((res*2))
LONB_SFC=${LONB_SFC:-$LONB_CASE}
LATB_SFC=${LATB_SFC:-$LATB_CASE}

...

FIXWGTS=${FIXWGTS:-$FIXfv3/$CASE/fv3_SCRIP_${CASE}_GRIDSPEC_lon${LONB_SFC}_lat${LATB_SFC}.gaussian.neareststod.nc}

For this parallel, CASE=C48, so res=48, LONB_CASE=192, and LATB_CASE=96. In turn, LONB_SFC=192 and LATB_SFC=96, and the script sets

FIXWGTS=/scratch2/NCEPDEV/ocean/Guillaume.Vernieres/sandboxes/global-workflow-gv/fix/orog/C48/fv3_SCRIP_C48_GRIDSPEC_lon192_lat96.gaussian.neareststod.nc

However, /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/sandboxes/global-workflow-gv/fix/orog/C48/ contains

fv3_SCRIP_C48_GRIDSPEC_lon192_lat94.gaussian.neareststod.nc.
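To make the arithmetic in the script excerpt concrete, here is a minimal sketch (not part of global-workflow) that reproduces the default FIXWGTS name for a given CASE and reports whether that file exists. The function name expected_fixwgts and the relative fix/orog path are illustrative assumptions, and the sketch assumes LONB_SFC/LATB_SFC are not set externally, so they fall back to LONB_CASE/LATB_CASE.

```shell
#!/usr/bin/env bash
# expected_fixwgts FIXDIR CASE
# Print the weight-file path that the FIXWGTS default in gaussian_sfcanl.sh
# resolves to, assuming LONB_SFC/LATB_SFC are unset (hypothetical helper).
expected_fixwgts() {
  local fixdir=$1 case_name=$2
  local res lonb latb
  res=$(echo "$case_name" | cut -c2-)   # C48 -> 48, as in the script
  lonb=$((res * 4))                     # LONB_CASE
  latb=$((res * 2))                     # LATB_CASE
  echo "${fixdir}/${case_name}/fv3_SCRIP_${case_name}_GRIDSPEC_lon${lonb}_lat${latb}.gaussian.neareststod.nc"
}

# Report whether the expected file is present under an illustrative fix directory.
wgts=$(expected_fixwgts fix/orog C48)
if [ -f "$wgts" ]; then
  echo "found: $wgts"
else
  echo "missing: $wgts" >&2
fi
```

For CASE=C48 this yields the lon192_lat96 name, which is the file that was absent from fix/orog/C48 in this parallel.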

Additional Information
The postanl job references the sfcanl.nc created by gaussian_sfcanl.sh. The gdasecen and gdasefc jobs depend upon completion of gdasanalcalc. The parallel is stuck until gdasanalcalc completes successfully.

Possible Implementation
Either the logic in gaussian_sfcanl.sh is wrong or the fix file is incorrectly named. If the fix file is incorrectly named, simply renaming it may not work: will the fields in fv3_SCRIP_C48_GRIDSPEC_lon192_lat94.gaussian.neareststod.nc work with lat96?
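One way to answer the lat94 versus lat96 question is to compare the destination grid size recorded in the weight file with LONB_SFC*LATB_SFC. A sketch, assuming the file follows the ESMF/SCRIP weight layout with a destination-size dimension named n_b (an assumption about this file, not verified here) and that ncdump from the netCDF utilities is available; dest_size and matches_gaussian are hypothetical helpers:

```shell
#!/usr/bin/env bash
# dest_size FILE: print the "n_b" dimension (destination grid points) from the
# netCDF header. Assumes an ESMF/SCRIP-style weight file; requires ncdump.
dest_size() {
  ncdump -h "$1" | awk '$1 == "n_b" && $2 == "=" { print $3 }'
}

# matches_gaussian NB LONB LATB: succeed if NB equals LONB*LATB.
matches_gaussian() {
  [ "$1" -eq $(( $2 * $3 )) ]
}
```

For example, `matches_gaussian "$(dest_size "$FIXWGTS")" 192 96` succeeds only if n_b is 192*96 = 18432; a file that really targets a 192x94 grid would have n_b = 18048 and fail the check, in which case renaming it alone would not be enough.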

RussTreadon-NOAA added the bug label on Oct 21, 2022
@RussTreadon-NOAA
Contributor Author

This issue may be another instance of what was reported in #1054. Referencing #1054 from this issue for completeness. Tagging @guillaumevernieres since it's his parallel we are trying to fix.

@KateFriedman-NOAA
Member

@RussTreadon-NOAA @guillaumevernieres Was this issue resolved via #1054 and #1066?

@RussTreadon-NOAA
Contributor Author

RussTreadon-NOAA commented Feb 1, 2023

> @RussTreadon-NOAA @guillaumevernieres Was this issue resolved via #1054 and #1066?

@KateFriedman-NOAA, I don't know.

@KateFriedman-NOAA
Member

Recent updates/fixes to the primary fix set (issue #1054) mean that the file that was missing now exists:

-bash-4.2$ pwd
/scratch1/NCEPDEV/global/glopara/fix/orog/20220805
-bash-4.2$ ll C48/fv3_SCRIP_C48_GRIDSPEC_lon192_lat96.gaussian.neareststod.nc
-rw-r--r-- 1 glopara global 2637600 Jun 27  2018 C48/fv3_SCRIP_C48_GRIDSPEC_lon192_lat96.gaussian.neareststod.nc

As seen from within a built/linked clone of global-workflow develop:

-bash-4.2$ pwd
/scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix
-bash-4.2$ ll orog/C48/fv3_SCRIP_C48_GRIDSPEC_lon192_lat96.gaussian.neareststod.nc
-rw-r--r-- 1 glopara global 2637600 Jun 27  2018 orog/C48/fv3_SCRIP_C48_GRIDSPEC_lon192_lat96.gaussian.neareststod.nc

@aerorahul
Contributor

The C48 fix file had incorrect dimensions and an incorrect name, which led to the file-not-found error in the gaussian_sfcanl program.
A new fix file for this resolution was created by @GeorgeGayno-NOAA and added to the FIX dataset by @KateFriedman-NOAA.
This new fix file was tested in #1274 by @aerorahul.
This issue can now be closed.
If further problems exist at this resolution, it can be reopened.
