
gldas fails with USE_CFP=YES on Hera #1089

@RussTreadon-NOAA

Expected behavior
gdasgldas should run to completion with USE_CFP=YES

Current behavior
When gdasgldas runs on Hera with USE_CFP=YES, it fails with the following srun error:

+ gldas_forcing.sh[68](20211222): [[ YES = \Y\E\S ]]
+ gldas_forcing.sh[69](20211222): rm -f ./cfile
+ gldas_forcing.sh[70](20211222): touch ./cfile
+ gldas_forcing.sh[72](20211222): echo '/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-18.0.5.274/grib_util/1.2.2/bin/copygb -i3 '\''-g255 0 2881 1441 90000 0 128 -90000 360000 125 125'\'' -x gdas.2021122112 grib.12'
+ gldas_forcing.sh[73](20211222): echo '/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-18.0.5.274/grib_util/1.2.2/bin/copygb -i3 '\''-g255 0 2881 1441 90000 0 128 -90000 360000 125 125'\'' -x gdas.2021122118 grib.18'
+ gldas_forcing.sh[74](20211222): echo '/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-18.0.5.274/grib_util/1.2.2/bin/copygb -i3 '\''-g255 0 2881 1441 90000 0 128 -90000 360000 125 125'\'' -x gdas.2021122200 grib.00'
+ gldas_forcing.sh[75](20211222): echo '/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-18.0.5.274/grib_util/1.2.2/bin/copygb -i3 '\''-g255 0 2881 1441 90000 0 128 -90000 360000 125 125'\'' -x gdas.2021122206 grib.06'
+ gldas_forcing.sh[77](20211222): srun -l --export=ALL -n 84 --multi-prog ./cfile
srun: error: Invalid task range specification (/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-18.0.5.274/grib_util/1.2.2/bin/copygb)
srun: error: Line 1 of configuration file ./cfile invalid
+ gldas_forcing.sh[1](20211222): postamble gldas_forcing.sh 1666501527 1

Machines affected
Hera

To Reproduce

  1. Install a fresh clone of g-w develop on Hera.
  2. Create EXPDIR.
  3. Populate ROTDIR with the files needed to run 00Z gldas, making sure it contains enough sfluxgrb history for the 00Z run to complete.
  4. Submit the 00Z gdasgldas job.

Additional Information
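The Hera gdasgldas job log suggests that CFP failed because the command file contains four entries while srun was invoked with 84 tasks. The srun message itself reports an invalid task range on line 1 of ./cfile, and that line begins with the copygb path. Since srun --multi-prog expects each line of its configuration file to begin with a task rank or rank range, the missing rank prefix may also be a factor. Operational gdasgldas log files on WCOSS2 show that CFP there accepts more tasks than there are entries in the command file.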

Possible Implementation
If Hera does require the number of tasks to equal the number of entries in the command file, the gldas script(s) that invoke CFP can count the entries in each command file and pass that count to srun as the number of tasks.
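A minimal sketch of that idea, not the actual gldas_forcing.sh change: it writes a task rank at the start of each command-file line, which is the form srun --multi-prog reads, and then counts the lines to get the task count. The variable names (COPYGB, gds, rank, ntasks) are illustrative, the file names are copied from the log above, and the srun command is only echoed so the sketch can be tried away from the cluster.

```shell
# Sketch only: build an srun --multi-prog command file and size -n to match it.
set -eu

COPYGB=${COPYGB:-copygb}   # assumption: path to copygb comes from the environment
gds='255 0 2881 1441 90000 0 128 -90000 360000 125 125'   # grid string from the log
cfile=./cfile

rm -f "$cfile"
touch "$cfile"

rank=0
for f in gdas.2021122112 gdas.2021122118 gdas.2021122200 gdas.2021122206; do
  hh=${f#gdas.????????}    # last two digits of the cycle, e.g. 12
  # Each line starts with the task rank that should run the command.
  echo "$rank $COPYGB -i3 '-g$gds' -x $f grib.$hh" >> "$cfile"
  rank=$((rank + 1))
done

# Count the entries and request exactly that many tasks.
ntasks=$(wc -l < "$cfile" | tr -d ' ')
echo "srun -l --export=ALL -n $ntasks --multi-prog $cfile"
```

Whether Hera also needs any other srun options changed is not established here; the sketch only addresses the rank prefix and the task count.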

For the time being, I changed USE_CFP="YES" in the gldas section of HERA.env to USE_CFP="NO". The Hera gldas job runs to completion with this change.

Labels: bug (Something isn't working)
