Support global-workflow using Rocky 8 on CSPs (NOAA-EMC#2998)
# Description

ParallelWorks now defaults to Rocky 8 on CSPs and will move to Rocky 8 only
after 1/1/2025,
so the global-workflow module files need to use the Rocky 8 supported
spack-stack,
and the compile and run need to be tested to make sure everything works under Rocky 8.

i) New features in the Rocky 8 update:

a. Waves worked in the C48_S2SWA_gefs case, so SUPPORT_WAVES is turned to "YES" in
awspw.yaml.
If SUPPORT_WAVES is not set to "YES", setup_expt.py raises an
exception.

b. AWS now uses two types of nodes (chips/queues), compute and process:
forecasts run in the "compute" queue,
which has big nodes (more cores); all other jobs run in the "process" queue, which
has small nodes (fewer cores).

ii) The Rocky 8 update needs the following submodule PRs:

- NOAA-EMC/gfs_utils#81
- ufs-community/UFS_UTILS#989
- NOAA-EMC/UPP#1034
- ufs-community/ufs-weather-model#2461

Resolves NOAA-EMC#2997
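
The two-queue split described in item i.b can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the function names `select_aws_partition` and `tasks_per_node_for` are hypothetical, while the partition names and per-node task counts (48 for "compute", 24 for "process") are taken from parm/config/gfs/config.resources.AWSPW as changed in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of global-workflow): choose the AWS queue
# and node size for a job step, mirroring the case statement added to
# parm/config/gfs/config.resources.AWSPW.
select_aws_partition() {
  local step="$1"
  case "${step}" in
    "fcst" | "efcs")
      # Forecast steps run on the large "compute" nodes.
      PARTITION_BATCH="compute"
      max_tasks_per_node=48
      ;;
    *)
      # Every other step runs on the smaller "process" nodes.
      PARTITION_BATCH="process"
      max_tasks_per_node=24
      ;;
  esac
}

# Hypothetical helper: tasks that fit on one node for a given thread count,
# using the same integer division as the gefs config (max / threads).
tasks_per_node_for() {
  local max_tasks="$1"
  local threads="$2"
  echo $(( max_tasks / threads ))
}

for step in fcst arch; do
  select_aws_partition "${step}"
  echo "${step}: queue=${PARTITION_BATCH} tasks_per_node=$(tasks_per_node_for "${max_tasks_per_node}" 1)"
done
```

Note that the gefs variant of the file also sends wavepostpnt to the "compute" queue, so the sketch above covers only the gfs mapping.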

---------

Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
weihuang-jedi and DavidHuber-NOAA authored Dec 24, 2024
1 parent e684944 commit 290f1d2
Showing 19 changed files with 161 additions and 84 deletions.
68 changes: 27 additions & 41 deletions env/AWSPW.env
@@ -33,7 +33,29 @@ else
exit 2
fi

if [[ "${step}" = "fcst" ]] || [[ "${step}" = "efcs" ]]; then
if [[ "${step}" = "prep" ]] || [[ "${step}" = "prepbufr" ]]; then

export POE="NO"
export BACK="NO"
export sys_tp="AWSPW"
export launcher_PREP="srun"

elif [[ "${step}" = "prepsnowobs" ]]; then

export APRUN_CALCFIMS="${APRUN_default}"

elif [[ "${step}" = "prep_emissions" ]]; then

export APRUN="${APRUN_default}"

elif [[ "${step}" = "waveinit" ]] || [[ "${step}" = "waveprep" ]] || [[ "${step}" = "wavepostsbs" ]] || [[ "${step}" = "wavepostbndpnt" ]] || [[ "${step}" = "wavepostbndpntbll" ]] || [[ "${step}" = "wavepostpnt" ]]; then

export CFP_MP="YES"
if [[ "${step}" = "waveprep" ]]; then export MP_PULSE=0 ; fi
export wavempexec=${launcher}
export wave_mpmd=${mpmd_opt}

elif [[ "${step}" = "fcst" ]] || [[ "${step}" = "efcs" ]]; then

export launcher="srun --mpi=pmi2 -l"

@@ -52,52 +74,16 @@ elif [[ "${step}" = "waveinit" ]] || [[ "${step}" = "waveprep" ]] || [[ "${step}

elif [[ "${step}" = "post" ]]; then

export NTHREADS_NP=${NTHREADS1}
export APRUN_NP="${APRUN_default}"

export NTHREADS_DWN=${threads_per_task_dwn:-1}
[[ ${NTHREADS_DWN} -gt ${max_threads_per_task} ]] && export NTHREADS_DWN=${max_threads_per_task}
export APRUN_DWN="${launcher} -n ${ntasks_dwn}"

elif [[ "${step}" = "atmos_products" ]]; then

export USE_CFP="YES" # Use MPMD for downstream product generation on Hera
export NTHREADS_UPP=${NTHREADS1}
export APRUN_UPP="${APRUN_default} --cpus-per-task=${NTHREADS_UPP}"

elif [[ "${step}" = "oceanice_products" ]]; then

export NTHREADS_OCNICEPOST=${NTHREADS1}
export APRUN_OCNICEPOST="${launcher} -n 1 --cpus-per-task=${NTHREADS_OCNICEPOST}"

elif [[ "${step}" = "ecen" ]]; then

export NTHREADS_ECEN=${NTHREADSmax}
export APRUN_ECEN="${APRUN_default}"

export NTHREADS_CHGRES=${threads_per_task_chgres:-12}
[[ ${NTHREADS_CHGRES} -gt ${max_tasks_per_node} ]] && export NTHREADS_CHGRES=${max_tasks_per_node}
export APRUN_CHGRES="time"

export NTHREADS_CALCINC=${threads_per_task_calcinc:-1}
[[ ${NTHREADS_CALCINC} -gt ${max_threads_per_task} ]] && export NTHREADS_CALCINC=${max_threads_per_task}
export APRUN_CALCINC="${APRUN_default}"

elif [[ "${step}" = "esfc" ]]; then

export NTHREADS_ESFC=${NTHREADSmax}
export APRUN_ESFC="${APRUN_default}"

export NTHREADS_CYCLE=${threads_per_task_cycle:-14}
[[ ${NTHREADS_CYCLE} -gt ${max_tasks_per_node} ]] && export NTHREADS_CYCLE=${max_tasks_per_node}
export APRUN_CYCLE="${APRUN_default}"

elif [[ "${step}" = "epos" ]]; then

export NTHREADS_EPOS=${NTHREADSmax}
export APRUN_EPOS="${APRUN_default}"

elif [[ "${step}" = "fit2obs" ]]; then
elif [[ "${step}" = "atmos_products" ]]; then

export NTHREADS_FIT2OBS=${NTHREADS1}
export MPIRUN="${APRUN_default}"
export USE_CFP="YES" # Use MPMD for downstream product generation on AWS

fi
17 changes: 11 additions & 6 deletions env/AZUREPW.env
@@ -15,6 +15,7 @@ export mpmd_opt="--multi-prog --output=mpmd.%j.%t.out"
# Configure MPI environment
export OMP_STACKSIZE=2048000
export NTHSTACK=1024000000
export UCX_TLS=ud,sm,self

ulimit -s unlimited
ulimit -a
@@ -50,6 +51,10 @@ elif [[ "${step}" = "waveinit" ]] || [[ "${step}" = "waveprep" ]] || [[ "${step}
export wavempexec=${launcher}
export wave_mpmd=${mpmd_opt}

elif [[ "${step}" = "prep_emissions" ]]; then

export APRUN="${APRUN_default}"

elif [[ "${step}" = "post" ]]; then

export NTHREADS_NP=${NTHREADS1}
@@ -71,33 +76,33 @@ elif [[ "${step}" = "oceanice_products" ]]; then
elif [[ "${step}" = "ecen" ]]; then

export NTHREADS_ECEN=${NTHREADSmax}
export APRUN_ECEN="${APRUN}"
export APRUN_ECEN="${APRUN_default}"

export NTHREADS_CHGRES=${threads_per_task_chgres:-12}
[[ ${NTHREADS_CHGRES} -gt ${max_tasks_per_node} ]] && export NTHREADS_CHGRES=${max_tasks_per_node}
export APRUN_CHGRES="time"

export NTHREADS_CALCINC=${threads_per_task_calcinc:-1}
[[ ${NTHREADS_CALCINC} -gt ${max_threads_per_task} ]] && export NTHREADS_CALCINC=${max_threads_per_task}
export APRUN_CALCINC="${APRUN}"
export APRUN_CALCINC="${APRUN_default}"

elif [[ "${step}" = "esfc" ]]; then

export NTHREADS_ESFC=${NTHREADSmax}
export APRUN_ESFC="${APRUN}"
export APRUN_ESFC="${APRUN_default}"

export NTHREADS_CYCLE=${threads_per_task_cycle:-14}
[[ ${NTHREADS_CYCLE} -gt ${max_tasks_per_node} ]] && export NTHREADS_CYCLE=${max_tasks_per_node}
export APRUN_CYCLE="${APRUN}"
export APRUN_CYCLE="${APRUN_default}"

elif [[ "${step}" = "epos" ]]; then

export NTHREADS_EPOS=${NTHREADSmax}
export APRUN_EPOS="${APRUN}"
export APRUN_EPOS="${APRUN_default}"

elif [[ "${step}" = "fit2obs" ]]; then

export NTHREADS_FIT2OBS=${NTHREADS1}
export MPIRUN="${APRUN}"
export MPIRUN="${APRUN_default}"

fi
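
The thread-count lines in the ecen and esfc branches above follow a default-then-clamp pattern: take a per-step override if one is set, fall back to a default, then cap the result at a node limit. A minimal sketch of that pattern, assuming a hypothetical helper named `clamp_threads` (it does not exist in global-workflow) and the AZUREPW value of 36 tasks per node set in this commit:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of global-workflow): return the requested
# thread count, capped at the given ceiling.
clamp_threads() {
  local requested="$1"
  local ceiling="$2"
  if [[ "${requested}" -gt "${ceiling}" ]]; then
    echo "${ceiling}"
  else
    echo "${requested}"
  fi
}

# Assumed value: max_tasks_per_node for AZUREPW after this commit.
max_tasks_per_node=36

# Same shape as the NTHREADS_CHGRES lines: use the override when it is set,
# otherwise default to 12, then clamp against the node limit.
NTHREADS_CHGRES=$(clamp_threads "${threads_per_task_chgres:-12}" "${max_tasks_per_node}")
export NTHREADS_CHGRES
echo "NTHREADS_CHGRES=${NTHREADS_CHGRES}"
```

The one-line `[[ ... -gt ... ]] && export ...` form used in the env files behaves the same way; the function form is only easier to test in isolation.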
4 changes: 2 additions & 2 deletions env/GOOGLEPW.env
@@ -45,7 +45,7 @@ if [[ "${step}" = "fcst" ]] || [[ "${step}" = "efcs" ]]; then

elif [[ "${step}" = "prep_emissions" ]]; then

export APRUN
export APRUN="${APRUN_default}"

elif [[ "${step}" = "waveinit" ]] || [[ "${step}" = "waveprep" ]] || [[ "${step}" = "wavepostsbs" ]] || [[ "${step}" = "wavepostbndpnt" ]] || [[ "${step}" = "wavepostbndpntbll" ]] || [[ "${step}" = "wavepostpnt" ]]; then

@@ -102,6 +102,6 @@ elif [[ "${step}" = "epos" ]]; then
elif [[ "${step}" = "fit2obs" ]]; then

export NTHREADS_FIT2OBS=${NTHREADS1}
export MPIRUN="${APRUN}"
export MPIRUN="${APRUN_default}"

fi
3 changes: 3 additions & 0 deletions modulefiles/module_base.noaacloud.lua
@@ -5,8 +5,11 @@ Load environment to run GFS on noaacloud
local spack_mod_path=(os.getenv("spack_mod_path") or "None")
prepend_path("MODULEPATH", spack_mod_path)

load("gnu")
load(pathJoin("stack-intel", (os.getenv("stack_intel_ver") or "None")))
load(pathJoin("stack-intel-oneapi-mpi", (os.getenv("stack_impi_ver") or "None")))
unload("gnu")

load(pathJoin("python", (os.getenv("python_ver") or "None")))

load(pathJoin("jasper", (os.getenv("jasper_ver") or "None")))
6 changes: 3 additions & 3 deletions modulefiles/module_gwci.noaacloud.lua
@@ -2,10 +2,10 @@ help([[
Load environment to run GFS workflow setup scripts on noaacloud
]])

prepend_path("MODULEPATH", "/contrib/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
prepend_path("MODULEPATH", "/contrib/spack-stack-rocky8/spack-stack-1.6.0/envs/ue-env/install/modulefiles/Core")

load(pathJoin("stack-intel", os.getenv("2021.3.0")))
load(pathJoin("stack-intel-oneapi-mpi", os.getenv("2021.3.0")))
load(pathJoin("stack-intel", os.getenv("2021.10.0")))
load(pathJoin("stack-intel-oneapi-mpi", os.getenv("2021.10.0")))

load(pathJoin("netcdf-c", os.getenv("4.9.2")))
load(pathJoin("netcdf-fortran", os.getenv("4.6.1")))
13 changes: 7 additions & 6 deletions modulefiles/module_gwsetup.noaacloud.lua
@@ -4,17 +4,18 @@ Load environment to run GFS workflow setup scripts on noaacloud

load(pathJoin("rocoto"))

prepend_path("MODULEPATH", "/contrib/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
prepend_path("MODULEPATH", "/contrib/spack-stack-rocky8/spack-stack-1.6.0/envs/ue-intel/install/modulefiles/Core")

local stack_intel_ver=os.getenv("stack_intel_ver") or "2021.3.0"
local python_ver=os.getenv("python_ver") or "3.10.3"
load("gnu")
local stack_intel_ver=os.getenv("stack_intel_ver") or "2021.10.0"
local stack_mpi_ver=os.getenv("stack_mpi_ver") or "2021.10.0"

load(pathJoin("stack-intel", stack_intel_ver))
load(pathJoin("python", python_ver))
load(pathJoin("stack-intel-oneapi-mpi", stack_mpi_ver))
unload("gnu")

load("py-jinja2")
load("py-pyyaml")
load("py-numpy")
local git_ver=os.getenv("git_ver") or "1.8.3.1"
load(pathJoin("git", git_ver))

whatis("Description: GFS run setup environment")
6 changes: 3 additions & 3 deletions parm/config/gefs/config.resources
@@ -41,15 +41,15 @@ case ${machine} in
;;
"AWSPW")
export PARTITION_BATCH="compute"
max_tasks_per_node=36
max_tasks_per_node=48
;;
"AZUREPW")
export PARTITION_BATCH="compute"
max_tasks_per_node=24
max_tasks_per_node=36
;;
"GOOGLEPW")
export PARTITION_BATCH="compute"
max_tasks_per_node=32
max_tasks_per_node=30
;;
*)
echo "FATAL ERROR: Unknown machine encountered by ${BASH_SOURCE[0]}"
58 changes: 58 additions & 0 deletions parm/config/gefs/config.resources.AWSPW
@@ -9,3 +9,61 @@ unset memory
for mem_var in $(env | grep '^memory_' | cut -d= -f1); do
unset "${mem_var}"
done

step=$1

case ${step} in
"fcst" | "efcs")
export PARTITION_BATCH="compute"
max_tasks_per_node=48
;;

"arch")
export PARTITION_BATCH="process"
max_tasks_per_node=24
;;

"prep_emissions")
export PARTITION_BATCH="process"
max_tasks_per_node=24
export ntasks=1
export threads_per_task=1
export tasks_per_node=$(( max_tasks_per_node / threads_per_task ))
;;

"waveinit")
export PARTITION_BATCH="process"
max_tasks_per_node=24
export ntasks=12
export threads_per_task=1
export tasks_per_node=$(( max_tasks_per_node / threads_per_task ))
export NTASKS=${ntasks}
;;

"wavepostpnt")
export PARTITION_BATCH="compute"
max_tasks_per_node=48
export ntasks=240
export threads_per_task=1
export tasks_per_node=$(( max_tasks_per_node / threads_per_task ))
export NTASKS=${ntasks}
;;

"wavepostsbs" | "wavepostbndpnt" | "wavepostbndpntbll")
export PARTITION_BATCH="process"
max_tasks_per_node=24
export ntasks=24
export threads_per_task=1
export tasks_per_node=$(( max_tasks_per_node / threads_per_task ))
export NTASKS=${ntasks}
;;

*)
export PARTITION_BATCH="process"
max_tasks_per_node=24
;;

esac

export max_tasks_per_node

8 changes: 4 additions & 4 deletions parm/config/gfs/config.resources
@@ -107,16 +107,16 @@ case ${machine} in
;;
"AWSPW")
export PARTITION_BATCH="compute"
npe_node_max=36
max_tasks_per_node=36
npe_node_max=48
max_tasks_per_node=48
# TODO Supply a max mem/node value for AWS
# shellcheck disable=SC2034
mem_node_max=""
;;
"AZUREPW")
export PARTITION_BATCH="compute"
npe_node_max=24
max_tasks_per_node=24
npe_node_max=36
max_tasks_per_node=36
# TODO Supply a max mem/node value for AZURE
# shellcheck disable=SC2034
mem_node_max=""
24 changes: 24 additions & 0 deletions parm/config/gfs/config.resources.AWSPW
@@ -9,3 +9,27 @@ unset memory
for mem_var in $(env | grep '^memory_' | cut -d= -f1); do
unset "${mem_var}"
done

step=$1

case ${step} in
"fcst" | "efcs")
export PARTITION_BATCH="compute"
max_tasks_per_node=48
;;

"arch")
export PARTITION_BATCH="process"
max_tasks_per_node=24
;;


*)
export PARTITION_BATCH="process"
max_tasks_per_node=24
;;

esac

export max_tasks_per_node

2 changes: 1 addition & 1 deletion sorc/build_ufs.sh
@@ -12,7 +12,7 @@ EXEC_NAME="gfs_model.x"

while getopts ":da:fj:e:vwy" option; do
case "${option}" in
d) BUILD_TYPE="Debug";;
d) BUILD_TYPE="DEBUG";;
a) APP="${OPTARG}";;
f) FASTER="ON";;
j) BUILD_JOBS="${OPTARG}";;
2 changes: 1 addition & 1 deletion sorc/gfs_utils.fd
6 changes: 3 additions & 3 deletions versions/build.noaacloud.ver
@@ -1,5 +1,5 @@
export stack_intel_ver=2021.3.0
export stack_impi_ver=2021.3.0
export stack_intel_ver=2021.10.0
export stack_impi_ver=2021.10.0
export spack_env=gsi-addon-env
source "${HOMEgfs:-}/versions/spack.ver"
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"
export spack_mod_path="/contrib/spack-stack-rocky8/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"
6 changes: 3 additions & 3 deletions versions/run.noaacloud.ver
@@ -1,8 +1,8 @@
export stack_intel_ver=2021.3.0
export stack_impi_ver=2021.3.0
export stack_intel_ver=2021.10.0
export stack_impi_ver=2021.10.0
export spack_env=gsi-addon-env

source "${HOMEgfs:-}/versions/spack.ver"
export spack_mod_path="/contrib/spack-stack/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"
export spack_mod_path="/contrib/spack-stack-rocky8/spack-stack-${spack_stack_ver}/envs/gsi-addon-env/install/modulefiles/Core"

export cdo_ver=2.2.0
4 changes: 2 additions & 2 deletions workflow/hosts/awspw.yaml
@@ -27,5 +27,5 @@ MAKE_ACFTBUFR: 'NO'
DO_TRACKER: 'NO'
DO_GENESIS: 'NO'
DO_METP: 'NO'
SUPPORT_WAVES: 'NO'
SUPPORTED_RESOLUTIONS: ['C48', 'C96'] # TODO: Test and support all cubed-sphere resolutions.
SUPPORTED_RESOLUTIONS: ['C48', 'C96', 'C192', 'C384', 'C768'] # TODO: Test and support all cubed-sphere resolutions.
AERO_INPUTS_DIR: /contrib/global-workflow-shared-data/data/gocart_emissions
6 changes: 4 additions & 2 deletions workflow/hosts/azurepw.yaml
@@ -24,5 +24,7 @@ LOCALARCH: 'NO'
ATARDIR: '' # TODO: This will not yet work from AZURE.
MAKE_NSSTBUFR: 'NO'
MAKE_ACFTBUFR: 'NO'
SUPPORT_WAVES: 'NO'
SUPPORTED_RESOLUTIONS: ['C48', 'C96'] # TODO: Test and support all cubed-sphere resolutions.
DO_TRACKER: 'NO'
DO_GENESIS: 'NO'
DO_METP: 'NO'
SUPPORTED_RESOLUTIONS: ['C48', 'C96', 'C384', 'C768'] # TODO: Test and support all cubed-sphere resolutions.