can't build coupled model in debug mode on orion #287

Closed
DeniseWorthen opened this issue Nov 19, 2020 · 10 comments
Labels
bug Something isn't working

Comments

@DeniseWorthen
Collaborator

DeniseWorthen commented Nov 19, 2020

Description

Building the coupled model with DEBUG=Y fails on orion. The same commands build the model on hera successfully.

To Reproduce:

Check out the ufs-weather-model develop branch, then run:

module use modulefiles/orion.intel
module load fv3_debug
CMAKE_FLAGS="-DS2S=ON -DDEBUG=ON" CCPP_SUITES="FV3_GFS_2017_coupled,FV3_GFS_2017_satmedmf_coupled,FV3_GFS_v15p2_coupled" ./build.sh >output 2>&1 &

Output

Build fails:

[Screenshot of the failed build, captured 2020-11-19 at 9:07:04 AM; image not preserved]

Build directory on Orion:

/work/noaa/marine/dworthen/ufs_tod

DeniseWorthen added the bug label Nov 19, 2020
@DeniseWorthen
Collaborator Author

Has anyone else been able to replicate this? I tried running the cpld_debug test for the current develop branch. The err log (in rt_number/compile_3) shows:

  1. esmf/8_1_0_beta_snapshot_27

The esmf debug library on orion is esmf/8_1_0_beta_snapshot_27-debug

Any ideas on why this is getting set incorrectly?

@junwang-noaa
Collaborator

junwang-noaa commented Nov 30, 2020 via email

@DeniseWorthen
Collaborator Author

My build directory on orion is:

/work/noaa/marine/dworthen/ufs_dw. In there, I have rt.conf reduced to just the cpld_debug compile and run.

One output directory is here: /work/noaa/stmp/dworthen/stmp/dworthen/FV3_RT/rt_112174

I just noticed that in the compile_1/err I see:

The following have been reloaded with a version change:

  1. esmf/8_1_0_beta_snapshot_27 => esmf/8_1_0_beta_snapshot_27-debug
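One way to confirm which ESMF module a shell actually ends up with is to filter the output of `module list`. This is only a minimal sketch, assuming an Lmod-style `module` command that prints to stderr; the helper name `esmf_modules` is hypothetical and not part of the regression test scripts.

```shell
# Hypothetical helper: print every esmf/<version> token found on stdin.
esmf_modules() {
    grep -o 'esmf/[^ ]*' || true
}

# Only query the environment if a `module` command is available.
# `module list` usually prints to stderr, hence the 2>&1.
if command -v module >/dev/null 2>&1; then
    module list 2>&1 | esmf_modules
fi
```

A name ending in `-debug` in the output would correspond to the debug library mentioned above.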

@junwang-noaa
Collaborator

junwang-noaa commented Nov 30, 2020 via email

@DeniseWorthen
Collaborator Author

Yes, I agree that it seems to be loading the right esmf debug module, but the compile immediately fails on 'use esmf', as if it is actually trying to use the nonexistent "_debug."

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Nov 30, 2020

I can compile in debug mode on orion if I do not use ecflow.

@junwang-noaa
Collaborator

junwang-noaa commented Nov 30, 2020 via email

@climbfuji
Collaborator

@DeniseWorthen can you check if you have an old version of the ecflow server running (ps ux)? If so, kill it with kill -9 PID and try again.
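A minimal sketch of that check, assuming the server process appears under the name `ecflow_server` and that `ps ux` uses the usual procps column layout; the helper name `find_ecflow_servers` is hypothetical.

```shell
# Hypothetical helper: read `ps ux` output on stdin and print the PID and
# start time of every process whose command matches ecflow_server.
# Assumes the procps column layout (PID in column 2, START in column 9,
# COMMAND starting at column 11).
find_ecflow_servers() {
    awk 'NR > 1 && $11 ~ /ecflow_server/ { print $2, $9 }'
}

ps ux | find_ecflow_servers

# After confirming that a listed PID is a stale server, stop it with:
#   kill -9 PID
```

The start time in the second column makes it easier to spot a server left over from a much earlier session.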

@DeniseWorthen
Collaborator Author

Yes thanks. I did find a very old ecflow job running which I killed. I'll try again.

@DeniseWorthen
Collaborator Author

It seems that the old ecflow server was the issue. I've been able to repeat the same test now and am able to build in debug mode. Thanks all!

pjpegion pushed a commit to NOAA-PSL/ufs-weather-model.p7b that referenced this issue Jul 20, 2021
* gitmodule to climbfuji/flake_from_yihua and ccpp/physics pointer update
* new pointer update
* Pointer update
* updated pointer to NCAR/ccpp-physics and reverted .gitmodules
epic-cicd-jenkins pushed a commit that referenced this issue Apr 17, 2023
…semble forecasts; remove obsolete physics suites; get WE2E tests to run on cheyenne (#287)

## DESCRIPTION OF CHANGES: 

### Bugs fixed:
* In exregional_make_orog.sh, remove the else-statement that causes the script to exit if the suite is not FV3_RRFS_v1beta.
* In exregional_run_fcst.sh, remove lines that create a symlink in the run directory to the model_configure file in the cycle directory.  These lines seem to have been inadvertently reintroduced into the script and cause ensemble forecasts to fail.
 
### Other modifications:
* Remove suites FV3_GSD_SAR_v1 and FV3_RRFS_v0 from workflow since they are no longer in ufs-weather-model.  Also remove the WE2E test configuration files for these suites (config.regional_013.sh and config.regional_016.sh).
* In exregional_make_orog.sh, for the RRFS_v1beta suite, modify the command that copies the orography statistics files needed by the drag parameterization such that only files matching *_ls*.nc and *_ss*.nc are copied instead of everything (because the source directory may contain other files that do not need to be copied).
* In the WE2E configuration file for the RRFS_v1beta suite (config.FV3_RRFS_v1beta.sh), change the location where the additional orography files needed by this suite are copied from to a common location rather than a user directory.
* Remove unused script create_model_config_files.sh.
* Rename the function (and file) create_model_config_file(.sh) to create_model_configure_file(.sh) because the file that this function creates is called model_configure, not model_config.
* Modify WE2E test configuration files as well as the test run script (tests/run_experiments.sh) to get the tests to run more easily on cheyenne.  Still need to make a manual change to the settings in run_experiments.sh, but this made it possible to run the tests.
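
The selective copy of orography statistics files described above could look roughly like the following. This is a sketch under stated assumptions, not the code in exregional_make_orog.sh: the function name `copy_orog_stats` and its two directory arguments are hypothetical, and only the two glob patterns come from the description.

```shell
# Hypothetical helper: copy only the *_ls*.nc and *_ss*.nc files
# from a source directory into a destination directory.
copy_orog_stats() {
    src_dir=$1
    dest_dir=$2
    for f in "$src_dir"/*_ls*.nc "$src_dir"/*_ss*.nc; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        if [ -f "$f" ]; then
            cp "$f" "$dest_dir"/
        fi
    done
}
```

Any other files in the source directory are left where they are.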

## TESTS CONDUCTED:
Ran all 26 WE2E tests both **on hera and cheyenne**.  24 of the 26 succeeded.  Details:
* regional_010 failed, but it was already broken.
* user_download_extrn_files failed.  It seems to have failed to obtain the external model files from NOMADS (and this step is done during workflow generation, not as part of any workflow task).  This test is completely unrelated to this PR, so the failure may have already existed in the develop branch.
* The remaining 24 tests (including the one for the FV3_RRFS_v1beta suite) succeeded without problems.

Note that the FV3_RRFS_v1beta suite was also tested on the GSD_HRRR3km grid.  This failed at around hour 4 (for a 6-hour forecast) with an uninformative error.  This test was also tried previously with hash 8165575 from the NCAR fork of ufs-weather-model (in the dtc/develop branch), and that finished successfully.  It is not clear what changed between these two versions of ufs-weather-model.

## OTHER CONTRIBUTORS:
@JeffBeck-NOAA