Minor bugfixes for CCPP v6 #1231

Merged (13 commits) on May 25, 2022

@grantfirl
Collaborator

@grantfirl commented May 23, 2022

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR
    are specified below.

  • Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Description

This PR contains only minor bugfixes for CCPP-framework and CCPP-physics and should not affect the UFS RTs at all (will test to confirm).

Issue(s) addressed

NCAR/ccpp-physics#927
NCAR/ccpp-framework#450

Testing

All tests in rt.conf pass with existing baselines.

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • opnReqTest for newly added/changed feature
  • CI

Dependencies

NCAR/ccpp-physics#924
NCAR/ccpp-framework#451
NOAA-EMC/fv3atm#541

@junwang-noaa
Collaborator

@grantfirl We will work on your PR today, are your branches up to date?

@grantfirl changed the title from "update FV3 submodule pointer and .gitmodules for testing" to "Minor bugfixes for CCPP v6" on May 24, 2022
@grantfirl
Collaborator Author

> @grantfirl We will work on your PR today, are your branches up to date?

Yes, this is ready to work on.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
[RT] Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/944854842/20220525011648/ufs-weather-model
Please make changes and add the following label back: jet-intel-RT

@junwang-noaa
Collaborator

@BrianCurtis-NOAA It looks to me that the tests are still running on Jet. Not sure why we got this message.

@jkbk2004
Collaborator

> @BrianCurtis-NOAA It looks to me that the tests are still running on Jet. Not sure why we got this message.

@junwang-noaa Maybe a Python try/except hiccup with a Slurm pending warning... But yes, the jobs are still running.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: RT
[RT] Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/944854842/20220525011511/ufs-weather-model
[RT] Error: Test control_c384gdas_wav 125 failed in run_test failed
Please make changes and add the following label back: hera-intel-RT

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: gaea
Compiler: intel
Job: RT
[RT] Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/944854842/20220525011507/ufs-weather-model
[RT] Error: Test hafs_regional_atm_thompson_gfdlsf 095 failed in run_test failed
Please make changes and add the following label back: gaea-intel-RT

@BrianCurtis-NOAA
Collaborator

> @BrianCurtis-NOAA It looks to me that the tests are still running on Jet. Not sure why we got this message.

@junwang-noaa Jet is notoriously error-prone, it seems to me. If jobs are still running on Jet but the error message appeared here, that means Jet killed the AutoRT but ecflow is still running things. No logs will be pushed automatically, and I'm not confident all jobs will run.

@junwang-noaa
Collaborator

@grantfirl The cray/dell log files are under /scratch1/NCEPDEV/stmp2/Jun.Wang/wcosslog

@DeniseWorthen
Collaborator

The gaea.intel failure is:

 65: fv3.exe: lib/darshan-common.c:262: darshan_track_common_val_counters: Assertion `found == counter' failed.
 65: forrtl: error (76): Abort trap signal

It probably just needs to be re-run.

@jkbk2004
Collaborator

> > @BrianCurtis-NOAA It looks to me that the tests are still running on Jet. Not sure why we got this message.
>
> @junwang-noaa Jet is notoriously error-prone, it seems to me. If jobs are still running on Jet but the error message appeared here, that means Jet killed the AutoRT but ecflow is still running things. No logs will be pushed automatically, and I'm not confident all jobs will run.

@BrianCurtis-NOAA Yeah, the Slurm pending issues dumped fail_tests: cpld_debug_p8 006, hafs_regional_atm_thompson_gfdlsf 088, hafs_regional_datm_cdeps 094. I will see if I can run these three cases manually.

@grantfirl
Collaborator Author

The control_c384gdas_wav test also failed for me the first time (time-out) on hera.intel. It passed when run separately.

@grantfirl
Collaborator Author

> /scratch1/NCEPDEV/stmp2/Jun.Wang/wcosslog

Thanks. They've been added/pushed.

@junwang-noaa
Collaborator

The control_c384gdas_wav test was rerun and finished successfully on hera.intel.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
[RT] Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/944854842/20220525114517/ufs-weather-model
[RT] Error: Test control_c384 018 failed in run_test failed
[RT] Error: Test control_c384gdas 019 failed in run_test failed
[RT] Error: Test hafs_regional_atm_thompson_gfdlsf 088 failed in run_test failed
[RT] Error: Test hafs_regional_atm_ocn_wav 091 failed in run_test failed
[RT] Error: Test hafs_regional_datm_cdeps 094 failed in run_test failed
Please make changes and add the following label back: jet-intel-RT
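
For anyone collecting the failed tests from these notifications to rerun them by hand, here is a minimal Python sketch. It assumes only the text layout visible in this thread (`Machine:`, `Compiler:`, and `[RT] Error: Test <name> <number> failed ...` lines); the function and class names are hypothetical and are not part of AutoRT.

```python
import re
from typing import List, NamedTuple, Tuple


class RTFailure(NamedTuple):
    """Hypothetical container for one automated RT failure notification."""
    machine: str
    compiler: str
    failed_tests: List[Tuple[str, str]]  # (test name, test number as text)


# Patterns inferred from the notification text in this thread only (assumption).
_FIELD = re.compile(r"^(Machine|Compiler):\s*(\S+)\s*$", re.MULTILINE)
_ERROR = re.compile(r"^\[RT\] Error: Test (\S+) (\d+) failed", re.MULTILINE)


def parse_rt_notification(text: str) -> RTFailure:
    """Extract machine, compiler, and failed tests from a notification body."""
    fields = {key.lower(): value for key, value in _FIELD.findall(text)}
    tests = [(name, number) for name, number in _ERROR.findall(text)]
    return RTFailure(fields.get("machine", ""), fields.get("compiler", ""), tests)


sample = """Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
[RT] Error: Test control_c384 018 failed in run_test failed
[RT] Error: Test control_c384gdas 019 failed in run_test failed
[RT] Error: Test hafs_regional_atm_thompson_gfdlsf 088 failed in run_test failed
Please make changes and add the following label back: jet-intel-RT
"""

result = parse_rt_notification(sample)
for name, number in result.failed_tests:
    print(f"{result.machine}.{result.compiler}: {name} {number}")
```

A notification with no `[RT] Error` lines, like the first Jet one in this thread, yields an empty `failed_tests` list.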

@jkbk2004
Collaborator

> Automated RT Failure Notification
> Machine: jet
> Compiler: intel
> Job: RT
> [RT] Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/944854842/20220525114517/ufs-weather-model
> [RT] Error: Test control_c384 018 failed in run_test failed
> [RT] Error: Test control_c384gdas 019 failed in run_test failed
> [RT] Error: Test hafs_regional_atm_thompson_gfdlsf 088 failed in run_test failed
> [RT] Error: Test hafs_regional_atm_ocn_wav 091 failed in run_test failed
> [RT] Error: Test hafs_regional_datm_cdeps 094 failed in run_test failed
> Please make changes and add the following label back: jet-intel-RT

@BrianCurtis-NOAA @junwang-noaa When I ran the tests manually on Jet, hafs_regional_atm_thompson_gfdlsf ran OK, but all the others seem to have Slurm unknown-status and pending issues.

@junwang-noaa
Collaborator

@grantfirl RT passed; please merge the CCPP PRs. Thanks.

@grantfirl
Collaborator Author

@junwang-noaa The FV3 submodule pointer has been updated and .gitmodules reverted.

@junwang-noaa merged commit 6b6462b into ufs-community:develop on May 25, 2022
Labels: No Baseline Change
7 participants