master+bgc #13

Closed
hakaseh opened this issue Nov 2, 2021 · 35 comments

@hakaseh
Collaborator

hakaseh commented Nov 2, 2021

@aekiss I'm opening up a new issue to report any changes I make while I try to create the master+bgc branch.

My first test run suggests that I need to increase max_files in diag_manager_nml of ocean/input.nml.

/scratch/v45/hh0162/access-om2-cycle4/control/1deg_jra55_iaf_bgc/archive/error_logs/access-om2.30703854.gadi-pbs.err:

FATAL from PE   184: diag_util_mod::init_file:  max_files exceeded, increase max_files via the max_files variable in the namelist diag_manager_nml.

In the 1-deg config, max_files = 200; I changed it to 400.

After this change, I got this error next (/scratch/v45/hh0162/access-om2-cycle4/control/1deg_jra55_iaf_bgc/archive/error_logs/access-om2.30738106.gadi-pbs.err):

FATAL from PE     1: diag_axis_mod::diag_axis_init: num_axis_sets (**) exceeds max_num_axis_sets (**).  Increase max_num_axis_sets via diag_manager_nml.

I'm not really sure what this means. I think we can tell from diag_table how many files will be produced, but can we also tell what value to set for max_num_axis_sets?
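For the max_files part of this question, a rough count can be read off diag_table. The following is a minimal Python sketch, assuming the usual FMS diag_table layout in which the second comma-separated field of a file entry is an integer output frequency, while the second field of a field entry is a quoted name. `count_diag_files` is a hypothetical helper, not part of MOM or payu, and it says nothing about max_num_axis_sets.

```python
def count_diag_files(diag_table_text):
    """Count file entries in diag_table text (heuristic, see assumptions above)."""
    count = 0
    for raw in diag_table_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 4 or not parts[0].startswith('"'):
            continue  # title, base date, blank or malformed line
        try:
            int(parts[1])  # file entries: second field is the output frequency
        except ValueError:
            continue  # field entries: second field is a quoted field name
        count += 1
    return count


# Made-up sample, not taken from any real configuration
sample = '''example_title
1900 1 1 0 0 0
# file entries
"ocean_month", 1, "months", 1, "days", "time",
"ocean_daily", 1, "days", 1, "days", "time",
# field entries
"ocean_model", "temp", "temp", "ocean_month", "all", .true., "none", 2
"ocean_model", "sst", "sst", "ocean_daily", "all", .true., "none", 2
'''
n_files = count_diag_files(sample)
print(n_files)
```

Comparing this count with max_files in diag_manager_nml gives a quick sanity check before submitting a run.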

@hakaseh
Collaborator Author

hakaseh commented Nov 3, 2021

Increasing max_num_axis_sets to 400 worked.

So, without understanding what each item means (I think I understand max_files), I'm setting all three to 400 for the moment:

    max_axes          = 400
    max_files         = 400
    max_num_axis_sets = 400

let me know if this is problematic.

@hakaseh
Collaborator Author

hakaseh commented Nov 3, 2021

@aekiss if you think the BGC input files are fine (/g/data/v45/hh0162/projects/icebgc/prep_omip2/input_om2-bgc/*deg*), should we move them to /g/data/ik11/inputs/access-om2/input_2021mmdd/mom_*deg?

Can i2o.nc and o2i.nc be replaced with those in the BGC input files? The only difference is that they have additional BGC coupling tracers (#11 (comment)).

@aekiss
Contributor

aekiss commented Nov 4, 2021

Hi @hakaseh, I don't see a problem with increasing these limits. This is just due to the large number of diagnostic files.
I needed to use

    max_axes          = 400
    max_files         = 200
    max_num_axis_sets = 200

in the 0.1deg IAF. https://github.com/COSIMA/01deg_jra55_iaf/blob/01deg_jra55v140_iaf_cycle3/ocean/input.nml#L23

@aekiss
Contributor

aekiss commented Nov 4, 2021

I'm in the process of moving the inputs to /g/data/ik11 today - sorry for the delays, I've been tied up with other work.
I've tweaked the script to get i2o.nc and o2i.nc from /g/data/ik11/inputs/access-om2/input_20201102/.

@aekiss
Contributor

aekiss commented Nov 4, 2021

I've re-created the input files using this updated version of the script in this PR: COSIMA/input_om2-bgc#4
and put them in /g/data/ik11/inputs/access-om2/input_bgc_20211104.

The master+bgc branches will need two entries for the mom and cice inputs, with /g/data/ik11/inputs/access-om2/input_bgc_20211104 listed first so its i2o.nc and o2i.nc will be used.
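To illustrate why the order of the two entries matters, here is a minimal Python sketch of first-match-wins resolution across an ordered list of input directories. It assumes, as stated above, that the first-listed directory takes precedence; `resolve_inputs` is a hypothetical helper and not how the run tool is actually implemented.

```python
import os
import tempfile


def resolve_inputs(input_dirs):
    """Map each file name to the first directory in input_dirs that contains it."""
    resolved = {}
    for directory in input_dirs:
        for name in sorted(os.listdir(directory)):
            # setdefault keeps the earlier (higher-priority) entry
            resolved.setdefault(name, os.path.join(directory, name))
    return resolved


# Demo with throwaway directories standing in for the real input paths
with tempfile.TemporaryDirectory() as bgc, tempfile.TemporaryDirectory() as base:
    layout = ((bgc, ["i2o.nc", "o2i.nc"]), (base, ["i2o.nc", "o2i.nc", "grid.nc"]))
    for directory, names in layout:
        for name in names:
            open(os.path.join(directory, name), "w").close()
    resolved = resolve_inputs([bgc, base])
    i2o_from_bgc = os.path.dirname(resolved["i2o.nc"]) == bgc
    grid_from_base = os.path.dirname(resolved["grid.nc"]) == base
    print(i2o_from_bgc, grid_from_base)
```

With the BGC directory listed first, its i2o.nc shadows the one in the second directory, while files that exist only in the second directory are still picked up.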

@aekiss
Contributor

aekiss commented Nov 4, 2021

I haven't put the 01deg-cycle4 inputs there yet.

@hakaseh
Collaborator Author

hakaseh commented Nov 5, 2021

@aekiss my first test run with input_bgc_20211104 failed with the following error (/scratch/v45/hh0162/access-om2-cycle4/control/1deg_jra55_iaf_bgc/archive/error_logs/access-om2.30865676.gadi-pbs.err):

FATAL from PE 139: file/field INPUT/dust.nc/dust couldnt recognize axis atts in time_interp_external

I noticed there are slight differences between dust.nc in input_bgc_20211104 and /g/data/v45/hh0162/projects/icebgc/prep_omip2/input_om2-bgc/1deg/dust.nc.

dust.nc in /g/data/ik11/inputs/access-om2/input_bgc_20211104/1deg looks like this:

dimensions:
	time = UNLIMITED ; // (12 currently)
	grid_x_T = 360 ;
	grid_y_T = 300 ;
variables:
	double grid_x_T(grid_x_T) ;
		grid_x_T:_FillValue = NaN ;
		grid_x_T:long_name = "Nominal Longitude of T-cell center" ;
		grid_x_T:units = "degree_east" ;
		grid_x_T:modulo = 360. ;
		grid_x_T:point_spacing = "even" ;
		grid_x_T:axis = "X" ;
	double grid_y_T(grid_y_T) ;
		grid_y_T:_FillValue = NaN ;
		grid_y_T:long_name = "Nominal Latitude of T-cell center" ;
		grid_y_T:units = "degree_north" ;
		grid_y_T:point_spacing = "uneven" ;
		grid_y_T:axis = "Y" ;
	double time(time) ;
		time:_FillValue = NaN ;
		time:modulo = "y" ;
		time:axis = "T" ;
		time:standard_name = "time" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 0001-01-01" ;
		time:calendar = "julian" ;
	float lon(grid_y_T, grid_x_T) ;
		lon:_FillValue = NaNf ;
	float lat(grid_y_T, grid_x_T) ;
		lat:_FillValue = NaNf ;
	double dust(time, grid_y_T, grid_x_T) ;
		dust:_FillValue = -1.e+34 ;
		dust:missing_value = -1.e+34 ;

while the original file looks like this:

/g/data/v45/hh0162/projects/icebgc/prep_omip2/input_om2-bgc/1deg/dust.nc

dimensions:
	time = UNLIMITED ; // (12 currently)
	grid_x_T = 360 ;
	grid_y_T = 300 ;
variables:
	double grid_x_T(grid_x_T) ;
		grid_x_T:_FillValue = NaN ;
		grid_x_T:long_name = "tcell longitude" ;
		grid_x_T:units = "degrees_E" ;
		grid_x_T:cartesian_axis = "X" ;
	double grid_y_T(grid_y_T) ;
		grid_y_T:_FillValue = NaN ;
		grid_y_T:long_name = "tcell latitude" ;
		grid_y_T:units = "degrees_N" ;
		grid_y_T:cartesian_axis = "Y" ;
	double time(time) ;
		time:_FillValue = NaN ;
		time:modulo = "y" ;
		time:axis = "T" ;
		time:standard_name = "time" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 0001-01-01" ;
		time:calendar = "julian" ;
	float lon(grid_y_T, grid_x_T) ;
		lon:_FillValue = NaNf ;
	float lat(grid_y_T, grid_x_T) ;
		lat:_FillValue = NaNf ;
	double dust(time, grid_y_T, grid_x_T) ;
		dust:_FillValue = -1.e+34 ;
		dust:missing_value = -1.e+34 ;

any idea?

@aekiss
Contributor

aekiss commented Nov 5, 2021

It looks like it's having trouble interpreting the axes. There needs to be metadata that matches one of these https://github.com/mom-ocean/MOM5/blob/1d9af9d262b/src/shared/axis_utils/axis_utils.F90#L92-L99

    lon_names = (/'lon','x  '/)
    lat_names = (/'lat','y  '/)
    z_names = (/'depth ','height','z     '/)
    t_names = (/'time','t   '/)
    lon_units = (/'degrees_e   ', 'degrees_east', 'degreese    '/)
    lat_units = (/'degrees_n    ', 'degrees_north', 'degreesn     '/)
    z_units = (/'cm ','m  ','pa ','hpa'/)
    t_units = (/'sec', 'min','hou','day','mon','yea'/)

so I guess the problem is it can't recognize grid_x_T:units = "degree_east" and grid_y_T:units = "degree_north" since they don't start with degrees. I'll fix them.
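As a rough illustration of that guess, the Python sketch below tests the units strings from both files against the lon_units and lat_units lists quoted above. It assumes a case-insensitive whole-string comparison, which may not be exactly what axis_utils does, and it ignores the name-based and attribute-based checks; `classify_units` is a hypothetical helper.

```python
# Units strings copied from the axis_utils lists quoted above
LON_UNITS = ("degrees_e", "degrees_east", "degreese")
LAT_UNITS = ("degrees_n", "degrees_north", "degreesn")


def classify_units(units):
    """Return 'X', 'Y' or None for a units string (assumed exact, case-insensitive match)."""
    u = units.strip().lower()
    if u in LON_UNITS:
        return "X"
    if u in LAT_UNITS:
        return "Y"
    return None


# First two are from the new file, last two from the original file
for units in ("degree_east", "degree_north", "degrees_E", "degrees_N"):
    print(units, "->", classify_units(units))
```

Under these assumptions the singular forms fall through to None, while the units in the original file are recognized.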

did /g/data/v45/hh0162/projects/icebgc/prep_omip2/input_om2-bgc/1deg/dust.nc work in the past?

@aekiss
Contributor

aekiss commented Nov 5, 2021

@hakaseh try again with /g/data/ik11/inputs/access-om2/input_bgc_20211105/1deg

@hakaseh
Collaborator Author

hakaseh commented Nov 7, 2021

thanks @aekiss. the issue is resolved with the new input data.

it has worked with /g/data/v45/hh0162/projects/icebgc/prep_omip2/input_om2-bgc/1deg/dust.nc in the past.

For this fix, did you update https://github.com/COSIMA/input_om2-bgc/tree/ak-dev? If so, should I merge this into main before I start working on COSIMA/input_om2-bgc#5?

@aekiss
Contributor

aekiss commented Nov 7, 2021

Glad it worked. It was in ak-dev and I've now merged that into main.

@hakaseh
Collaborator Author

hakaseh commented Nov 8, 2021

@aekiss and @russfiedler have you seen this type of error before?

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 217 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[gadi-cpu-clx-1429.gadi.nci.org.au:1742745] PMIX ERROR: UNREACHABLE in file ../../../../../../../opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 2147
forrtl: error (78): process killed (SIGTERM)

I get this in my test run with 1deg-RYF (with or without BGC, using the latest executables), while other configs (1deg-IAF, 025deg-IAF, 025deg-RYF) work fine. The error log is /scratch/v45/hh0162/access-om2-cycle4/control/1deg_jra55_ryf_bgc/archive/error_logs/access-om2.30955006.gadi-pbs.err. I'll look into it further later, but I'm sharing it now in case you have seen this before.

@aekiss
Contributor

aekiss commented Nov 8, 2021

I've seen similar things before but never had much idea what was causing it.

Is this repeatable? If not, it might be a transient glitch on the machine.

@hakaseh
Collaborator Author

hakaseh commented Nov 8, 2021

it is repeatable, and i am still getting the same error. it's strange because i don't get the same error with the 5 other configs. i might re-create this config.

@hakaseh
Collaborator Author

hakaseh commented Nov 8, 2021

@aekiss could you try to run master+bgc of 1deg-RYF and see if you can reproduce the same issue? I pushed the branch here: https://github.com/hakaseh/1deg_jra55_ryf/tree/master+bgc_tmp

@aekiss
Contributor

aekiss commented Nov 9, 2021

yes, I get the same issue

@hakaseh
Collaborator Author

hakaseh commented Nov 9, 2021

thanks for checking. i'll try to re-create the experiment.

@aekiss
Contributor

aekiss commented Nov 9, 2021

A test run of the current master (https://github.com/COSIMA/1deg_jra55_ryf/tree/878db7c7) worked fine. I'll now try the new exes.

@aekiss
Contributor

aekiss commented Nov 9, 2021

It seems to run fine with the latest non-bgc exes

      exe: /g/data/ik11/inputs/access-om2/bin/yatm_0ab7295.exe
      exe: /g/data/ik11/inputs/access-om2/bin/fms_ACCESS-OM-BGC_6256fdc_libaccessom2_0ab7295.x
      exe: /g/data/ik11/inputs/access-om2/bin/cice_auscom_360x300_24p_2572851_libaccessom2_0ab7295.exe

I'll try your BGC exes next.

@aekiss
Contributor

aekiss commented Nov 9, 2021

It worked fine when I used the current master (https://github.com/COSIMA/1deg_jra55_ryf/tree/878db7c7) with your config.yaml from https://github.com/hakaseh/1deg_jra55_ryf/tree/master+bgc_tmp
i.e. using your master+bgc exes and inputs, but without your changes to cice_in.nml, input_ice.nml, diag_table and input.nml.
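To narrow this kind of problem down, it can help to list which config files actually differ between two checkouts. The Python sketch below does this with the standard-library `filecmp` module; `differing_files` is a hypothetical helper, and the temporary directories and file contents are made up for the demo rather than taken from either repository.

```python
import filecmp
import os
import tempfile


def differing_files(dir_a, dir_b, names):
    """Return the names whose contents differ between dir_a and dir_b."""
    match, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, names, shallow=False)
    return sorted(mismatch + errors)  # errors: missing or unreadable in either dir


# Demo with throwaway directories standing in for two config checkouts
with tempfile.TemporaryDirectory() as master, tempfile.TemporaryDirectory() as branch:
    contents = {
        "field_table": ("same", "same"),
        "input.nml": ("max_files = 200", "max_files = 400"),
    }
    for name, (in_master, in_branch) in contents.items():
        with open(os.path.join(master, name), "w") as f:
            f.write(in_master)
        with open(os.path.join(branch, name), "w") as f:
            f.write(in_branch)
    changed = differing_files(master, branch, list(contents))
    print(changed)
```

Reverting the differing files one at a time is then a simple way to bisect which change triggers the failure.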

@aekiss
Contributor

aekiss commented Nov 9, 2021

I noticed that field_table in https://github.com/hakaseh/1deg_jra55_ryf/tree/master+bgc_tmp is the same as in the current master (https://github.com/COSIMA/1deg_jra55_ryf/tree/878db7c7) so you're missing something there

@aekiss
Contributor

aekiss commented Nov 9, 2021

also why is this an RYF config, given this issue is in 1deg_jra55_iaf?

@hakaseh
Collaborator Author

hakaseh commented Nov 9, 2021

Good morning @aekiss. Good news: I resolved this issue by re-creating the experiment from master of https://github.com/COSIMA/1deg_jra55_ryf. Thanks for your further testing!

Now I think all of the 6 master+bgc configs are ready to go. At least my test runs (5 years for 1 deg, 1 year for 025 deg, and 1 month for 01 deg) were successful. If you are happy with them, we can add them to the COSIMA repo.

https://github.com/hakaseh/1deg_jra55_iaf/tree/master+bgc
https://github.com/hakaseh/1deg_jra55_ryf/tree/master+bgc
https://github.com/hakaseh/025deg_jra55_iaf/tree/master+bgc
https://github.com/hakaseh/025deg_jra55_ryf/tree/master+bgc
https://github.com/hakaseh/01deg_jra55_iaf/tree/master+bgc
https://github.com/hakaseh/01deg_jra55_ryf/tree/master+bgc

@aekiss
Contributor

aekiss commented Nov 9, 2021

Awesome, thanks @hakaseh!

I think it would be best to first add them as master+bgc branches to the repos in https://github.com/COSIMA. Then I can check them and make changes easily.

To do this, cd to each config directory and do git remote -v.
If you don't have any remotes called upstream you can then do

git remote add upstream https://github.com/COSIMA/<config>.git
git checkout master+bgc
git push upstream master+bgc

in each config directory, where <config> needs to be replaced by the appropriate configuration name, e.g. 1deg_jra55_ryf or 1deg_jra55_iaf etc.
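If it is unclear which remote, if any, already points at the COSIMA repo, the output of git remote -v can be checked with a small script. This Python sketch is only an illustration: `remotes_for` is a hypothetical helper, it assumes HTTPS GitHub URLs, and the sample text is a made-up example in the usual git remote -v format.

```python
def remotes_for(remote_v_output, owner):
    """Return names of remotes whose URL points at github.com/<owner>/."""
    names = set()
    for line in remote_v_output.splitlines():
        parts = line.split()  # name, URL, "(fetch)" or "(push)"
        if len(parts) >= 2 and f"github.com/{owner}/" in parts[1]:
            names.add(parts[0])
    return sorted(names)


# Made-up sample; "someuser" is a placeholder account name
sample = (
    "fork\thttps://github.com/someuser/1deg_jra55_iaf.git (fetch)\n"
    "fork\thttps://github.com/someuser/1deg_jra55_iaf.git (push)\n"
    "origin\thttps://github.com/COSIMA/1deg_jra55_iaf.git (fetch)\n"
    "origin\thttps://github.com/COSIMA/1deg_jra55_iaf.git (push)\n"
)
cosima_remotes = remotes_for(sample, "COSIMA")
print(cosima_remotes)
```

If the list is non-empty, pushing to one of those remotes avoids adding a second remote for the same repo.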

@hakaseh
Collaborator Author

hakaseh commented Nov 9, 2021

@aekiss I have the COSIMA repo as origin, so could I do git push origin master+bgc:master+bgc? Or is upstream better?

There seems to be a permission issue:

hakaseh	https://github.com/hakaseh/1deg_jra55_iaf.git (fetch)
hakaseh	https://github.com/hakaseh/1deg_jra55_iaf.git (push)
origin	https://github.com/COSIMA/1deg_jra55_iaf.git (fetch)
origin	https://github.com/COSIMA/1deg_jra55_iaf.git (push)
(base) [hh0162@gadi-login-01 1deg_jra55_iaf_bgc]$ git remote add upstream https://github.com/COSIMA/1deg_jra55_iaf.git
(base) [hh0162@gadi-login-01 1deg_jra55_iaf_bgc]$ git push upstream master+bgc
Username for 'https://github.com': hakaseh
Password for 'https://hakaseh@github.com': 
remote: Permission to COSIMA/1deg_jra55_iaf.git denied to hakaseh.
fatal: unable to access 'https://github.com/COSIMA/1deg_jra55_iaf.git/': The requested URL returned error: 403

@aekiss
Contributor

aekiss commented Nov 9, 2021

yes, if origin already points to the COSIMA repo you can just do

git push origin master+bgc

I've just given you elevated access to https://github.com/COSIMA/1deg_jra55_iaf - let me know if that works and I'll then fix the others.

@hakaseh
Collaborator Author

hakaseh commented Nov 9, 2021

awesome, i was able to git push this time!

@aekiss
Contributor

aekiss commented Nov 9, 2021

cool, I'll fix the rest

@aekiss
Contributor

aekiss commented Nov 9, 2021

ok they should all work now - let me know how you go

@hakaseh
Collaborator Author

hakaseh commented Nov 10, 2021

all done now :)

@aekiss
Contributor

aekiss commented Nov 10, 2021

great, thanks :-)

@aekiss
Contributor

aekiss commented Nov 10, 2021

They all look fine to me - nice work! I've pushed a few commits to update the metadata and README files in all 6 versions.

@aekiss
Contributor

aekiss commented Nov 10, 2021

I'm now merging master+bgc into the 01deg_jra55v140_iaf_cycle4 branch

@hakaseh
Collaborator Author

hakaseh commented Nov 10, 2021 via email

@hakaseh
Collaborator Author

hakaseh commented Nov 10, 2021 via email

@hakaseh hakaseh closed this as completed Nov 10, 2021