put date in MOM output filenames #185

Closed
aekiss opened this issue Jan 21, 2020 · 65 comments

Comments

@aekiss
Contributor

aekiss commented Jan 21, 2020

At present all the MOM outputs have the same name for every run, e.g. ocean.nc.
I propose we include the run date in the filenames, e.g. ocean_1985_01_01, which is what we already have in the CICE outputs.

This gives many advantages:

  1. users can tell what dates are in which file (a common question, currently dealt with by using the cosima cookbook or run_summary, which is pretty awkward for such a basic thing)
  2. users can easily find files in a required date range via bash shell/script
  3. users can copy all run outputs into one directory without them clobbering each other
  4. importantly, this suits the bash-based workflow of users at BoM and CSIRO (I've had requests from Gary and Paul for this)
  5. this goes some way towards addressing issue #57, "Set netcdf global attributes to record origin of all published .nc files"
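
Point 2 can be illustrated with a minimal sketch. It assumes filenames of the form ocean_YYYY_MM_DD.nc; the .nc suffix and the helper name files_in_range are assumptions made for illustration and are not part of any existing tool.

```python
import re
from datetime import date

# Assumed pattern: ocean_<year>_<month>_<day>.nc, e.g. ocean_1985_01_01.nc
DATE_RE = re.compile(r"^ocean_(\d{4})_(\d{2})_(\d{2})\.nc$")

def files_in_range(names, start, end):
    """Return names whose embedded date d satisfies start <= d < end, sorted by date."""
    selected = []
    for name in names:
        m = DATE_RE.match(name)
        if m is None:
            continue  # skip names without a date, e.g. plain ocean.nc
        d = date(*map(int, m.groups()))
        if start <= d < end:
            selected.append((d, name))
    return [name for _, name in sorted(selected)]

names = ["ocean_1985_04_01.nc", "ocean.nc", "ocean_1985_01_01.nc", "ocean_1986_01_01.nc"]
print(files_in_range(names, date(1985, 1, 1), date(1986, 1, 1)))
```

With undated names such as ocean.nc, the same selection would require opening every file to read its time axis.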

This could be implemented as a post-processing step.

I'm not sure if the filename should include just the starting date or both starting and ending dates (the latter being confusing, as it is midnight of the day after).

@aekiss
Contributor Author

aekiss commented Jan 21, 2020

I'm not sure if this would require changes to the cosima cookbook, but if so they'd be small - the cookbook already supports dates in CICE output filenames. Probably only scripts would need to change, not the cookbook itself?

@aekiss
Contributor Author

aekiss commented Jan 21, 2020

If we decide to go ahead with this, should we apply it to all files already in /g/data/hh5/tmp/cosima/access-om2*, or just new runs? Applying to all would be neater but would require rebuilding the cookbook database and would probably break a lot of user cookbook scripts.

@aekiss
Contributor Author

aekiss commented Jan 21, 2020

Rather than moving the files to new filenames, we could create hard links or symlinks with the new names. This would preserve existing workflows while also supporting BoM and CSIRO.
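
A minimal sketch of the linking idea, assuming the run start date is already known to the caller (for example from run_summary). The function name link_with_date and the directory names output000 and by_date are hypothetical.

```python
import tempfile
from datetime import date
from pathlib import Path

def link_with_date(src: Path, run_start: date, link_dir: Path) -> Path:
    """Create link_dir/<stem>_YYYY_MM_DD<suffix> as a symlink to src and return it."""
    link_dir.mkdir(parents=True, exist_ok=True)
    link = link_dir / f"{src.stem}_{run_start:%Y_%m_%d}{src.suffix}"
    link.symlink_to(src.resolve())  # the original file is left untouched
    return link

# Demonstration in a throwaway directory (paths are placeholders)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "output000" / "ocean.nc"
    src.parent.mkdir()
    src.touch()
    link = link_with_date(src, date(1985, 1, 1), Path(tmp) / "by_date")
    link_name, is_link = link.name, link.is_symlink()

print(link_name, is_link)
```

Because the original ocean.nc stays in place, existing scripts keep working, while the dated links can all live in one directory without clobbering each other.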

@russfiedler

I did this when creating ensembles of the IAF runs. I'll see if the script survived the transition to Gadi.

@russfiedler

Looks like they went kaput 22 hours ago. The surviving json files I have indicate that I was using names like ocean_temp-%Y-%m.nc.

@russfiedler

Note that the dates can be done automatically via the diag_table. There's no need for special treatment for each time.

@aekiss
Contributor Author

aekiss commented Jan 21, 2020

Ah, good to know diag_table can do it. I'm just looking it up here: https://github.com/mom-ocean/MOM5/blob/master/src/shared/diag_manager/diag_table.F90#L45

It would be good to include both starting and ending date, e.g. something like ocean_1985_12_01-1986_01_01.nc, since runs have different lengths and this will enable checking that all files over a given interval have been selected. But that doesn't seem possible, right?

In any case, we would need to use something else if we want to process the existing outputs.

@russfiedler

russfiedler commented Jan 21, 2020

You can dump files periodically, say monthly. This keeps file sizes under control and you can even exploit the parallelism when postprocessing if you want. You can run with variable numbers of months and across years seamlessly.

We use entries like

"ocean_ofam%4yr%2mo",1,"days",1,"days","Time",1,"months"
"ocean_model","eta_t","eta_t","ocean_ofam%4yr%2mo","all",.true.,"none",2
"ocean_model","temp","temp","ocean_ofam%4yr%2mo","all",.true.,"none",4
"ocean_model","salt","salt","ocean_ofam%4yr%2mo","all",.true.,"none",4
"ocean_model","u","u","ocean_ofam%4yr%2mo","all",.true.,"none",4
"ocean_model","v","v","ocean_ofam%4yr%2mo","all",.true.,"none",4

to dump monthly files full of daily averages.
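
Entries like these are repetitive, so they could be generated rather than typed. The following is only a sketch: the helper names file_line and field_line are hypothetical, and the layout simply mirrors the entries quoted above.

```python
def file_line(name, out_freq, out_units, new_freq, new_units):
    """Format a diag_table file line in the layout of the entries above."""
    return f'"{name}",{out_freq},"{out_units}",1,"days","Time",{new_freq},"{new_units}"'

def field_line(field, file_name, packing, module="ocean_model"):
    """Format a diag_table field line for a time-averaged field (.true.)."""
    return f'"{module}","{field}","{field}","{file_name}","all",.true.,"none",{packing}'

name = "ocean_ofam%4yr%2mo"  # %4yr and %2mo are expanded to year and month at run time
lines = [file_line(name, 1, "days", 1, "months")]  # daily output, new file each month
for field, packing in [("eta_t", 2), ("temp", 4), ("salt", 4), ("u", 4), ("v", 4)]:
    lines.append(field_line(field, name, packing))

print("\n".join(lines))
```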

@aekiss
Contributor Author

aekiss commented Jan 21, 2020

Thanks @russfiedler, that's a great tip

@aekiss
Contributor Author

aekiss commented Jan 21, 2020

@angus-g am I right in thinking no code changes would be needed in the cookbook?

@angus-g

angus-g commented Jan 21, 2020

@aekiss that's right, it doesn't matter what the filename is. You can still use % as a wildcard if you need to use the filename for disambiguation too.

@AndyHoggANU
Contributor

I think this is a good idea.
Let's work out the details when we next start a new run, then think about migrating older data in the future. Based on our experience with saving data for publication (see /g/data/cj50/access-om2) this reprocessing won't be trivial ...

@aidanheerdegen
Contributor

I think @russfiedler is on the right track. Save each month's data in a separate file, uniquely named, using the diag_table naming capabilities.

This is part of the configuration, so test it and once happy roll it out to the published config. I'd be using dev branches for the configs, in the same way as proposed for code.

I would not support changing existing runs. Simply not worth the time/effort IMO.

@AndyHoggANU
Contributor

Each month?
Or each year/segment, whichever is smaller?

@aidanheerdegen
Contributor

Monthly for the tenth degree, as you never run for a year. For the quarter and one degree, probably yearly.

This has the benefit that whatever the run length the duration of output files would be consistent.

@russfiedler

russfiedler commented Jan 21, 2020

I think you want consistent sizes throughout the run. It makes checking things much simpler. I'd suggest monthly output for the tenth degree and yearly for the others.

@aidanheerdegen Great minds think alike!

@AndyHoggANU
Contributor

True, but we sometimes use 3-monthly output ...

@aidanheerdegen
Contributor

Now you're just being difficult.

@russfiedler

Three monthly averaged output for the 0.1 or do you mean the others? You'd be putting those outputs in a separate file anyway.

@russfiedler

The entry for the 1 and 0.25 models would be something like

"ocean_3mon%4yr",3,"months",1,"days","Time",1,"years"

so you would have 4 entries per file.
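
The count of time records per file follows from the two frequencies in the file line. A small sketch of that arithmetic, with a hypothetical helper name and both frequencies expressed in months:

```python
def records_per_file(output_freq_months: int, new_file_freq_months: int) -> int:
    """Number of time records written to each file."""
    if new_file_freq_months % output_freq_months != 0:
        raise ValueError("file length should be a whole number of output periods")
    return new_file_freq_months // output_freq_months

print(records_per_file(3, 12))  # 3-monthly output in yearly files
print(records_per_file(3, 3))   # 3-monthly output in 3-month files
```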

@AndyHoggANU
Contributor

Yep, I have only ever used 3-monthly for the 01deg case...

@russfiedler

Ah, so that means you have to run for 3 or 6 month segments at the moment, right? You would have an entry like
"ocean_3mon%4yr%2mo",3,"months",1,"days","Time",3,"months"

@AndyHoggANU
Contributor

Yes, still running with 3-month segments... A 12-hour wall-time limit (or linear scaling up to 12,000 cores) would allow us to do a year at a time.

@aidanheerdegen
Contributor

> Looks like they went kaput 22 hours ago. The surviving json files I have indicate that I was using names like ocean_temp-%Y-%m.nc.

@russfiedler looks like short is still here until the 28th if you wanted to grab something from it

@russfiedler

@aidanheerdegen Ta. I've popped the scripts to do the linking in /scratch/v45/raf599/assim if anybody wants to use them as a starting point. There were only 2 months per segment for runs 96-197 that I was interested in, so I didn't have to do anything tricky like parsing an ncdump of the files. It does make loading up an individual month easy, and a single year can be loaded by just getting the odd (or even) months.

aekiss added a commit to aekiss/01deg_jra55_iaf that referenced this issue Feb 27, 2020
start from /home/157/amh157/payu/01deg_jra55v13_ryf9091/archive/restart371 using the same config,
but use IAF forcing, copied as needed from https://github.com/COSIMA/01deg_jra55_iaf/tree/3411eed79b5b55d8db7b5ddfcbfc111bc9e40abf for
    - accessom2.nml
    - atmosphere/forcing.json
    - config.yaml

other changes:

- disable all cice output

- set up mom outputs
    - output scalars and 2d surface_temp and eta_t only
    - 4 hourly
    - use snapshots
    - include model date in file - see COSIMA/access-om2#185

- use openMPI4.0.2 executables
    /g/data/ik11/inputs/access-om2/bin/yatm_575fb04.exe
    /g/data/ik11/inputs/access-om2/bin/fms_ACCESS-OM_4a2f211_libaccessom2_575fb04.x
    /g/data/ik11/inputs/access-om2/bin/cice_auscom_3600x2700_722p_365bdc1_libaccessom2_575fb04.exe
instead of the openMPI4.0.1 versions
    /g/data/ik11/inputs/access-om2/bin/yatm_1bb8904.exe
    /g/data/ik11/inputs/access-om2/bin/fms_ACCESS-OM_97e3429_libaccessom2_1bb8904.x
    /g/data/ik11/inputs/access-om2/bin/cice_auscom_3600x2700_722p_d3e8bdf_libaccessom2_1bb8904.exe
The code differences shouldn't make any scientific difference
https://github.com/COSIMA/libaccessom2/compare/1bb8904..575fb04
https://github.com/mom-ocean/MOM5/compare/97e3429..4a2f211
https://github.com/COSIMA/cice5/compare/d3e8bdf..365bdc1
https://github.com/COSIMA/oasis3-mct/compare/d02cc8d896..87a873aa7
@aekiss
Contributor Author

aekiss commented Mar 17, 2020

I'm trying to come up with a consistent file naming convention for all MOM output at all resolutions in the new configurations. A key objective is to improve data accessibility by making it possible to determine what data is available (variables, temporal sampling, dates) by simply using ls. This would be a big improvement over the current opaque file-naming approach which has hindered uptake of model outputs by others.

Here's a proposed convention:

  • a separate file for each variable, also disambiguated by sampling frequency (this means lots of files, and a fiddly diag_table, but I think it's worth it for clarity)
  • filenames consisting of these components:
    • ocean_
    • data spatial dimensionality (1d-, 2d- or 3d-) - this is technically redundant but very useful for users unfamiliar with MOM diagnostic names
    • netcdf variable name (NB: separated from previous and next components by - instead of _ to facilitate parsing, since CF-compliant variable names can contain _ but not -)
    • and time info for non-static data:
      • sampling period within file, e.g. -4hourly, -daily, -5daily, -monthly, -3monthly, -yearly
      • reduction method: whether each sample is a time-mean over the sampling period (_mean) or a _snapshot at the end of the sampling period (corresponding to the 6th item in the diag_table field line being .true. or .false., respectively), or something else (e.g. _rms, _pow02, _min, _max, etc - see https://github.com/mom-ocean/MOM5/blob/master/src/shared/diag_manager/diag_table.F90#L159)
      • _<year>[_<month>[_<day>]] for the start of the first sampling interval in file - only as many as needed for disambiguation. 4 digits for years, 2 for month, 2 for day, with leading zeros as needed (to ensure sensible alphabetic sorting), achieved by %4yr, %4yr%2mo or %4yr%2mo%2dy in the diag_table entry
      • finally the temporal length of the file (based on new_file_freq, new_file_freq_units in file line of diag_table) e.g. _1month, _3months, _5years just to make it clear how much data is in the file (since this can vary independently of the sampling period)

This order of components is designed to sort alphabetically in a helpful way.

Examples:

ocean_2d-geolon_t.nc                                  # static grid data: no sampling or date info
ocean_1d-ke_tot-monthly_mean_1990_1year.nc            # 12 monthly means in one file
ocean_2d-sea_level-monthly_mean_1990_04_1month.nc     # a single 1-month mean
ocean_3d-temp-monthly_mean_1990_04_3months.nc         # three 1-month means
ocean_3d-temp-3monthly_mean_1990_04_3months.nc        # a single 3-month mean
ocean_3d-salt-daily_snapshot_1990_04_1month.nc        # a month of daily snapshots
ocean_3d-salt-daily_snapshot_1990_04_01_1day.nc       # daily snapshots, one file per day

achieved by these diag_table specifications

"ocean_2d-geolon_t", -1, "months", 1, "days", "time"
"ocean_model","geolon_t","geolon_t","ocean_2d-geolon_t","all",.false.,"none",2

"ocean_1d-ke_tot-monthly_mean%4yr_1year", 1,  "months", 1, "days", "time", 12, "months"
"ocean_model","ke_tot","ke_tot", "ocean_1d-ke_tot-monthly_mean%4yr_1year","all",.true.,"none",1

"ocean_2d-sea_level-monthly_mean%4yr%2mo_1month", 1,  "months", 1, "days", "time", 1, "months"
"ocean_model","sea_level","sea_level", "ocean_2d-sea_level-monthly_mean%4yr%2mo_1month","all",.true.,"none",2

...

"ocean_3d-salt-daily_snapshot%4yr%2mo%2dy_1day", 1,  "days", 1, "days", "time", 1, "days"
"ocean_model","salt","salt", "ocean_3d-salt-daily_snapshot%4yr%2mo%2dy_1day","all",.false.,"none",2
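
To make the component order concrete, here is a small sketch that assembles names according to the convention proposed above. The function name proposed_name and its argument names are hypothetical. It only builds strings and does not interact with MOM or the diag_table.

```python
def proposed_name(dim, var, sampling=None, reduction=None, start=None, length=None):
    """Build a filename following the proposed convention.

    dim: "1d", "2d" or "3d"; var: netCDF variable name.
    For static data leave the time-related arguments as None.
    """
    name = f"ocean_{dim}-{var}"
    if sampling is not None:
        # time info: -<sampling>_<reduction>_<start date>_<file length>
        name += f"-{sampling}_{reduction}_{start}_{length}"
    return name + ".nc"

print(proposed_name("2d", "geolon_t"))
print(proposed_name("1d", "ke_tot", "monthly", "mean", "1990", "1year"))
print(proposed_name("3d", "salt", "daily", "snapshot", "1990_04_01", "1day"))
```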

Does that seem OK to people? (ping @AndyHoggANU, @aidanheerdegen, @russfiedler)
It might look like overkill but I think it will be helpful in the long run to have a systematic approach that will cover all current and likely future needs.

I'm not sure whether we should do something like this for CICE output too. Each file includes lots of static grid data, so that's an argument to retain our current approach of saving many CICE variables per file.

@aekiss
Contributor Author

aekiss commented Mar 17, 2020

@angus-g would a large increase in the number output files cause problems for the COSIMA Cookbook? And will this file naming convention suit the way the cookbook concatenates files on the time axis (e.g. if the final filename component varies during a run)?

@angus-g

angus-g commented Mar 17, 2020

The database part of the cookbook shouldn't have any issues with more files. There would probably be a small increase in the size of the database itself, but I can't see queries getting noticeably slower. For concatenation, the filenames themselves don't matter: the files are sorted by the start time obtained from the time dimension data.

The only one thing I could see causing a change from the current behaviour is that we can't quite rely on the same form of filename-based disambiguation. It would be harder for the cookbook to suggest that a query is erroneous, but we can still pass patterns (like ocean_3d_%) to select only a subset of filenames.

@AndyHoggANU
Contributor

Wow, OK, I think I like it.
It would certainly make it easier to publish the data, because these filenames will mostly satisfy the requirements of published data. A few points:

  • I am worried about whether we will blow our inode quota on NCI to smithereens. We should be able to calculate it?
  • It is worth socialising the idea at this week's MOM meeting. I would like to know from others who are not using the cookbook whether this will meet requirements.
  • I will defer to @angus-g on cookbook matters, so that sounds OK.
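
The inode question can be estimated with simple arithmetic. The sketch below is only illustrative: every number in it is a placeholder rather than a measured value, and it assumes that static files are written once per run segment and that each time-varying field gets a fixed number of files per model year.

```python
def estimate_file_count(n_static, n_varying, files_per_year, years, runs):
    """Rough number of output files (one inode each) for a whole experiment."""
    static_files = n_static * runs                      # static fields rewritten each run
    varying_files = n_varying * files_per_year * years  # one file per field per period
    return static_files + varying_files

# Placeholder inputs: 15 static fields, 40 time-varying fields,
# monthly files, 60 model years done as 60 one-year runs.
print(estimate_file_count(n_static=15, n_varying=40, files_per_year=12, years=60, runs=60))
```

Comparing such an estimate against the project quota would show whether one file per variable per month is affordable at each resolution.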

@aekiss
Contributor Author

aekiss commented May 21, 2020

I've written a script to automatically generate diag_table with this new output file format - see https://github.com/COSIMA/make_diag_table.

With this, users will only need to modify a very clean and non-repetitive diag_table_source.yaml file, which make_diag_table.py will read to generate diag_table. It's general enough that it should be unnecessary to hand-edit the diag_table file.

aekiss added a commit to COSIMA/make_diag_table that referenced this issue May 22, 2020
@aekiss
Contributor Author

aekiss commented May 22, 2020

Unfortunately MOM insists on putting _ (underscore) before the date, so we get filenames like
ocean-2d-tx_trans_int_z-1-monthly-_1958_01_16.nc.

The -_ looks odd but is probably preferable to
ocean-2d-tx_trans_int_z-1-monthly_1958_01_16.nc
because it allows "fields" to be consistently split by - (dash).

Eliminating the leading underscore would be neater:
ocean-2d-tx_trans_int_z-1-monthly-1958_01_16.nc
but would require a code change in MOM, with a namelist variable to retain the current behaviour by default. Not sure if that's worth the bother.

Also notice that I've decided to retain the 1- in 1-monthly so that the number of "fields" is consistent.
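
A short sketch of why the extra dash helps parsing. The helper name split_fields is hypothetical. It splits a filename into "fields" on the dash.

```python
def split_fields(filename):
    """Split a filename into dash-separated fields, ignoring a trailing .nc."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    return stem.split("-")

with_dash = split_fields("ocean-2d-tx_trans_int_z-1-monthly-_1958_01_16.nc")
without_dash = split_fields("ocean-2d-tx_trans_int_z-1-monthly_1958_01_16.nc")

print(with_dash)     # date is its own field (with a leading underscore)
print(without_dash)  # date is fused onto the "monthly" field
```

With the dash, the date is always the last field. Without it, the sampling unit and the date have to be separated by a second, underscore-based split.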

@aekiss
Contributor Author

aekiss commented May 22, 2020

Example output files are here (this is a 3-month test run with monthly outputs):
/scratch/v45/aek156/access-om2/archive/1deg_jra55_iaf_v2.0.0rc1/output000/ocean/
Notice that the day field in the date is the middle of the averaging period.
Comments welcome. It is super easy to change the file name convention with the make_diag_table.py script, so if you don't like it, let me know!

ocean-2d-area_t.nc
ocean-2d-area_u.nc
ocean-2d-bmf_u-1-monthly-_1958_01_16.nc
ocean-2d-bmf_u-1-monthly-_1958_02_15.nc
ocean-2d-bmf_u-1-monthly-_1958_03_16.nc
ocean-2d-bmf_v-1-monthly-_1958_01_16.nc
ocean-2d-bmf_v-1-monthly-_1958_02_15.nc
ocean-2d-bmf_v-1-monthly-_1958_03_16.nc
ocean-2d-drag_coeff.nc
ocean-2d-dxt.nc
ocean-2d-dxu.nc
ocean-2d-dyt.nc
ocean-2d-dyu.nc
ocean-2d-eta_t-1-monthly-_1958_01_16.nc
ocean-2d-eta_t-1-monthly-_1958_02_15.nc
ocean-2d-eta_t-1-monthly-_1958_03_16.nc
ocean-2d-evap-1-monthly-_1958_01_16.nc
ocean-2d-evap-1-monthly-_1958_02_15.nc
ocean-2d-evap-1-monthly-_1958_03_16.nc
ocean-2d-frazil_3d_int_z-1-monthly-_1958_01_16.nc
ocean-2d-frazil_3d_int_z-1-monthly-_1958_02_15.nc
ocean-2d-frazil_3d_int_z-1-monthly-_1958_03_16.nc
ocean-2d-geolat_c.nc
ocean-2d-geolat_t.nc
ocean-2d-geolon_c.nc
ocean-2d-geolon_t.nc
ocean-2d-ht.nc
ocean-2d-hu.nc
ocean-2d-ice_calving-1-monthly-_1958_01_16.nc
ocean-2d-ice_calving-1-monthly-_1958_02_15.nc
ocean-2d-ice_calving-1-monthly-_1958_03_16.nc
ocean-2d-kmt.nc
ocean-2d-kmu.nc
ocean-2d-melt-1-monthly-_1958_01_16.nc
ocean-2d-melt-1-monthly-_1958_02_15.nc
ocean-2d-melt-1-monthly-_1958_03_16.nc
ocean-2d-mld-1-monthly-_1958_01_16.nc
ocean-2d-mld-1-monthly-_1958_02_15.nc
ocean-2d-mld-1-monthly-_1958_03_16.nc
ocean-2d-pbot_t-1-monthly-_1958_01_16.nc
ocean-2d-pbot_t-1-monthly-_1958_02_15.nc
ocean-2d-pbot_t-1-monthly-_1958_03_16.nc
ocean-2d-pme_river-1-monthly-_1958_01_16.nc
ocean-2d-pme_river-1-monthly-_1958_02_15.nc
ocean-2d-pme_river-1-monthly-_1958_03_16.nc
ocean-2d-river-1-monthly-_1958_01_16.nc
ocean-2d-river-1-monthly-_1958_02_15.nc
ocean-2d-river-1-monthly-_1958_03_16.nc
ocean-2d-runoff-1-monthly-_1958_01_16.nc
ocean-2d-runoff-1-monthly-_1958_02_15.nc
ocean-2d-runoff-1-monthly-_1958_03_16.nc
ocean-2d-sea_level-1-monthly-_1958_01_16.nc
ocean-2d-sea_level-1-monthly-_1958_02_15.nc
ocean-2d-sea_level-1-monthly-_1958_03_16.nc
ocean-2d-sea_level_sq-1-monthly-_1958_01_16.nc
ocean-2d-sea_level_sq-1-monthly-_1958_02_15.nc
ocean-2d-sea_level_sq-1-monthly-_1958_03_16.nc
ocean-2d-sfc_salt_flux_coupler-1-monthly-_1958_01_16.nc
ocean-2d-sfc_salt_flux_coupler-1-monthly-_1958_02_15.nc
ocean-2d-sfc_salt_flux_coupler-1-monthly-_1958_03_16.nc
ocean-2d-sfc_salt_flux_ice-1-monthly-_1958_01_16.nc
ocean-2d-sfc_salt_flux_ice-1-monthly-_1958_02_15.nc
ocean-2d-sfc_salt_flux_ice-1-monthly-_1958_03_16.nc
ocean-2d-sfc_salt_flux_restore-1-monthly-_1958_01_16.nc
ocean-2d-sfc_salt_flux_restore-1-monthly-_1958_02_15.nc
ocean-2d-sfc_salt_flux_restore-1-monthly-_1958_03_16.nc
ocean-2d-tau_x-1-monthly-_1958_01_16.nc
ocean-2d-tau_x-1-monthly-_1958_02_15.nc
ocean-2d-tau_x-1-monthly-_1958_03_16.nc
ocean-2d-tau_y-1-monthly-_1958_01_16.nc
ocean-2d-tau_y-1-monthly-_1958_02_15.nc
ocean-2d-tau_y-1-monthly-_1958_03_16.nc
ocean-2d-tx_trans_int_z-1-monthly-_1958_01_16.nc
ocean-2d-tx_trans_int_z-1-monthly-_1958_02_15.nc
ocean-2d-tx_trans_int_z-1-monthly-_1958_03_16.nc
ocean-2d-ty_trans_int_z-1-monthly-_1958_01_16.nc
ocean-2d-ty_trans_int_z-1-monthly-_1958_02_15.nc
ocean-2d-ty_trans_int_z-1-monthly-_1958_03_16.nc
ocean-3d-age_global-1-monthly-_1958_01_16.nc
ocean-3d-age_global-1-monthly-_1958_02_15.nc
ocean-3d-age_global-1-monthly-_1958_03_16.nc
ocean-3d-diff_cbt_t-1-monthly-_1958_01_16.nc
ocean-3d-diff_cbt_t-1-monthly-_1958_02_15.nc
ocean-3d-diff_cbt_t-1-monthly-_1958_03_16.nc
ocean-3d-dzt-1-monthly-_1958_01_16.nc
ocean-3d-dzt-1-monthly-_1958_02_15.nc
ocean-3d-dzt-1-monthly-_1958_03_16.nc
ocean-3d-pot_rho_0-1-monthly-_1958_01_16.nc
ocean-3d-pot_rho_0-1-monthly-_1958_02_15.nc
ocean-3d-pot_rho_0-1-monthly-_1958_03_16.nc
ocean-3d-pot_rho_2-1-monthly-_1958_01_16.nc
ocean-3d-pot_rho_2-1-monthly-_1958_02_15.nc
ocean-3d-pot_rho_2-1-monthly-_1958_03_16.nc
ocean-3d-pot_temp-1-monthly-_1958_01_16.nc
ocean-3d-pot_temp-1-monthly-_1958_02_15.nc
ocean-3d-pot_temp-1-monthly-_1958_03_16.nc
ocean-3d-salt-1-monthly-_1958_01_16.nc
ocean-3d-salt-1-monthly-_1958_02_15.nc
ocean-3d-salt-1-monthly-_1958_03_16.nc
ocean-3d-temp-1-monthly-_1958_01_16.nc
ocean-3d-temp-1-monthly-_1958_02_15.nc
ocean-3d-temp-1-monthly-_1958_03_16.nc
ocean-3d-temp_xflux_adv-1-monthly-_1958_01_16.nc
ocean-3d-temp_xflux_adv-1-monthly-_1958_02_15.nc
ocean-3d-temp_xflux_adv-1-monthly-_1958_03_16.nc
ocean-3d-temp_yflux_adv-1-monthly-_1958_01_16.nc
ocean-3d-temp_yflux_adv-1-monthly-_1958_02_15.nc
ocean-3d-temp_yflux_adv-1-monthly-_1958_03_16.nc
ocean-3d-tx_trans-1-monthly-_1958_01_16.nc
ocean-3d-tx_trans-1-monthly-_1958_02_15.nc
ocean-3d-tx_trans-1-monthly-_1958_03_16.nc
ocean-3d-tx_trans_rho-1-monthly-_1958_01_16.nc
ocean-3d-tx_trans_rho-1-monthly-_1958_02_15.nc
ocean-3d-tx_trans_rho-1-monthly-_1958_03_16.nc
ocean-3d-ty_trans-1-monthly-_1958_01_16.nc
ocean-3d-ty_trans-1-monthly-_1958_02_15.nc
ocean-3d-ty_trans-1-monthly-_1958_03_16.nc
ocean-3d-ty_trans_rho-1-monthly-_1958_01_16.nc
ocean-3d-ty_trans_rho-1-monthly-_1958_02_15.nc
ocean-3d-ty_trans_rho-1-monthly-_1958_03_16.nc
ocean-3d-ty_trans_rho_gm-1-monthly-_1958_01_16.nc
ocean-3d-ty_trans_rho_gm-1-monthly-_1958_02_15.nc
ocean-3d-ty_trans_rho_gm-1-monthly-_1958_03_16.nc
ocean-3d-u-1-monthly-_1958_01_16.nc
ocean-3d-u-1-monthly-_1958_02_15.nc
ocean-3d-u-1-monthly-_1958_03_16.nc
ocean-3d-v-1-monthly-_1958_01_16.nc
ocean-3d-v-1-monthly-_1958_02_15.nc
ocean-3d-v-1-monthly-_1958_03_16.nc
ocean-3d-wt-1-monthly-_1958_01_16.nc
ocean-3d-wt-1-monthly-_1958_02_15.nc
ocean-3d-wt-1-monthly-_1958_03_16.nc
ocean-scalar-_1958_01_16.nc
ocean-scalar-_1958_02_15.nc
ocean-scalar-_1958_03_16.nc

@aidanheerdegen
Contributor

-_ is pretty horrible.

You could add a prefix to the date field. date_1958_01_16 is quite long, so something shorter would be good.

@russfiedler

That first form looks atrocious.

@aekiss
Contributor Author

aekiss commented May 22, 2020

maybe ymd_1958_01_16 to specify the date order explicitly? Only one char shorter...

@aidanheerdegen
Contributor

Yeah all I could come up with was dd. ymd at least has the advantage of some informational value.

@russfiedler

I think judicious use of sed and mv postprocessing can get rid of the offending -_ abomination...
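
The same renaming idea, sketched in Python as a stand-in for sed and mv. The function name strip_dash_underscore is hypothetical, and the demonstration runs in a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path

def strip_dash_underscore(directory: Path) -> list:
    """Rename each *.nc file whose name contains '-_' so that it contains '-' instead."""
    renamed = []
    for path in sorted(directory.glob("*-_*.nc")):
        target = path.with_name(path.name.replace("-_", "-"))
        if target.exists():
            raise FileExistsError(target)  # never overwrite an existing file
        path.rename(target)
        renamed.append(target.name)
    return renamed

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "ocean-3d-temp-1-monthly-_1958_01_16.nc").touch()
    (out / "ocean-2d-area_t.nc").touch()  # static file, left alone
    renamed = strip_dash_underscore(out)
    remaining = sorted(p.name for p in out.iterdir())

print(renamed)
print(remaining)
```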

@aekiss
Contributor Author

aekiss commented May 22, 2020

Also we may be unable to put anything after the date part of the filename.

"ocean-3d-temp-1-monthly-%4yr%2mo%2dy-snap", 1, "months", 1, "days", "time", 1, "months"
"ocean_model", "temp", "temp", "ocean-3d-temp-1-monthly-%4yr%2mo%2dy-snap", "all", "none", "none", 2

produces files like
ocean-3d-temp-1-monthly-_1958_05_01.nc
rather than the expected
ocean-3d-temp-1-monthly-_1958_05_01-snap.nc

So we may need to include the reduction method before the date, ie
ocean-3d-temp-1-monthly-snap-_1958_05_01.nc
(in which case we should not hide average so that the number of "fields" is consistent)

@aekiss
Contributor Author

aekiss commented May 22, 2020

We could kill 2 birds with 1 stone by omitting the dash between reduction method and date, e.g.

ocean-3d-temp-1-monthly-average_1958_05_01.nc
ocean-3d-temp-1-monthly-snap_1958_05_01.nc

ie consider the reduction method to be part of the date "field"

@aidanheerdegen
Contributor

So you could have the reduction method and date in the same field separated by an underscore ... or you could have all the date related stuff in a single field, like 1_monthly_snap_1958_05_01.

It is all completely arbitrary

@aidanheerdegen
Contributor

aidanheerdegen commented May 22, 2020

snap

@aidanheerdegen
Contributor

I prefer the latter, including all the date stuff in a single field. Seems consistent and neater.

@aekiss
Contributor Author

aekiss commented May 22, 2020

These approaches don't work for the scalar files which lack a lot of the date stuff, so I think the ymd approach is best.

Examples:

1 file per field for 2d and 3d

ocean-2d-mld-1-monthly-mean-ymd_1958_10_16.nc
ocean-2d-mld-1-monthly-mean-ymd_1958_11_16.nc
ocean-2d-mld-1-monthly-mean-ymd_1958_12_16.nc
ocean-3d-temp-1-monthly-mean-ymd_1958_10_16.nc
ocean-3d-temp-1-monthly-mean-ymd_1958_11_16.nc
ocean-3d-temp-1-monthly-mean-ymd_1958_12_16.nc
ocean-3d-temp-1-monthly-snap-ymd_1958_11_01.nc  # NB: snap
ocean-3d-temp-1-monthly-snap-ymd_1958_12_01.nc
ocean-3d-temp-1-monthly-snap-ymd_1959_01_01.nc

all scalars in one file: edit: see below

ocean-scalar-ymd_1958_10_16.nc
ocean-scalar-ymd_1958_11_16.nc
ocean-scalar-ymd_1958_12_16.nc

static grid data in one file per field, with no date info:

ocean-2d-geolat_c.nc
ocean-2d-geolat_t.nc
ocean-2d-geolon_c.nc
ocean-2d-geolon_t.nc

@russfiedler

Having the day in those monthly files seems completely unintuitive, unnecessary and ugly to me. You've got a 16 there for every file except Feb and that doesn't change in leap years so it serves no purpose.

@aekiss
Contributor Author

aekiss commented May 25, 2020

I agree - I was thinking the same thing.

For snapshots that would mean the month will be at the end of the sampling period (ie the month after the sampling period), and other reduction methods will be in the middle (rounded down).

so monthly sampling over 3 months (Jan-March) looks like this

ocean-3d-temp-1-monthly-mean-ymd_1959_01.nc
ocean-3d-temp-1-monthly-mean-ymd_1959_02.nc
ocean-3d-temp-1-monthly-mean-ymd_1959_03.nc
ocean-3d-temp-1-monthly-snap-ymd_1959_02.nc  # snap offset by 1
ocean-3d-temp-1-monthly-snap-ymd_1959_03.nc
ocean-3d-temp-1-monthly-snap-ymd_1959_04.nc  # after the final month (March)

and 3-monthly is like this

ocean-3d-temp-3-monthly-mean-ymd_1959_02.nc  # in middle month
ocean-3d-temp-3-monthly-snap-ymd_1959_04.nc  # after final month

I guess that's not too confusing.
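
The labelling rule described above can be sketched as follows. This models only the behaviour reported in this thread (snapshots stamped at the end of the sampling period, other reductions at the middle, rounded down) and is not taken from MOM's diag_manager code. It assumes periods start on the first of a month and uses Python's Gregorian calendar. The names add_months and date_label are hypothetical.

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def date_label(start: date, period_months: int, reduction: str) -> str:
    """Year_month label for one sampling period beginning at `start`."""
    end = add_months(start, period_months)
    if reduction == "snap":
        stamp = end                        # snapshot at the end of the period
    else:
        stamp = start + (end - start) / 2  # middle of the period, rounded down to a day
    return f"{stamp:%Y_%m}"

for months in (1, 3):
    for reduction in ("mean", "snap"):
        print(months, reduction, date_label(date(1959, 1, 1), months, reduction))
```

The loop reproduces the labels in the examples above for a period starting in January 1959: 1959_01 and 1959_02 for 1-monthly mean and snap, 1959_02 and 1959_04 for 3-monthly.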

@aidanheerdegen
Contributor

At the risk of pedantry, should it be ym for the monthly ones?

@russfiedler

russfiedler commented May 25, 2020

Yes, Maybe, Deprecated.

I think the starting month is preferable for multimonth files since you have a far simpler relationship with the beginning and end of the time period. Besides, what if you have a 2 or 4 month file?

Edit: Ah, this is for a 3 monthly mean not the individual months. The middle does make a lot of sense in that case as it coincides with the time in the file.

@aekiss
Contributor Author

aekiss commented May 25, 2020

@aidanheerdegen agreed - 1 char saved!

@russfiedler this is just the standard behaviour of MOM with %4yr%2mo and both output_freq and new_file_freq = 3 months

@russfiedler

@aekiss Yes, before my edit I was thinking your example was the case output_freq=1 and new_file_freq=3.

@aekiss
Contributor Author

aekiss commented May 25, 2020

with output_freq=1 and new_file_freq=3 we'd get

ocean-3d-temp-1-monthly-mean-ym_1959_01.nc
ocean-3d-temp-1-monthly-snap-ym_1959_02.nc

@aekiss
Contributor Author

aekiss commented May 25, 2020

I think the shared scalars file should also include the output frequency (as it's a per-file setting), but omit the reduction method (as it's per-field), e.g.

ocean-scalar-1-monthly-ym_1959_01.nc
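
Since the point of the dash-separated layout is easy parsing, here is a minimal sketch of a parser for the three name shapes shown in this thread (static, shared scalar, and per-field files). The function name parse_name and the dictionary keys are hypothetical, and the names actually produced by the configurations may differ in detail.

```python
def parse_name(filename):
    """Split an output filename into its dash-separated components."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    fields = stem.split("-")
    if fields[0] != "ocean":
        raise ValueError(f"unexpected prefix in {filename!r}")

    def parse_date(field):
        # e.g. "ym_1959_01" -> (1959, 1); "ymd_1958_10_16" -> (1958, 10, 16)
        return tuple(int(part) for part in field.split("_")[1:])

    info = {"kind": fields[1]}
    if len(fields) == 3:                                # static grid data
        info["variable"] = fields[2]
    elif fields[1] == "scalar" and len(fields) == 5:    # shared scalar file
        info.update(interval=int(fields[2]), units=fields[3],
                    date=parse_date(fields[4]))
    elif len(fields) == 7:                              # one field per file
        info.update(variable=fields[2], interval=int(fields[3]), units=fields[4],
                    reduction=fields[5], date=parse_date(fields[6]))
    else:
        raise ValueError(f"unrecognised layout in {filename!r}")
    return info

for name in ("ocean-2d-geolat_c.nc",
             "ocean-scalar-1-monthly-ym_1959_01.nc",
             "ocean-3d-temp-1-monthly-mean-ym_1959_01.nc"):
    print(parse_name(name))
```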

@aekiss
Contributor Author

aekiss commented Jun 10, 2020

closing - this has now been implemented in the ak-dev branch in all 6 configurations
