
Investigate using parallel IO #34

Closed

aekiss opened this issue May 17, 2019 · 63 comments

aekiss (Contributor) commented May 17, 2019

It may be worth trying to compile with parallel IO using PIO (setenv IO_TYPE pio).

We currently compile CICE with serial IO (setenv IO_TYPE netcdf in bld/build.sh), so one CPU does all the IO and we end up with an Amdahl's law situation that limits the scalability with large core counts.

At 0.1 deg CICE is IO-bound when doing daily outputs (see Timer 12 in ice_diag.d), and the time spent in CICE IO accounts for almost all the time MOM waits for CICE (oasis_recv in access-om2.out), so the whole coupled model is waiting on one CPU. With daily CICE output at 0.1 deg this is ~19% of the model runtime (it's only ~2% without daily CICE output). Lowering the compression level to 1 (#33) has helped (MOM wait was 23% with level 5), and omitting static field output (#32) would also help.
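For reference, a minimal sketch of the relevant switch in bld/build.sh (assuming the csh-style setenv quoted above; the surrounding build logic is omitted):

```
# Choose the CICE IO layer at build time:
#   netcdf -> serial IO, a single CPU writes everything
#   pio    -> parallel IO via the PIO library
setenv IO_TYPE pio    # currently set to netcdf in bld/build.sh
```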

Also I understand that PIO doesn't support compression - is that correct?

@russfiedler had these comments on Slack:

- I have a feeling that the CICE parallel IO hadn't really been tested, or there was some problem with it.
- We would have to update the netCDF versions being used in CICE for a start.
- The distributors of PIO note that they need to use netCDF 4.6.1 and HDF5 1.10.4 or later for their latest version. There's a bug in parallel collective IO in earlier HDF5 versions. The NCI version of netCDF 4.6.1 is built with HDF5 1.10.2! Marshall noted above that Rui found a performance drop-off when moving from 1.10.2 to 1.10.4.
- The gather is done on all the small tiles, so each PE sends a single horizontal slab several times to the root PE for each level.
- The number of MPI calls is probably the main issue. It looks like there's an individual send/recv for each tile rather than either a bulk send of the tiles or something more funky using MPI_Gather(v) and MPI_Type_create_subarray.

Slack discussion: https://arccss.slack.com/archives/C9Q7Y1400/p1557272377089800

nichannah (Contributor) commented Aug 5, 2019

I've been looking at the CICE PIO code. It is not as complete as the serial netCDF code; for example, it doesn't do proper error checking. The PIO code still exists and is documented in CICE6.

My next step is to see whether it can be built on raijin.

Another option, which may be better even if PIO works, is to take the MOM5 approach and have each PE output to its own file, followed by an offline collate. The advantage of this would be that we can continue to use the existing netCDF code (with slight modifications). The downside would be that we need to write a collate program.

russfiedler (Contributor) commented:

@nichannah I think that with the moves by Ed Hartnett towards implementing PIO in FMS, it would be best to go the PIO route to stay reasonably compatible with future FMS and CICEn.

nichannah (Contributor) commented Aug 12, 2019

Steps to build PIO for CICE:

  1. Download and extract PIO:
wget https://github.com/NCAR/ParallelIO/releases/download/pio2_4_4/pio-2.4.4.tar.gz
tar zxvf pio-2.4.4.tar.gz
  2. Load the necessary modules:
module load intel-cc/2018.3.222
module load intel-fc/2018.3.222
module load netcdf/4.6.1p
module load openmpi/4.0.1

I also tried openmpi/1.10.2 but the build failed with link errors.

  3. Set environment variables:
export CPPFLAGS="-std=c99 -I${NETCDF}/include/ -I${PARALLEL_NETCDF_BASE}/include/"
export LDFLAGS="-L${NETCDF}/lib/ -L${PARALLEL_NETCDF_BASE}/lib/"
  4. Configure and make:
./configure --enable-fortran --prefix=/short/x77/nah599/access-om2/src/cice5/pio-2.4.4/usr
make
make install
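As a quick sanity check after make install (a sketch; the expected file names are assumptions based on a typical PIO install with the prefix used above):

```
ls /short/x77/nah599/access-om2/src/cice5/pio-2.4.4/usr/lib      # expect libpio* libraries
ls /short/x77/nah599/access-om2/src/cice5/pio-2.4.4/usr/include  # expect pio.h and pio*.mod Fortran modules
```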

nichannah (Contributor) commented Aug 12, 2019

It looks like the CICE PIO code makes use of something called shr_pio_mod. I'm getting compile errors like:

ice_pio.f90(9): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [SHR_SYS_MOD]
  use shr_sys_mod , only: shr_sys_flush
------^
ice_pio.f90(7): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [SHR_KIND_MOD]
  use shr_kind_mod, only: r8 => shr_kind_r8, in=>shr_kind_in
------^
ice_pio.f90(47): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [SHR_PIO_MOD]
   use shr_pio_mod, only: shr_pio_getiosys, shr_pio_getiotype
-------^

The code can be found here:

https://github.com/CESM-Development/cesm-git-experimental/tree/master/cesm/models/csm_share
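A quick way to check whether those shr_*_mod modules exist anywhere in the local source tree before deciding how to satisfy the dependency (a sketch; the search root is just an example):

```
cd /short/x77/nah599/access-om2/src/cice5
grep -ril "module shr_kind_mod" .
grep -ril "module shr_sys_mod" .
grep -ril "module shr_pio_mod" .
```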

aekiss (Contributor, Author) commented Sep 13, 2019

Netcdf 4.7.1 is now installed on raijin on top of hdf5/1.10.5. The parallel version, 4.7.1p (and hdf5/1.10.5p), is built with openmpi/4.0.1.

nichannah (Contributor) commented Sep 20, 2019

I followed my instructions above with the new versions, and the configure step hangs. This seems to be caused by:

[nah599@raijin5 pio-2.4.4]$ module load intel-cc/17.0.1.132
[nah599@raijin5 pio-2.4.4]$ /bin/bash ./config.guess

The following works:

[nah599@raijin5 pio-2.4.4]$ module load intel-cc
[nah599@raijin5 pio-2.4.4]$ /bin/bash ./config.guess

This is the hanging command:

/apps/intel-ct/2019.3.199/cc/bin/icc -E /short/x77/nah599/tmp/cgm21joj/dummy.c

For the time being I'm using old compiler versions to try to get things working.
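A quick way to see exactly which compiler each module resolves to when diagnosing this kind of hang (a sketch using standard environment-modules and coreutils commands):

```
module show intel-cc                  # print what the default intel-cc module sets
module load intel-cc
which icc && icc --version            # confirm which icc ends up on PATH
timeout 30 /bin/bash ./config.guess   # bail out after 30 s rather than hanging indefinitely
```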

nichannah added a commit that referenced this issue Sep 23, 2019
nichannah (Contributor) commented Sep 23, 2019

Current status: PIO is building, but the CICE PIO support needs to be modified so that it works without CESM dependencies. The main difficulty here is that the CICE PIO code assumes that initialisation has already been done somewhere else (perhaps as part of a coupled model), so proper PIO initialisation needs to be written.

nichannah (Contributor) commented:

The PIO code is ready to be tested; however, there is a problem with netCDF, compiler and OpenMPI version compatibility between the new CICE and the rest of the model, so this issue now depends on upgrading those.

aekiss (Contributor, Author) commented Mar 31, 2020

In ACCESS-OM2 sea ice concentration is passed to MOM via OASIS
https://github.com/COSIMA/01deg_jra55_iaf/blob/30df8f5fd6404aeb459ff44298936df576dfbbf0/namcouple#L295
so we could output that field in parallel via MOM.

I couldn't find a relevant diagnostic here
https://github.com/COSIMA/access-om2/wiki/Technical-documentation#MOM5-diagnostics-list
so it looks like we'd need to write one.

russfiedler (Contributor) commented:

I've put this in the WOMBAT version but I've been holding off on issuing a pull request until @nichannah updates the way he proposes to pass new fields.

https://github.com/russfiedler/MOM5/blob/wombat/src/mom5/ocean_core/ocean_sbc.F90#L5971

russfiedler (Contributor) commented:

Also, as a note to the above: netCDF on Gadi should be suitable for PIO.

nichannah (Contributor) commented Apr 27, 2020

Updated PIO build instructions:

cd $ACCESS_OM_DIR/src/cice5
wget https://github.com/NCAR/ParallelIO/releases/download/pio2_5_0/pio-2.5.0.tar.gz
tar zxvf pio-2.5.0.tar.gz
cd pio-2.5.0
module load intel-compiler/2019.5.281
module load netcdf/4.7.4p
module load openmpi/4.0.2
export CC=mpicc
export FC=mpifort
./configure --enable-fortran --disable-pnetcdf --enable-logging --enable-netcdf-integration --prefix=$ACCESS_OM_DIR/src/cice5/pio-2.5.0/usr
make
make install

Note that logging is enabled above. This will need to be changed in production.
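For production the same configure line can simply drop --enable-logging (a sketch; everything else is assumed to stay as above):

```
./configure --enable-fortran --disable-pnetcdf --enable-netcdf-integration \
    --prefix=$ACCESS_OM_DIR/src/cice5/pio-2.5.0/usr
make && make install
```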

To build using CMake:

CC=mpicc FC=mpif90 cmake -DWITH_PNETCDF=OFF -DNetCDF_C_LIBRARY="${NETCDF}/lib/ompi3/libnetcdf.so" -DNetCDF_C_INCLUDE_DIR="${NETCDF}/include/" -DNetCDF_Fortran_LIBRARY="${NETCDF}/lib/ompi3/Intel" -DNetCDF_Fortran_INCLUDE_DIR="${NETCDF}/include/Intel" ../

nichannah added a commit to COSIMA/1deg_jra55_iaf that referenced this issue Apr 29, 2020
nichannah (Contributor) commented May 2, 2020

Preliminary results from a 10-day 0.1 deg run with daily CICE output: previously, writing output was 15% of CICE runtime; it's now 6%.

MOM is now spending less than half as much time waiting on ice: from 12% of runtime down to 5%.

The interesting thing now is to see how this scales. Presumably the existing approach will not scale well as we increase the number of CICE CPUs. It would be good to see whether we can increase the number of CICE CPUs to further reduce the MOM wait time. Aim to get this below 1%.

aekiss (Contributor, Author) commented May 3, 2020

Thanks @nichannah, that's great news.

Did you run your test with 799 CICE cores? And am I right in thinking CICE with PIO uses all cores (rather than a subset like MOM io_layout)? If so, I'm a little surprised it didn't speed up more, if there are 799x more cores doing the output. I guess there's some extra overhead in PIO?

@marshallward's tests on Raijin showed CICE would scale well up to about 2000 cores and is still reasonable at 3000 (see table below). If so, I guess we'd need over 4000 CICE cores to get below 1% MOM wait time, which seems rather a lot. But in our standard configs (serial CICE io, monthly outputs) MOM spends just under 2% of its time waiting for CICE, so 1% is better than we're used to.

[Screenshot: table of CICE scaling results from @marshallward's Raijin tests]

nichannah (Contributor) commented:

Thanks @aekiss, that's useful.

I'm now running a test to see how a run with daily output compares to one with monthly output. If that is OK then perhaps we can start to use this feature before spending more time on optimisation.

russfiedler (Contributor) commented:

I believe PIO allows some flexibility in which PEs are used: https://ncar.github.io/ParallelIO/group___p_i_o__init.html. I don't know how flexible this is in what has been written for CICE. There is an interesting point made in the FAQ that it's sometimes worth moving the IO away from the root PE/task (and I presume node) due to the heavier load there.
Would it be worth investigating striping the files?
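On striping: if the output directory sits on a Lustre filesystem (as /scratch on Gadi does), striping can be set per directory so that new files inherit it (a sketch; the stripe count of 8 is only an illustrative value to experiment with):

```
lfs getstripe OUTPUT/        # show the current layout of the ice output directory
lfs setstripe -c 8 OUTPUT/   # files created here afterwards are striped across 8 OSTs
```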

nichannah (Contributor) commented May 4, 2020

Yes, it looks like there's some configuration optimisation that we can do with this. Presently I'm just using the simplest config which is a stride of 1 - so all procs are writing output.

I have just completed two 2 month runs:

  1. Standard config with mostly monthly CICE output (16 GB output over 2 months)
  2. PIO config with all daily output (460 GB output over 2 months)

Basically 1) is doing about 8 GB per month and 2) is doing 8 GB per day.

The runtime of these two runs is almost identical. Looking at ice_diag.d the time taken for writing out history is similar but the PIO case is about 5% slower. See

/scratch/v45/nah599/access-om2/archive/01deg_jra55_iaf/output000/ice/ice_diag.d
/scratch/v45/nah599/access-om2/archive/pio_daily_01deg_jra55_iaf/output000/ice/ice_diag.d

Incidentally, there seems to be something strange happening with the atm halo timers in the new PIO run. The mean time in the PIO run is 6 seconds but for the regular run it is 106 seconds. A possible explanation for this is that the PEs within CICE are better matched so collective operations don't have to wait as long on lagging PEs.

So this new feature should allow daily ice output with no performance penalty over the existing configuration. I think it makes sense to merge this into master. Any objections? @aekiss?

Future work will involve looking at the scaling and performance of the whole model in more detail and at that point I can look at the different configuration options of PIO if ice output is a bottleneck.

aekiss (Contributor, Author) commented May 4, 2020

That's great that daily output can be done with nearly the same runtime. If you're confident that the output with PIO is bitwise identical to the non-PIO version then I see no reason not to merge into master, given that it makes daily output practical.
@AndyHoggANU any objections?

Also is compressed output still possible with PIO?

aekiss (Contributor, Author) commented Sep 28, 2020

I agree that filling land with 0 seems the better option, rather than hoping we remember this gotcha into the indefinite future...

nichannah (Contributor) commented:

The solution to this is not completely satisfactory. The obvious way to get netCDF to put 0s in places where no data is written is to set _FillValue = 0. This can be a bit confusing because there is no difference between "no data" and "data with value 0". However, I think this is probably still better than the alternative, which is needing to fix up CICE restarts whenever the PE layout changes.

See attached Tsfcn, the white has value 0 and the red is mostly -1.8.

[Screenshot: Tsfcn; white has value 0 and red is mostly -1.8]

aekiss (Contributor, Author) commented Oct 9, 2020

I don't think setting _FillValue = 0 is a problem for this particular variable, as zero values are used for Tsfcn at land points outside the land mask in the PIO restarts anyway.

e.g. see Tsfcn in /scratch/x77/aek156/access-om2/archive/01deg_jra55v140_iaf_cycle2_pio_test2/restart356/ice/iced.1986-04-01-00000.nc (the range is narrowed for clarity):
[Screenshot: Tsfcn in the PIO restart, with the colour range narrowed]

Setting _FillValue = 0 is consistent with what was done in the non-PIO restarts, which had zero throughout the land points - e.g. /scratch/v14/pas548/restarts/KEEP/restart356/ice/iced.1986-04-01-00000.nc:
[Screenshot: Tsfcn in the non-PIO restart]

However I haven't looked at the other restart fields or files so maybe there would be problems with them?

I guess a safe thing would be to set the _FillValue for each field to the value from a land point that isn't masked out?

aekiss (Contributor, Author) commented Oct 9, 2020

For example, the point (474, 2613) is land but unmasked, so you could check its value for every field in every restart file in /scratch/v14/pas548/restarts/KEEP/restart356/ice/ or /scratch/x77/aek156/access-om2/archive/01deg_jra55v140_iaf_cycle2_pio_test2/restart356/ice/iced.1986-04-01-00000.nc and use this as the _FillValue.
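Pulling that single point out of every field could be done with NCO, for example (a sketch; it assumes the restart dimensions are named ni and nj and that ncks's default 0-based indexing matches the indices above - add -F for 1-based):

```
module load nco
# print every variable at the unmasked land point
ncks -H -d ni,474 -d nj,2613 \
  /scratch/v14/pas548/restarts/KEEP/restart356/ice/iced.1986-04-01-00000.nc
```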

aidanheerdegen (Contributor) commented Oct 9, 2020

CF conventions allow for both _FillValue and missing_value. If missing_value is set to something that is non-zero, does that help?

http://cfconventions.org/cf-conventions/cf-conventions.html#missing-data
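If that route were taken, the attribute could be added after the fact with NCO's ncatted (a sketch; the -999 value, the Tsfcn variable and the filename are only illustrative):

```
module load nco
# o = overwrite (create if absent), d = double
ncatted -a missing_value,Tsfcn,o,d,-999. iced.1986-04-01-00000.nc
```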

aekiss (Contributor, Author) commented Oct 19, 2020

Thanks @nichannah, I'm closing this issue now.

We decided in the 14 Oct TWG meeting that this issue with restarts is not significant enough to warrant fixing, and that a fix with a change to _FillValue=0 would cause more trouble than it was worth, since genuine data could be misinterpreted as fill.

We just need to remember to fill in the CPU-masked cells with zero values if restarting with a changed CPU layout.

I've done a test run at 0.1deg with PIO (using commit 7c74942) to compare to one without PIO (using commit 26e6159).
This restart issue means I can't compare the restart files, but I've confirmed (using xarray's identical method) that the outputs are identical, including for a second run based on PIO-generated restarts, so I'm confident that the model state is unaffected by these differences in the restart files. Test script is here: https://github.com/aekiss/notebooks/blob/72986342795e6fef167ad5d9df76a01b1ad7fefa/check_pio.ipynb
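For a quick command-line equivalent of that xarray check, nccmp can compare the data of two output files (a sketch, named here as an alternative to the notebook above; the paths are only illustrative):

```
# -d compares data, -f keeps going after the first difference
nccmp -d -f pio_run/ice/OUTPUT/iceh.1986-01.nc nopio_run/ice/OUTPUT/iceh.1986-01.nc
```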

aekiss closed this as completed Oct 19, 2020
aekiss (Contributor, Author) commented Oct 27, 2020

Sorry @nic, I'm reopening again - I've hit a bug using PIO in a 1deg configuration.

For the 1deg config I'm using one core per chunk, laid out the same way as slenderX1 (not sure if this is the best choice?)

    history_chunksize_x = 15
    history_chunksize_y = 300
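
A quick way to confirm what chunking actually ended up in the output files (a sketch; ncdump -hs prints the hidden per-variable storage attributes):

```
ncdump -hs /scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-01.nc \
  | grep -E '_ChunkSizes|_DeflateLevel'
```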

I have repeated identical runs

/home/156/aek156/payu/testing/all-configs/v2.0.0rc9/1deg_jra55_ryf_v2.0.0rc9
/home/156/aek156/payu/testing/all-configs/v2.0.0rc9/1deg_jra55_ryf_v2.0.0rc9xx

and got differing output in these files and variables:

/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-01.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-01.nc
fsurfn_ai_m
vicen_m

/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-02.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-02.nc
fmelttn_ai_m
vicen_m

/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output001/ice/OUTPUT/iceh.1900-04.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output001/ice/OUTPUT/iceh.1900-04.nc
aicen_m
flatn_ai_m
fmelttn_ai_m

/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output001/ice/OUTPUT/iceh.1900-05.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output001/ice/OUTPUT/iceh.1900-05.nc
flatn_ai_m
fsurfn_ai_m
vicen_m

/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output002/ice/OUTPUT/iceh.1900-07.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output002/ice/OUTPUT/iceh.1900-07.nc
fcondtopn_ai_m

Note that this issue only appears in multi-category variables (e.g. 'fcondtopn_ai_m' (time: 1, nc: 5, nj: 300, ni: 360)) and is unpredictable - most multi-category variables are ok most of the time, and there are no variables that are always affected.

For example here's category 0 of fmelttn_ai_m in
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-02.nc
[Screenshot: category 0 of fmelttn_ai_m, showing bad points north of the Equator in the Indonesian archipelago]
There are bad points just north of the Equator over a limited longitude range in the Indonesian archipelago. They are extremely large, presumably uninitialised values. The values in the longitudes between them are very small but nonzero (they should be zero). The land mask is also messed up.

The problem occurs in different places in other fields.

I've only seen this problem in category 0, but I haven't checked thoroughly.
e.g. here's category 1 of the same field and file:
[Screenshot: category 1 of fmelttn_ai_m in the same file]

I didn't see this issue with the 0.1deg config. Maybe I need better choices for history_chunksize_x and history_chunksize_y? (NB I found I could get segfaults if I wasn't careful with these values...)

aekiss reopened this Oct 27, 2020
aekiss (Contributor, Author) commented Nov 3, 2020

Oops, apologies @nichannah - this was just because I was calling mpirun with the wrong options at 1 deg.

When I use

mpirun: --mca io ompio --mca io_ompio_num_aggregators 1

in config.yaml it works as expected.

aekiss closed this as completed Nov 3, 2020
aidanheerdegen (Contributor) commented:

The OpenMPI docs say ompio is the default for versions > 2.x. Is that incorrect?

https://www.open-mpi.org/faq/?category=ompio

nichannah (Contributor) commented:

On Gadi it appears that romio is used by default. Also we need to specify the number of MPI aggregators explicitly to avoid the heuristic/algorithm that usually sets this. This algorithm appears to get confused with the combination of (chunksize != tile size) and deflation on. The confusion leads to a divide-by-zero. I haven't spent the time to really understand this bug/problem so you could say that --mca io_ompio_num_aggregators 1 is a work-around.
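A way to check which MPI-IO components are available and what the aggregator parameters default to on a given system (a sketch using ompi_info):

```
# list the io MCA components (ompio / romio) and their parameters
ompi_info --param io all --level 9 | grep -Ei 'romio|ompio|aggregator'
```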

aidanheerdegen (Contributor) commented:

Thanks for the explanation @nichannah

aekiss (Contributor, Author) commented Nov 5, 2020

@nichannah FYI: PIO seems to slow down CICE at 1 deg.
see 3-month runs in /home/156/aek156/payu/testing/all-configs/v2.0.0rc9

| Configuration | Fraction of MOM runtime in oasis_recv | Max CICE I/O time (s) |
| --- | --- | --- |
| 1 deg, no PIO (1deg_jra55_ryf_v2.0.0rc9_nopio) | 0.04 | 10.6 |
| 1 deg, PIO with 24 chunks (15x300) (1deg_jra55_ryf_v2.0.0rc9_pio) | 0.062 | 15.3 |
| 1 deg, PIO with 1 chunk (360x300) (1deg_jra55_ryf_v2.0.0rc9_pio_1chunk) | 0.096 | 24.4 |

but it is improved at 0.25 deg:

| Configuration | Fraction of MOM runtime in oasis_recv | Max CICE I/O time (s) |
| --- | --- | --- |
| 0.25 deg, no PIO (025deg_jra55_ryf_v2.0.0rc9) | 0.078 | 54 |
| 0.25 deg, PIO with 100 chunks (144x108) (025deg_jra55_ryf_v2.0.0rc9_pio2) | 0.04 | 25 |

The cice cores are spread between nodes on gadi at 1deg with 1+216+24 cores for yatm/mom/cice so that might be part of the problem: COSIMA/access-om2#212 and COSIMA/access-om2#202

aekiss added commits referencing this issue to COSIMA/01deg_jra55_iaf, COSIMA/01deg_jra55_ryf, COSIMA/025deg_jra55_iaf, COSIMA/025deg_jra55_ryf and COSIMA/1deg_jra55_iaf on Nov 5, 2020:
…MOM5#317; CICE uses PIO: COSIMA/cice5#34); configuration changes to support PIO in CICE
aekiss (Contributor, Author) commented Nov 6, 2020

I've also tried 1 deg (/home/156/aek156/payu/testing/all-configs/v2.0.0rc10/1deg_jra55_iaf_v2.0.0rc10) and 0.25 deg (025deg_jra55_iaf_v2.0.0rc10) configs with 4 chunks (90x300 at 1 deg; 720x540 at 0.25 deg) and get 0.085 for the fraction of MOM runtime in oasis_recv in both cases.

1 deg with 4 chunks is almost as fast as the 24-chunk case (though slower than without PIO) but should be faster to read in most circumstances than 24 chunks. However I'm thinking a 180x150 4-chunk layout is probably a better match to hemisphere-based access patterns so I might try that too. This run was for 5 years, rather than 3mo as in the previous and next posts so I haven't included Max CICE I/O time. It's a bit faster in a 3mo test - see next post.

0.25 deg with 4 chunks is now somewhat slower than without PIO but I'm reluctant to use too many chunks in case it slows down reading. Note that this run was for 2 years, rather than 3mo as in the previous and next posts.

Also I should have mentioned that these 1 deg and 0.25 deg tests all had identical ice outputs, but they differ from the ice outputs in the production 0.1deg runs I reported here so they aren't directly comparable to those.

aekiss (Contributor, Author) commented Nov 12, 2020

Some more tests of differing history_chunksize_x x history_chunksize_y with 3mo runs at 1 deg in /home/156/aek156/payu/testing/all-configs/v2.0.0rc:

| Configuration | Fraction of MOM runtime in oasis_recv | Max CICE I/O time (s) |
| --- | --- | --- |
| 1 deg, PIO with 4 chunks (90x300) (1deg_jra55_iaf_v2.0.0rc10_3mo) | 0.067 | 16.8 |
| 1 deg, PIO with 4 chunks (180x150) (1deg_jra55_iaf_v2.0.0rc10_3mo_180x150) | 0.072 | 18.1 |

The first of these is slightly faster (presumably because it is consistent with the 15x300 core layout) but the difference is small and so I will use 180x150 for the new 1deg configs as this is better suited to typical access patterns of reading one hemisphere or the other.

The fraction of MOM runtime in oasis_recv values with 90x300 is smaller in the 3 mo case compared to 5yr: 0.067 rather than 0.085 (see prev post). So for 3mo runs the 4-chunk cases (0.067, 0.072) are nearly as fast as the 24-chunk case (0.062) and considerably faster than 1 chunk (0.096) - see post before last.

aekiss (Contributor, Author) commented Nov 20, 2020

For future reference: the processor masking in the ice restarts can be fixed with https://github.com/COSIMA/topogtools/blob/master/fix_ice_restarts.py, allowing a change in processor layout during a run.

access-hive-bot commented:

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/payu-generated-symlinks-dont-work-with-parallelio-library/1617/3
