`desi_tile_redshifts` currently assigns one node per spectrograph, but this runs out of memory on KNL for larger tiles, e.g. `daily/tiles/cumulative/80738/20210406/spectra-4-80738-thru20210406.fits`, which has 3.5 GB of spectra from 17 exposures:
```
RUNNING srun -N 1 -n 68 -c 4 rrdesi_mpi tiles/cumulative/80738/20210406/spectra-4-80738-thru20210406.fits -o tiles/cumulative/80738/20210406/redrock-4-80738-thru20210406.h5 -z tiles/cumulative/80738/20210406/zbest-4-80738-thru20210406.fits
Running with 68 processes
Loading targets...
slurmstepd: error: Detected 1 oom-kill event(s) in step 41435047.3 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: nid09911: task 21: Out Of Memory
srun: Terminating job step 41435047.3
slurmstepd: error: *** STEP 41435047.3 ON nid09911 CANCELLED AT 2021-04-07T13:11:49 ***
```
In this case, either `srun -N 1 -n 34 -c 8 ...` (17 min) or `srun -N 2 -n 68 -c 4 ...` (13 min) works, but using either one would require some pipeline logic to identify ahead of time when there are too many input frames and drop down to fewer cores per node.
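A minimal sketch of what that pre-check could look like, assuming the decision is keyed on the size of the spectra file rather than a count of input frames (file size needs no FITS parsing). The threshold `LARGE_SPECTRA_BYTES` and the helper names `srun_layout` / `srun_command` are hypothetical, not part of `desi_tile_redshifts`; the only data points behind them are the ones above (3.5 GB from 17 exposures fails with 68 ranks on one node). Both fallbacks keep all 272 hardware threads of a 68-core KNL node busy: 68 ranks × 4, or 34 ranks × 8.

```python
import os

# A Cori KNL node has 68 physical cores with 4 hardware threads each.
KNL_CORES_PER_NODE = 68
KNL_THREADS_PER_CORE = 4

# HYPOTHETICAL cutoff: the 3.5 GB tile in this issue ran out of memory with
# 68 ranks on one node; the real threshold would need to be measured.
LARGE_SPECTRA_BYTES = 3 * 1024**3


def srun_layout(spectra_bytes, prefer_extra_node=True):
    """Return (nodes, ntasks, cpus_per_task) for one spectrograph's rrdesi_mpi run."""
    if spectra_bytes < LARGE_SPECTRA_BYTES:
        # Current behavior: one full KNL node per spectrograph.
        return 1, KNL_CORES_PER_NODE, KNL_THREADS_PER_CORE
    if prefer_extra_node:
        # Same 68 ranks spread over 2 nodes (13 min in the test above).
        return 2, KNL_CORES_PER_NODE, KNL_THREADS_PER_CORE
    # Stay on 1 node with half the ranks, 2 cores each (17 min in the test above).
    return 1, KNL_CORES_PER_NODE // 2, 2 * KNL_THREADS_PER_CORE


def srun_command(spectra_path, redrock_path, zbest_path,
                 spectra_bytes=None, prefer_extra_node=True):
    """Build the srun line; reads the file size from disk unless it is given."""
    if spectra_bytes is None:
        spectra_bytes = os.path.getsize(spectra_path)
    nodes, ntasks, cpus = srun_layout(spectra_bytes, prefer_extra_node)
    return (f"srun -N {nodes} -n {ntasks} -c {cpus} rrdesi_mpi {spectra_path} "
            f"-o {redrock_path} -z {zbest_path}")
```

For the tile above this would produce `srun -N 2 -n 68 -c 4 rrdesi_mpi ...`. The two fallbacks trade wall time against allocation: the 2-node layout finishes sooner (13 min) but costs about 26 node-minutes, while the 1-node, 34-rank layout takes 17 min but only 17 node-minutes.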