Contributor

@londumas londumas commented Jan 10, 2019

This PR adds the possibility to compute the distortion matrix as a rectangular matrix, and the metal matrix as a square matrix finer than the data resolution. It fixes issue #523.
This is controlled by the --coef-binning-model parameter, which is 1 by default, i.e. a square matrix with the same dimensions for the model as for the data.
The PR works and does the job; however, we are limited by the memory of individual computers.
It still has to be tested on NERSC, and it is possible that we can't use this rectangular matrix for the moment.

It might be fixed in a future version of Python (python/cpython#10305), but I tried with the most up-to-date package currently available and it still fails.

Running the following command produces the error below:

do_metal_xdmat.py
--in-dir $HOME/Run_programs/igmhub/picca_DR16_paper_analysis///Delta_LYA/Delta/
--drq $HOME/Data/Catalogs_for_MGII_studies/cat_QSO.fits
--out test_metal_xdmat.fits.gz
--z-evol-obj 1.44
--coef-binning-model 2
--rej 0.98
--nside 32
--abs-igm SiIII\(1207\)
--nspec 10
  File "<home>/Programs/igmhub/picca/bin/do_metal_xdmat.py", line 171, in <module>
    dm = pool.map(f,sorted(list(cpu_data.values())))
  File "/uufs/chpc.utah.edu/sys/installdir/python/3.6.3/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/uufs/chpc.utah.edu/sys/installdir/python/3.6.3/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[(array([  0.        ,   0.        ,   0.        , ..., 114.55063889,
       153.44393323, 109.26155323]), array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]]), array([    0.        ,     0.        ,     0.        , ...,
       22539.23003614, 30912.9905582 , 24193.36279715]), array([    0.        ,     0.        ,     0.        , ...,
       22104.94647345, 30624.99354644, 24202.25276482]), array([  0.        ,   0.        ,   0.        , ..., 258.83191457,
       362.24449508, 276.50408172]), array([  0.        ,   0.        ,   0.        , ..., 113.32391608,
       155.51729145, 121.69355972]), 61797, 1237)]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'
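
The reason string at the end of this traceback can be reproduced in isolation. The following is a minimal sketch, assuming (as in CPython 3.6 and 3.7) that multiprocessing prefixes each pickled message with its length packed as a signed 32-bit integer. The helper name pack_length_header is hypothetical and is not part of picca or of the standard library.

```python
import struct

INT32_MAX = 2**31 - 1  # largest length a signed 32-bit header can carry


def pack_length_header(nbytes):
    """Pack a message length as a big-endian signed 32-bit integer.

    Illustrative helper only: it mimics the kind of length header
    assumed in the lead-in, not the actual multiprocessing code.
    """
    return struct.pack("!i", nbytes)


# A payload of just under 2 GiB still fits in the 4-byte header.
header = pack_length_header(INT32_MAX)

# One more byte overflows the header and raises struct.error,
# the same kind of error reported as the "Reason" in the traceback.
try:
    pack_length_header(INT32_MAX + 1)
    overflowed = False
except struct.error as exc:
    overflowed = True
    message = str(exc)

print(len(header), overflowed)
```

Under that assumption, any single result returned by a worker whose pickled size exceeds 2**31 - 1 bytes cannot be sent back to the parent process, whatever the amount of free memory.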

@londumas
Contributor Author

Running on NERSC still does not work: the job runs out of memory.

srun --time 30 --qos debug do_xdmat.py --in-dir $SCRATCH/eBOSS/picca_DR16_paper_analysis///Delta_LYA/Delta/ --drq $HOME/Data/Catalogs_for_MGII_studies/cat_QSO.fits --out $SCRATCH/eBOSS/picca_DR16_paper_analysis/Tests/rectangle_distortion_matrix//Correlations_200hMpc_coef4/xdmat_z_0_10.fits.gz --z-evol-obj 1.44 --coef-binning-model 4 --rej 0.98 --nside 32 --nspec 10
slurmstepd: error: Detected 1 oom-kill event(s) in step 11983214.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: nid00101: task 0: Out Of Memory
srun: Terminating job step 11983214.0
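
A rough size estimate shows why a finer model grid quickly becomes a memory problem. This is only a sketch: the bin counts (100 x 50) and the rule that the model grid has coef times more bins in each direction are assumptions made for illustration, not values read from picca, and dense_matrix_nbytes is a hypothetical helper.

```python
def dense_matrix_nbytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense n_rows x n_cols array (float64 by default)."""
    return n_rows * n_cols * itemsize


# Assumed binning, for illustration only.
n_par, n_perp = 100, 50   # data bins along and across the line of sight
coef = 4                  # value passed to --coef-binning-model

n_data = n_par * n_perp                       # 5000 data bins
n_model = (n_par * coef) * (n_perp * coef)    # 80000 model bins (assumed rule)

# Rectangular matrix: data bins x model bins.
rect_bytes = dense_matrix_nbytes(n_data, n_model)
# Square matrix on the fine grid: model bins x model bins.
square_bytes = dense_matrix_nbytes(n_model, n_model)

print(f"rectangular: {rect_bytes / 1e9:.1f} GB")   # rectangular: 3.2 GB
print(f"square:      {square_bytes / 1e9:.1f} GB")  # square:      51.2 GB
```

With these assumed numbers, each process would need a few GB for one rectangular matrix, and a square matrix on the fine grid would not fit on most single nodes.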

@londumas
Contributor Author

Another size-related issue appears at the FITS level.

Running this:

do_metal_dmat.py --in-dir $HOME/Run_programs/igmhub/picca_DR16_paper_analysis///Delta_LYA/Delta/ --out test_metal_dmat.fits.gz --coef-binning-model 2 --rej 0.98 --nside 32 --abs-igm SiII\(1193\) SiII\(1190\) SiIII\(1207\) SiII\(1260\) --nspec 10

produces an error when fitting

[<me>@eboss:cf_z_0_10]$ fitter2 chi2.ini 
INFO: reading input Pk PlanckDR12/PlanckDR12.fits
INFO: reading <HOME>/Run_programs/igmhub/picca_DR16_paper_analysis/Tests/rectangle_distortion_matrix/Correlations_200hMpc_coef2/Fit/cf_z_0_10//config.ini
Traceback (most recent call last):
  File "<HOME>/Programs/igmhub/picca/bin/fitter2", line 17, in <module>
    dic_init = parser.parse_chi2(args.config)
  File "<HOME>/Programs/igmhub/picca/py/picca/fitter2/parser.py", line 43, in parse_chi2
    dic_init['data sets']['data'] = [data.data(parse_data(os.path.expandvars(d),zeff,dic_init['fiducial'])) for d in cp.get('data sets','ini files').split()]
  File "<HOME>/Programs/igmhub/picca/py/picca/fitter2/parser.py", line 43, in <listcomp>
    dic_init['data sets']['data'] = [data.data(parse_data(os.path.expandvars(d),zeff,dic_init['fiducial'])) for d in cp.get('data sets','ini files').split()]
  File "<HOME>/Programs/igmhub/picca/py/picca/fitter2/data.py", line 218, in __init__
    self.rp_met[(self.tracer1['name'], m)] = hmet[2]["RP_{}_{}".format(self.tracer1['name'],m)][:]
  File "/uufs/chpc.utah.edu/sys/installdir/python/3.6.3/lib/python3.6/site-packages/fitsio/fitslib.py", line 3326, in __getitem__
    return self.read(rows=res)
  File "/uufs/chpc.utah.edu/sys/installdir/python/3.6.3/lib/python3.6/site-packages/fitsio/fitslib.py", line 3300, in read
    data = self.fitshdu.read_column(self.columns, **keys)
  File "/uufs/chpc.utah.edu/sys/installdir/python/3.6.3/lib/python3.6/site-packages/fitsio/fitslib.py", line 1889, in read_column
    res = self.read_columns([col], **keys)
  File "/uufs/chpc.utah.edu/sys/installdir/python/3.6.3/lib/python3.6/site-packages/fitsio/fitslib.py", line 2013, in read_columns
    self._FITS.read_columns_as_rec(self._ext+1, colnumsp, array, rows)
OSError: FITSIO status = 107: tried to move past end of file

Contributor

ngbusca commented Jan 11, 2019

@londumas the multiprocessing error looks like an int overflow. Did you try removing the multiprocessing to see if you get the same error message? As you know, it's very hard to debug under mp.

As for the memory error, if you try a single mp thread you can be sure that the dmat will fit in memory.

Contributor

ngbusca commented Jan 11, 2019

@londumas for the fits, it doesn't look like a memory error, it looks like a corrupt .fits file.

@londumas
Contributor Author

@ngbusca,

  • for the issue with multiprocessing, it is indeed linked to multiprocessing and not to any other aspect. A fix is not to use multiprocessing when nproc==1. This works, and I am committing it; it does not take much more time.
  • for the issue of the corrupted FITS file, it is indeed linked to the size. I have opened a ticket on the fitsio GitHub page: "fitsio can't write large arrays and doesn't raise an error" (esheldon/fitsio#199).
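
The multiprocessing workaround in the first bullet can be sketched as follows. This is not the committed code: run_tasks is a hypothetical helper, and only the idea of bypassing the pool when nproc==1 comes from the discussion above.

```python
from multiprocessing import Pool


def run_tasks(func, tasks, nproc):
    """Apply func to every task, bypassing multiprocessing when nproc == 1.

    With a single process the results never have to be pickled and sent
    between processes, so the 32-bit message-size limit does not apply.
    """
    if nproc == 1:
        # Serial path: plain loop in the current process.
        return [func(task) for task in tasks]
    # Parallel path: results are pickled and sent back to the parent.
    with Pool(processes=nproc) as pool:
        return pool.map(func, tasks)


if __name__ == "__main__":
    print(run_tasks(abs, [-1, 2, -3], nproc=1))  # [1, 2, 3]
```

The serial path also gives ordinary tracebacks, which makes errors easier to debug than under a pool.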

@londumas
Contributor Author

@ngbusca, sorry. I posted that comment in the wrong PR.

Contributor

vserret commented Jan 25, 2019

@londumas I don't understand why you need to do that on the metal matrix too. I thought that the problem concerned only the dmat and how to model the binning effect in the cf model.

@londumas
Contributor Author

@vserret, thanks for having a look. The metal distortion matrix allows us to go from the model of the metal-correlation to the model of the Lya-correlation. If the model of the Lya-correlation is on a finer grid, then the model of the metal-correlation has to be on that finer grid too. See the line:

xi_met = dm_met.dot(xi_met)
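
The shape constraint behind that line can be illustrated with a toy example. This is a sketch under stated assumptions: the bin counts are tiny made-up values, the model grid is assumed to have coef times more bins in each direction, the matrices are random placeholders, and the name dmat for the rectangular distortion matrix is hypothetical.

```python
import numpy as np

# Assumed toy binning, for illustration only.
n_par, n_perp, coef = 4, 3, 2
n_data = n_par * n_perp                       # 12 bins on the data grid
n_model = (n_par * coef) * (n_perp * coef)    # 48 bins on the finer model grid

rng = np.random.default_rng(0)
xi_met = rng.normal(size=n_model)               # metal model on the fine grid
dm_met = rng.normal(size=(n_model, n_model))    # square metal matrix, fine grid
dmat = rng.normal(size=(n_data, n_model))       # rectangular distortion matrix

# The metal matrix maps fine grid to fine grid, so its output can be
# combined with a Lya model evaluated on the same fine grid.
xi_met = dm_met.dot(xi_met)

# The rectangular matrix then maps the fine-grid model to the data grid.
xi_on_data_grid = dmat.dot(xi_met)

print(xi_met.shape, xi_on_data_grid.shape)  # (48,) (12,)
```

If dm_met were built on the coarse data grid instead, its output would have n_data entries and could not be fed to a rectangular matrix expecting n_model entries.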

Contributor

vserret commented Jan 28, 2019

@londumas did you manage to run it on NERSC? Did you get the fit results?

Contributor Author

londumas commented Feb 2, 2019

@vserret, yes, I ran it in Utah and got very similar results with and without a rectangular distortion matrix.

@londumas londumas merged commit c945f77 into master Feb 5, 2019
@londumas londumas deleted the rectangle_distortion_matrix branch February 5, 2019 16:04