
Interpolate irfs #141

Merged
merged 44 commits into from
Apr 26, 2021
Conversation

jsitarek
Contributor

this is PR #125, redone this time from a branch of the main repository rather than from a fork

the main changes are:

  • functions to interpolate effective area and migration matrix
  • functions to read IRFs (symmetric to the create_... functions)

the code already has all the requests from PR #125 implemented, so please let me know if you have any further comments or if we can merge it.

one note: the IRF-reading functions are made on purpose such that they can read either a single file or a list of files. This does not complicate those functions too much (it is done by a common auxiliary function) and it is very useful, because in real situations (either for interpolation or for averaging of IRFs) we will very often be reading in a bunch of IRF files, and instead of writing a separate function for each type of IRF to deal with arrays it is good to have such a universal tool.
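The single-file-or-list behaviour described above can be reduced to a small dispatching helper. A minimal sketch (all names here are hypothetical, not pyirf's actual API; in practice the `reader` argument would be a function that opens one FITS file):

```python
def read_one_or_many(files, reader):
    """Apply ``reader`` to a single path (given as a string) or to each
    path in a list of paths, returning one result or a list of results.

    This is the common auxiliary pattern: every IRF type gets single-file
    and multi-file reading for free from one shared helper.
    """
    if isinstance(files, str):
        return reader(files)
    return [reader(f) for f in files]
```

With this pattern, `read_aeff("irf.fits")` and `read_aeff(["irf1.fits", "irf2.fits"])` can share one implementation.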

jsitarek and others added 29 commits February 24, 2021 19:08
from the JSON config file the code gets the parameters according to which to perform the interpolation,
as well as the files with IRFs and their corresponding parameter values.
Then it reads in a data file and checks the values of those parameters;
in the end it performs the interpolation.
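The config-driven workflow described in this commit message can be sketched as follows. This is a minimal illustration, not pyirf's actual implementation: the config schema, function names, and the choice of a single zenith parameter with per-bin `np.interp` are all assumptions.

```python
import json

import numpy as np


def interpolate_aeff(config_path, load_aeff, target_zenith):
    """Interpolate effective area to ``target_zenith`` (sketch).

    The JSON config lists IRF files together with the parameter value
    (here: zenith angle) each one was produced at.  ``load_aeff`` reads
    one file and returns the effective area per energy bin.
    """
    with open(config_path) as f:
        cfg = json.load(f)

    zeniths = np.array([entry["zenith"] for entry in cfg["irfs"]])
    # shape (n_files, n_energy_bins)
    aeffs = np.array([load_aeff(entry["file"]) for entry in cfg["irfs"]])

    order = np.argsort(zeniths)
    # interpolate each energy bin independently over the zenith grid
    return np.array([
        np.interp(target_zenith, zeniths[order], column)
        for column in aeffs[order].T
    ])
```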
…olation.py

- added unit tests for two of the functions (one is disabled for the moment)
- added writing of the output FITS file with interpolated Aeff
now the code automatically extracts the unit of each read quantity
added a unit test (disabled for the moment) to test this new function
…ific anymore, now the key where the interpolation parameters are stored has to be specified. Updated also the unit test and the example macro of generating the interpolated IRFs.
…the data from the bulk of pyirf, it is moved to the examples file
….table.Table

removed read_unit_from_HDUL function and its unit test since it is unnecessary now
…ved also the corresponding unit test.

included the (before missing) test data for this test. Since the test files are small they are added directly to the repository
the function that was reading the lo and hi bins and previously also joining them into a single array now only reads the bins
added functions (in binning.py) to split or join the bins between the lo/hi scheme (used in FITS files) and single arrays (used in IRF computing functions)
added unit tests for all the above functions
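The lo/hi scheme mentioned above stores, for each bin, its lower and upper edge as separate columns, while the IRF-computing functions use a single array of edges. A minimal sketch of the split/join pair (illustrative only; pyirf's actual `binning.py` functions may differ, e.g. in validating that the bins are contiguous):

```python
import numpy as np


def join_bin_lo_hi(bin_lo, bin_hi):
    """Join contiguous lo/hi bin columns (FITS scheme) into one array
    of n+1 edges: all lower edges plus the last upper edge."""
    return np.append(bin_lo, bin_hi[-1])


def split_bin_lo_hi(edges):
    """Split a single array of n+1 edges back into the lo/hi scheme."""
    return edges[:-1], edges[1:]
```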
…t dispersion matrices to have the same sequence of variables as in the create/read dispersion matrix functions (before it was using the sequence as read from the FITS file)
…functions

changed read_fits_bins_lo_hi so that it returns the 0-th row of the table read from the file
changed read_fits_bins_lo_hi to be able to read from a list of files at the same time
…er of parameters

used the binning functions to simplify the generation of energy and theta tables in create_aeff2d_hdu function
@jsitarek
Contributor Author

jsitarek commented Mar 24, 2021

the status: I answered all the comments of @maxnoe; most of them are implemented, for one I asked for clarification, and on the question of reading tables of IRFs I'm not sure if we have converged.
there is also still the issue with failing tests because of a seemingly missing file (it is committed to the repository, yet it is not visible to the test); I would appreciate some hint about this problem

Member

@HealthyPear HealthyPear left a comment


The CI fails because test files are not supposed to be part of the repository: we actively ignore any data file with extensions like *.fits.gz or *.hdf5 in our .gitignore file.

I think we should keep all the test files together (namely, together with the ED/FACT comparison files)

@codecov

codecov bot commented Apr 23, 2021

Codecov Report

Merging #141 (d9568eb) into master (650c801) will increase coverage by 1.02%.
The diff coverage is 96.95%.


@@            Coverage Diff             @@
##           master     #141      +/-   ##
==========================================
+ Coverage   86.86%   87.89%   +1.02%     
==========================================
  Files          39       41       +2     
  Lines        1348     1503     +155     
==========================================
+ Hits         1171     1321     +150     
- Misses        177      182       +5     
Impacted Files Coverage Δ
pyirf/interpolation.py 81.81% <81.81%> (ø)
pyirf/binning.py 96.20% <90.00%> (-0.90%) ⬇️
pyirf/cuts.py 94.33% <100.00%> (+0.58%) ⬆️
pyirf/io/gadf.py 100.00% <100.00%> (ø)
pyirf/tests/test_binning.py 100.00% <100.00%> (ø)
pyirf/tests/test_cuts.py 98.55% <100.00%> (+0.27%) ⬆️
pyirf/tests/test_interpolation.py 100.00% <100.00%> (ø)


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jsitarek
Contributor Author

Hi @maxnoe @HealthyPear
Discussing with Max, I finally divided the PR: the interpolation stays here, and the rest of the new code (the new io functions) I will put into a separate branch/PR so we can continue the discussion there.
So, can we merge now?

Member

@maxnoe maxnoe left a comment


Mostly looks good, thanks.

I have one nitpick over the location / behaviour of the compare_irf_cuts function and a question concerning the effective area interpolation.

pyirf/io/gadf.py (outdated review thread, resolved)
pyirf/tests/test_interpolation.py (outdated review thread, resolved)

# remove zeros and log it
effective_area[effective_area < min_effective_area] = min_effective_area
effective_area = np.log(effective_area)
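Putting the snippet above together with the interpolation step it feeds, a minimal sketch of the clip-log-interpolate-exponentiate scheme over zenith (the function name, the zenith axis, and the floor value of 1 are assumptions for illustration, not pyirf's actual implementation):

```python
import numpy as np


def interpolate_aeff_logspace(aeffs, zeniths, target_zenith,
                              min_effective_area=1.0):
    """Interpolate effective area in log space over zenith (sketch).

    ``aeffs`` has shape (n_zeniths, n_energy_bins).  Values below the
    assumed floor ``min_effective_area`` are clipped before taking the
    log, mirroring the snippet above, so empty bins do not produce -inf.
    """
    aeffs = np.clip(np.asarray(aeffs, dtype=float),
                    min_effective_area, None)
    log_aeff = np.log(aeffs)

    # interpolate each energy bin independently, then undo the log
    interpolated = np.exp([
        np.interp(target_zenith, zeniths, column) for column in log_aeff.T
    ])
    # bins that stayed at the clipping floor are restored to zero
    interpolated[interpolated <= min_effective_area] = 0.0
    return interpolated
```

Note that linear interpolation in log space yields the geometric mean at the midpoint, which is exactly the "safer near threshold" behaviour argued for below.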
Member


Why do we interpolate effective area in logspace? The only axis over which effective area is expected to vary over multiple orders of magnitude is energy, but that's not the interpolation axis.

Wouldn't normal linear interpolation make more sense for different values of e.g. azimuth / zenith?

Contributor Author


the reason is mostly threshold effects: when you interpolate the energy bins close to the threshold, the zenith angle shifts the whole curve in energy scale, which means that a difference in zenith can produce a large difference in collection area, and thus it is safer to interpolate such points in log space.

Member

@maxnoe maxnoe Apr 26, 2021


Ok, let's merge it like this and maybe we need to make this optional and check on simulations which interpolation scheme performs better.

Contributor


I think interpolation in log(Aeff)-log(E) space should work well at all energies, not just at threshold, so for simplicity I would keep the log-log approach everywhere (at least as the default).

Member


But energy is not used in the interpolation at all. The interpolation happens for all energy bins independently.

And then it boils down to what kind of behaviour you expect of effective area vs. the changing (interpolation) variable.
That would just need to be checked for the specific interpolation one is doing.

Contributor


Ah, right, the interpolation in energy is done by gammapy.
I think Julian's point that log(Aeff) is also better for interpolation in zenith is probably true, but there is no other way of knowing than testing it.

… returns false rather than raising exception when cuts differ
@maxnoe maxnoe requested a review from HealthyPear April 26, 2021 07:46
@moralejo
Contributor

moralejo commented Nov 3, 2021

Hi, I think that besides interpolation we definitely need extrapolation, because one cannot build a grid that covers all possible cases inside it (much less in an efficient way).

I see that the scipy function used has no extrapolation option (even though it would be trivial to add). Could you explore the possibility of using instead the functions in https://matplotlib.org/stable/api/tri_api.html? One can get the parameters of the plane defined by the 3 grid points closest to the given target direction, and then evaluate the plane at that point...
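The plane-through-three-points idea can be sketched with plain NumPy, without matplotlib.tri (function name and 2-D parameter space are illustrative; note that simply taking the three nearest grid nodes can yield a degenerate, collinear triangle, which a real implementation would have to guard against):

```python
import numpy as np


def plane_extrapolate(grid_points, values, target):
    """Evaluate the plane through the 3 grid nodes closest to ``target``.

    ``grid_points`` has shape (n, 2), e.g. (cos(zenith), azimuth);
    ``values`` has shape (n,).  Unlike interpolation restricted to the
    convex hull, the plane can also be evaluated outside the grid.
    """
    grid_points = np.asarray(grid_points, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)

    # pick the three grid nodes nearest to the target direction
    nearest = np.argsort(np.linalg.norm(grid_points - target, axis=1))[:3]
    pts = grid_points[nearest]

    # solve z = a*x + b*y + c for the plane through the three points
    A = np.column_stack([pts, np.ones(3)])
    a, b, c = np.linalg.solve(A, values[nearest])
    return a * target[0] + b * target[1] + c
```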

@maxnoe
Member

maxnoe commented Nov 3, 2021

I think this comment is better in the open issue about changes to the interpolation than in this closed PR.

Labels
enhancement New feature or request input/output Format and file extensions of the input/output data.