Skip to content

implement average and median support for bkg_spectrum #253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 28, 2025

Conversation

gibsongreen
Copy link
Contributor

This pull request is created to add functionality to the bkg_spectrum function. Currently, this function utilizes np.nansum for all computations of the background spectrum. This pull request defaults this function with np.nansum, while also adding support for 'Average' and 'Median' by utilizing np.nanmean and np.nanmedian.

bkg_statistic : str, optional
Statistical method used to collapse the background image.
Supported values are:
- `'median'` : Uses the median (`np.nanmedian`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref linking needs full name of the package. Please fix accordingly to get rid of warnings on ReadTheDocs. Thanks!

Suggested change
- `'median'` : Uses the median (`np.nanmedian`).
- `'median'` : Uses the median (`numpy.nanmedian`).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if passing in the function directly is more straightforward than using string for translation. It certainly would be more flexible and not confined to numpy. User can pass in their custom stats func... but is that too dangerous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the implementation in Jdaviz (passing the traitlet value for Statistic) but can see the applicability to be able to pass the function directly too. Could modify the jdaviz logic to make this possible so I'm open to either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the API here should be dictated by what Jdaviz does. The API we pick should be useful to someone who is using just specreduce natively. Jdaviz can hook into it downstream no matter what.

Copy link
Contributor Author

@gibsongreen gibsongreen Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the other changes locally, I'll leave the string in for now but am open to modifying the logic here to make it more flexible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see what specreduce maintainers say. Thanks!

Copy link
Contributor

@tepickering tepickering Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the idea of passing in a function for full generality, but also recognize that in this context, sum, average, and median are the most applicable options in the the vast majority of cases. it's also highly likely that someone would naively pass in, say, np.median and then wonder why their data becomes all np.nan. also, should we use "mean" instead of "average"?

Copy link

codecov bot commented Feb 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.93%. Comparing base (0ea84c9) to head (8d8caf4).
Report is 22 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #253      +/-   ##
==========================================
+ Coverage   85.84%   85.93%   +0.08%     
==========================================
  Files          13       13              
  Lines        1159     1166       +7     
==========================================
+ Hits          995     1002       +7     
  Misses        164      164              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

statistic_function = np.nansum

if bkg_statistic:
bkg_statistic = bkg_statistic.lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to support mixed case here. I say we only support inputs in all lowercase. Python is naturally case-sensitive.

Suggested change
bkg_statistic = bkg_statistic.lower()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used lower here for the Jdaviz traitlet which has the first letter capitalized in the string, that could be done in Jdaviz instead, so I'm good with removing this and moving the logic elsewhere (or like you said passing the function directly)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cshanahan1 I had that originally but removed it with this is mind

Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also want to add a test for invalid input and ensure ValueError is raised. Thanks!

@gibsongreen
Copy link
Contributor Author

Might also want to add a test for invalid input and ensure ValueError is raised. Thanks!

Will do, thank you for the thorough review, it was very helpful!

Copy link
Contributor

@cshanahan1 cshanahan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks! I added a couple of minor suggestions.

@@ -351,12 +358,21 @@ def bkg_spectrum(self, image=None):
"""
bkg_image = self.bkg_image(image)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a bkg_statistic = bkg_statistic.lower() to avoid potential frustration?

Copy link
Contributor

@tepickering tepickering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this. i am approving this, but want to note that "average" and "median" return a different scale than "sum". i.e., they're normalized to the area of the background region and thus easily scaled to a source region for subtraction. "sum" is not scaled and the number of background pixels used to calculate it would be required to use it for background subtraction. tracking number of pixels excluded by np.nansum is tricky and possibly complicated. i'm open to keeping "sum" as an option, but i'm unclear on the use case.

@gibsongreen
Copy link
Contributor Author

thanks for this. i am approving this, but want to note that "average" and "median" return a different scale than "sum". i.e., they're normalized to the area of the background region and thus easily scaled to a source region for subtraction. "sum" is not scaled and the number of background pixels used to calculate it would be required to use it for background subtraction. tracking number of pixels excluded by np.nansum is tricky and possibly complicated. i'm open to keeping "sum" as an option, but i'm unclear on the use case.

Thank you for reviewing! This function solely used sum prior to this PR, which was my reasoning in keeping it in and leaving it as the default parameter. I had considered removing sum altogether, however, there is test coverage for sum that results in several test failures and I didn't want to break existing functionality.

I'm open to making any additional changes necessary in this PR or a follow-up effort, whether that is setting the default parameter for the statistical function to be either mean/median or remove the option for sum.

@tepickering
Copy link
Contributor

I'm open to making any additional changes necessary in this PR or a follow-up effort, whether that is setting the default parameter for the statistical function to be either mean/median or remove the option for sum.

yes, i think it's a larger issue for discussion so it's fine to leave it for now and do a follow-up effort later. i would want to revisit why "sum" was used in the first place before nixing it.

@cshanahan1 cshanahan1 merged commit c31a9af into astropy:main Feb 28, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants