implement average and median support for bkg_spectrum #253

gibsongreen · 2025-02-27T03:42:37Z

This pull request is created to add functionality to the bkg_spectrum function. Currently, this function utilizes np.nansum for all computations of the background spectrum. This pull request defaults this function with np.nansum, while also adding support for 'Average' and 'Median' by utilizing np.nanmean and np.nanmedian.

specreduce/background.py

pllim · 2025-02-27T19:09:08Z

specreduce/background.py

+        bkg_statistic : str, optional
+            Statistical method used to collapse the background image.
+            Supported values are:
+            - `'median'` : Uses the median (`np.nanmedian`).


Ref linking needs full name of the package. Please fix accordingly to get rid of warnings on ReadTheDocs. Thanks!

Suggested change

- `'median'` : Uses the median (`np.nanmedian`).

- `'median'` : Uses the median (`numpy.nanmedian`).

I also wonder if passing in the function directly is more straightforward than using string for translation. It certainly would be more flexible and not confined to numpy. User can pass in their custom stats func... but is that too dangerous?

I was thinking about the implementation in Jdaviz (passing the traitlet value for Statistic) but can see the applicability to be able to pass the function directly too. Could modify the jdaviz logic to make this possible so I'm open to either way.

I don't think the API here should be dictated by what Jdaviz does. The API we pick should be useful to someone who is using just specreduce natively. Jdaviz can hook into it downstream no matter what.

I've made the other changes locally, I'll leave the string in for now but am open to modifying the logic here to make it more flexible

Let's see what specreduce maintainers say. Thanks!

i like the idea of passing in a function for full generality, but also recognize that in this context, sum, average, and median are the most applicable options in the the vast majority of cases. it's also highly likely that someone would naively pass in, say, np.median and then wonder why their data becomes all np.nan. also, should we use "mean" instead of "average"?

codecov · 2025-02-27T19:10:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.93%. Comparing base (0ea84c9) to head (8d8caf4).
Report is 22 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #253      +/-   ##
==========================================
+ Coverage   85.84%   85.93%   +0.08%     
==========================================
  Files          13       13              
  Lines        1159     1166       +7     
==========================================
+ Hits          995     1002       +7     
  Misses        164      164

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

specreduce/background.py

pllim · 2025-02-27T19:13:30Z

specreduce/background.py

+        statistic_function = np.nansum
+
+        if bkg_statistic:
+            bkg_statistic = bkg_statistic.lower()


I don't think we need to support mixed case here. I say we only support inputs in all lowercase. Python is naturally case-sensitive.

Suggested change

bkg_statistic = bkg_statistic.lower()

I used lower here for the Jdaviz traitlet which has the first letter capitalized in the string, that could be done in Jdaviz instead, so I'm good with removing this and moving the logic elsewhere (or like you said passing the function directly)

@cshanahan1 I had that originally but removed it with this is mind

specreduce/background.py

pllim

Might also want to add a test for invalid input and ensure ValueError is raised. Thanks!

specreduce/tests/test_background.py

gibsongreen · 2025-02-27T21:20:37Z

Might also want to add a test for invalid input and ensure ValueError is raised. Thanks!

Will do, thank you for the thorough review, it was very helpful!

cshanahan1

This looks good to me, thanks! I added a couple of minor suggestions.

cshanahan1 · 2025-02-27T21:55:58Z

specreduce/background.py

@@ -351,12 +358,21 @@ def bkg_spectrum(self, image=None):
        """
        bkg_image = self.bkg_image(image)



maybe add a bkg_statistic = bkg_statistic.lower() to avoid potential frustration?

specreduce/background.py

tepickering

thanks for this. i am approving this, but want to note that "average" and "median" return a different scale than "sum". i.e., they're normalized to the area of the background region and thus easily scaled to a source region for subtraction. "sum" is not scaled and the number of background pixels used to calculate it would be required to use it for background subtraction. tracking number of pixels excluded by np.nansum is tricky and possibly complicated. i'm open to keeping "sum" as an option, but i'm unclear on the use case.

gibsongreen · 2025-02-27T23:39:41Z

thanks for this. i am approving this, but want to note that "average" and "median" return a different scale than "sum". i.e., they're normalized to the area of the background region and thus easily scaled to a source region for subtraction. "sum" is not scaled and the number of background pixels used to calculate it would be required to use it for background subtraction. tracking number of pixels excluded by np.nansum is tricky and possibly complicated. i'm open to keeping "sum" as an option, but i'm unclear on the use case.

Thank you for reviewing! This function solely used sum prior to this PR, which was my reasoning in keeping it in and leaving it as the default parameter. I had considered removing sum altogether, however, there is test coverage for sum that results in several test failures and I didn't want to break existing functionality.

I'm open to making any additional changes necessary in this PR or a follow-up effort, whether that is setting the default parameter for the statistical function to be either mean/median or remove the option for sum.

tepickering · 2025-02-27T23:55:18Z

I'm open to making any additional changes necessary in this PR or a follow-up effort, whether that is setting the default parameter for the statistical function to be either mean/median or remove the option for sum.

yes, i think it's a larger issue for discussion so it's fine to leave it for now and do a follow-up effort later. i would want to revisit why "sum" was used in the first place before nixing it.

gibsongreen added 2 commits February 26, 2025 22:17

implement bkg_spec with avg and median parameters

4f75a11

add docstring

0e5cdc2