Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed of CalSpec3 and efficiency in using cube_build #1688

Open
jemorrison opened this issue Jan 26, 2018 · 10 comments
Open

Speed of CalSpec3 and efficiency in using cube_build #1688

jemorrison opened this issue Jan 26, 2018 · 10 comments

Comments

@jemorrison
Copy link
Collaborator

In the current calspec3 pipeline, mrs_imatch calls cube_build (with option Single=True). When single=true a set of IFUcubes all with the same size ( a size which encompasses all the data mapped to the sky) are created but only a single exposure is mapped in each IFUCube. The background to be subtracted is determined from this data and written to the asdf extension of each input models. The single IFU cubes are tossed away. The next step in the pipeline - outlier detection also calls cube_build with Single = True (using the exact same data as mrs_imatch used) and creating the same single IFUCubes. The outlier detection does not subtract the background information determined from mrs_imatch but it used (somehow) in flagging outliers.

So in summary current calls in calspec3:

  1. result = self.mrs_imatch (result)
    from mrs_imatch
    cbs.single = 'True'
    cube_models = cos.process(models)
    where, cube_models are the set of SINGLE IFU Cubes
    this routine updates the asdf extension of input model (contained in result)
    later in calspec3 outlier detection is called:
  2. result = self.outlier_detection(result)

then again this routine calls cube_build the same way that outlier detection did.

So I am purposing that:
mrs_imatch returns both the input models updated with background poly in asdf AND the set
of Single IFU Cubes.

something along these lines:
To make it robust we first need to set SingleIFUCube = 0 or empty or whatever.
then something like
result = self.mrs_imatch (result, SingleIFUCube)

then pass this information on to outlier rejection
result = self.outlier_detection (result, SingleIFUCube).
if SingleIFUCube = 0 because maybe mrs_imatch was skipped then the outlier detection would
call cube_build and create the single IFU cubes. It is not 0 then it uses the IFUCubes instead of
calling IFUCube with Single = True. The outlier_detection would still need to blot the data back -
but it would not need the first call to cube_build.

Does that make sense to everyone ?
I am happy to test this out - but I first wanted your comments.
@hbushouse
@mcara
@jdavies-st

@jemorrison
Copy link
Collaborator Author

@stscieisenhamer

@mcara
Copy link
Member

mcara commented Jan 26, 2018

I have said the same thing [that computing single cubes twice in sky match and outlier detection was wasteful]. However, at that time, using single input and single output in steps was preferred.

In addition, there was a theory that outlier rejection step may need different kinds of "single" cubes (combining different channels or whatever) than what mrs_imatch would need. I do not know what is the current approach.

You also should CC @stsci-hack

@stscieisenhamer
Copy link
Collaborator

stscieisenhamer commented Jan 26, 2018

Strawman proposal if the following are true:

  • When "cube build" is mentioned, is it actually cube_build that is being called
  • That the cube sets are the same between mrs_imatch and outlier_detection

If so, then the following, psuedo-code for process:

result = <input science model>
if not self.mrs_imatch.skip or not self.outlier_detection.skip:
    single_cubes = cube_build(result, single_flag=True)

self.mrs_imatch.cubes = single_cubes
result = self.mrs_imatch(result)

self.outlier_detection.cubes = single_cubes
result = self.outlier_detection(result, cubes=single_cubes)

Where both steps have a configuration cubes where, if defined, are the ModelContainer of cubes to use. If not defined, the steps themselves would do the cube creation.

This would avoid the multiple return issues (though that really isn't an issue), make it clear the cubes are common, continue consistency with the whole Step architecture, if necessary allow configuration of the initial cube_build step.

There probably must be some check whether cubes are actually necessary depending on input data to?

@jemorrison
Copy link
Collaborator Author

As I understand how mrs_imatch and outlier detection works I think this strawman proposal seems like it would work.

@mcara
Copy link
Member

mcara commented Jan 29, 2018

@stscieisenhamer @jemorrison I actually somewhat disagree with the outlined workflow (if I understood it correctly). In particular, the first part of the pseudo-code makes little sense:

result = <input science model>
if not self.mrs_imatch.skip or not self.outlier_detection:
    single_cubes = cube_build(result, single_flag=True)

Yes, it could be inserted in the pipeline. However, it would complicate things if users would like to run a single individual step such as mrs_imatch and/or outlier_detection.

Instead, I think each of the two steps should be able to run "single_sci_cube_build" on demand when they do not get single cubes as input:

class MRSIMatchStep(Step):
    def process(input, single_cubes=None):
        if single_cubes is None:
            single_cubes = build_single_sci_cubes(input)
    ...........
    return (input, single_cubes)

# Similar processing as above should be done for the outlier detection
# Then:
result, single_cubes = self.mrs_imatch(cubes, single_cubes=None)
result = self.outlier_detection(result, single_cubes=single_cubes)

That is, each of the two steps of interest can be run stand-alone without previously computing single_cubes.

@hbushouse
Copy link
Collaborator

Would code like:

  self.mrs_imatch.cubes = single_cubes
  self.outlier_detection.cubes = single_cubes

end up with a couple of copies of the same data or are they both just pointers to a single instance of single_cubes?

@stscieisenhamer
Copy link
Collaborator

intent is that it is the same data. would have to see what can be gotten away with in the config system, though there is no doubt something could be worked up.

@stscieisenhamer
Copy link
Collaborator

@mcara Agreed that the two steps themselves should build their cubes if no cubes where given. Suggested code looks good.

@stscicrawford stscicrawford modified the milestones: Build 7.2, Build 7.3 Sep 7, 2018
@stscicrawford stscicrawford modified the milestones: Build 7.3, Build 7.4 Jun 7, 2019
@stscicrawford stscicrawford modified the milestones: Build 7.4, Future Sep 23, 2019
@hbushouse
Copy link
Collaborator

There are discussions underway to completely rewrite the way outlier detection is done for IFU data, which does not involved building all the single cubes, hence if/when that is implemented, there will no longer be duplicate cubes constructed in outlier_detection and mrs_imatch and hence this issue becomes moot. So I propose closing this ticket.
@jemorrison @mcara @stscieisenhamer what say you?

@jemorrison
Copy link
Collaborator Author

jemorrison commented Apr 25, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants