
Conversation


@dpole dpole commented Jun 30, 2020

This PR implements the data layout proposed in #35.

Summary

  • an Observation contains multiple detectors
  • information about the detectors
    • becomes an attribute of the Observation, if it is shared by all the detectors
    • is otherwise stacked in an array, which becomes an attribute of the Observation. Notably, obs.tod is a 2D array with shape (n_det, n_sample)
  • Detector is not needed anymore
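As a rough sketch of this layout (attribute names other than obs.tod are made up for illustration):

```python
import numpy as np

n_det, n_sample = 3, 1000

class Observation:
    pass

obs = Observation()
# Per-detector, per-sample data is stacked into arrays, one row per detector:
obs.tod = np.zeros((n_det, n_sample))        # the time-ordered data
# Hypothetical per-detector metadata, stacked the same way:
obs.net_ukrts = np.array([50.0, 55.0, 48.0])
# Information shared by all detectors becomes a plain attribute:
obs.sampling_rate_hz = 19.0

print(obs.tod.shape)  # (3, 1000)
```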

Motivation

Stacking data that is likely to interact into regular data structures typically allows easier APIs (and sometimes faster computation). E.g., obs.tod.mean(0) computes the common mode; C.dot(obs.tod) applies the cross-talk matrix C using threaded, highly optimized math libraries like MKL.
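For instance (a self-contained sketch, with random data standing in for obs.tod):

```python
import numpy as np

n_det, n_sample = 4, 1000
rng = np.random.default_rng(0)
tod = rng.standard_normal((n_det, n_sample))  # stands in for obs.tod

# Common mode across detectors: a single vectorized reduction
common_mode = tod.mean(0)   # shape (n_sample,)

# Applying a cross-talk matrix is a single matmul, dispatched by NumPy
# to the underlying BLAS (MKL, OpenBLAS, ...)
C = np.eye(n_det) + 0.01 * rng.standard_normal((n_det, n_det))
mixed = C.dot(tod)          # shape (n_det, n_sample)
```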

Perspective

Prefer shared-memory parallelism over MPI parallelism: the former typically has no or minimal API overhead compared to serial code (and sometimes better performance than in-node MPI parallelization). From this perspective, it is better to have as much data as possible packed in a compact data structure.

Next steps (maps data)

In this perspective, operations on an Observation should avoid MPI parallelization, and therefore TOD-to-map accumulation should be a serial operation (maps are not distributed; every MPI rank holds an entire map). This should be possible, as LiteBIRD maps should take only a limited fraction of any foreseen node size.

@dpole dpole requested a review from ziotom78 June 30, 2020 15:08

dpole commented Jun 30, 2020

@ziotom78 I think I need feedback to make sure I'm not breaking your pieces of code, in addition to general feedback on the model.


@ziotom78 ziotom78 left a comment


Hi @dpole, sorry for the delay in reviewing this. There are a few problems in the code; I have tried to be as clear as possible in detailing what's probably causing them.

@mreineck
Collaborator

From your presentation at the telecon I got the impression that you expect TOD to be processed in chunks of several hours or so. Maybe it's not really a concern, but this is somewhat at odds with the way that TOD are simulated from 4pi convolved data cubes.

The 4pi convolver has the following aspects:

  1. it takes the sky and beam a_lm as input
  2. from those it generates the interpolation data cube (fairly expensive operation). The size of the data cube is 10-100 times the size of a sky map
  3. given a set of (theta, phi, psi) pointings, it computes the TOD values; this operation is quick and scales with the length of the TOD. This can be called repeatedly with different pointing arrays.

As you see, there is a fairly large starting overhead and memory cost associated with this kind of TOD generation. It will definitely not be economical to set up and destroy the interpolator object for a few hours' worth of TOD.
On the other hand, given the size of the data cubes, it will not be possible either to hold all data cubes for all detectors in the memory of a single node. So, it will not be possible to do a full focal plane simulation on a single node, no matter how finely the TOD is chunked.

@mreineck
Collaborator

Sorry for not raising this point earlier ... I only realized the intended simulation strategy when I heard "we can fit several hours of full focal plane TOD and hundreds of maps onto a single compute node".


dpole commented Jul 23, 2020

Hi Martin, no problem. The important thing is that we caught this point before making a decision. I think that your application isn't necessarily in conflict with the current setup.

Just to clarify one thing: nothing prevents a specific application from having observations with a single detector in the TOD. The constraint that this PR adds is:

  • if many detectors are in the same node, you can assume that they are organized in a 2d array
  • codes that involve communications between detectors can assume shared memory (off-node communications between detectors or time chunks will be rare)

Even if we decide not to go for this, we have to be aware that we are implicitly agreeing that any interaction between detectors has to pass through one (or more) MPI calls.

Of course, if all of our applications involve communications across the full mission time (like your case), it is better to assume that a processor holds the full TOD of the mission. However, I think that interactions between detectors' TODs will be frequent, and therefore it makes sense to strive for the data model I propose.

Accommodating cases that work over the full TOD requires a bit of a stretch. In my mind, TOD arrays are blocks of a block-cyclic distributed matrix that can be reshuffled at runtime (even though this should be avoided whenever possible).
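As a toy illustration of that idea (pure NumPy, no actual MPI; the block shapes are made up): the same logical (n_det, n_sample) matrix can be split into blocks along either axis, and "reshuffling" just means redistributing those blocks among ranks:

```python
import numpy as np

n_det, n_sample = 4, 8
full = np.arange(n_det * n_sample).reshape(n_det, n_sample)

# Detector/time-blocked distribution: a 2x2 grid of (2, 4) blocks,
# one per (hypothetical) MPI rank
det_time_blocks = [b for row in np.vsplit(full, 2) for b in np.hsplit(row, 2)]

# Reshuffled, time-major distribution: each rank now holds all
# detectors for a shorter time span
time_blocks = np.hsplit(full, 4)

assert det_time_blocks[0].shape == (2, 4)
assert time_blocks[0].shape == (4, 2)
```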

What do you think?


mreineck commented Aug 4, 2020

It's hard for me to judge how simulations will be carried out for LiteBIRD (e.g. how many detectors will be processed simultaneously, how long the chunks of simulated TOD are, etc.).
What I can say is the following: if we assume that the "source" of the TOD in a pipeline is the 4pi convolver, and if we ignore complicated MPI communications for a moment, then it will not be possible to process more than, say, 100 detectors on a single node, even if that node is quite big. There is no real way around this: if you want to be able to ask "give me the TOD of this detector in the time range from A to B", then memory of roughly 1GB per detector is required on the node to compute this information, independent of the time range.
It's of course always possible to generate the TOD and write them to disk in a separate run; then they can be loaded in chunks as needed. But I think this is not the desired application scenario, since the data volumes will be quite big and I/O can be a bottleneck.


dpole commented Aug 4, 2020

Ok, 1 GB/detector is the requirement to keep in mind. I think it is now clear why 4pi convolution prefers to have the entire TOD of a detector in node memory.

I think (and please, anybody, feel free to object) that this is not incompatible with this PR. The proposal is not to force chunking TODs in time such that you have all the detectors in a single processor. The proposal is rather: if you do have more detectors in local memory, their TODs should be aligned in memory.

As I said before, if all applications were like the 4pi convolution, it wouldn't be worth bothering with this requirement. However, my impression is that this is not the case: only a few applications require a full-mission TOD in memory and the distribution of detectors across nodes.

My understanding is that only beam studies require 4pi convolution. Most of the other studies can assume symmetric beams, so glitches, calibration drifts, crosstalks, etc. don't require extremely long TOD chunks in local memory. I believe that the best way to go is to implement this data model and provide utilities to resize the chunks along the time and detector axes.

This adds some complication, but I still think it is less complicated than the alternative (i.e., requiring the time axis to be a very long chunk, potentially a single one, and each line of the detector axis to be distributed), which implies that any user attempting operations that involve multiple detectors has to use an MPI interface.


mreineck commented Aug 4, 2020

Ok, 1GB/detector is the requirement to keep in mind.

To be more specific, the rough formula is
lmax**2 * (2*beam_mmax+1) * 60 bytes
for maximum multipole moment lmax and maximum azimuthal beam moment beam_mmax.

Depending on the required resolution, this can vary quite a bit, and if one doesn't need high resolution, fairly many convolution data cubes will fit onto a node.
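As a quick sanity check against the ~1 GB/detector figure quoted above (the parameter values here are illustrative, not official LiteBIRD numbers):

```python
def convolver_cube_bytes(lmax, beam_mmax):
    # Rough formula quoted above for the size of the interpolation data cube
    return lmax**2 * (2 * beam_mmax + 1) * 60

# Illustrative values: lmax = 1000, beam_mmax = 8
size = convolver_cube_bytes(lmax=1000, beam_mmax=8)
print(f"{size / 1e9:.2f} GB per detector")  # 1.02 GB per detector
```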

I agree that this scenario is not incompatible with the PR.


giuspugl commented Aug 5, 2020

My understanding is that only beam studies require 4pi convolution. Most of the other studies can assume symmetric beams, so glitches, calibration drifts, crosstalks, etc. don't require extremely long TOD chunks in local memory. I believe that the best way to go is to implement this data model and provide utilities to resize the chunks along the time and detector axes.

What if we want to simulate all the systematic effects in one single simulation, i.e. combine beam asymmetries, HWP imperfections, bandpass mismatches, cosmic rays, etc.? My understanding is that this is something very much needed for LiteBIRD... I would opt for a data layout which could accommodate this option too.


ziotom78 commented Aug 5, 2020

What if we want to simulate all the systematic effects in one single simulation, i.e. combine beam asymmetries, HWP imperfections, bandpass mismatches, cosmic rays, etc.? My understanding is that this is something very much needed for LiteBIRD... I would opt for a data layout which could accommodate this option too.

True, we have the task to provide the code to run end-to-end simulations, and this issue is going to be relevant in that context. However, there are other points to consider:

  1. I don't expect that complete end-to-end simulations are going to be run often; in fact, I expect that 99% of the users in LiteBIRD will use our framework just to run specific kinds of simulations. I remember running end-to-end simulations for Planck/LFI, and it was a nightmare to understand the results… too many effects, too complex outputs; it was always very difficult to pinpoint the cause of problems in the final data products, and very often I resorted to re-running simulations with only a few systematic effects turned on. This led to a huge amount of wasted CPU time. I think we should carefully plan how often we're going to run end-to-end simulations in LiteBIRD, and be sure not to run them too often: ideally, only once we have run separate simulations for each effect and we're reasonably sure that the results are what we expect can we think of starting an end-to-end run.

  2. The two approaches to data layout have advantages and disadvantages. The one proposed in this PR is perfect for correlated effects (thermal drifts, cosmic rays, …), while the one in the master branch is better for beam convolution. I don't think we're going to find the best data model that fits every situation, so we should aim for the one that permits doing most things adequately. The advantage of this PR is that it can be used similarly to the approach followed in the master branch (if you use a single-detector matrix of samples in each observation, you can easily store a full-length TOD in memory), but it enables other use cases.


dpole commented Aug 10, 2020

I've created a page about this data model on the new readthedocs website: https://litebird-sim.readthedocs.io/en/multi_det_obs/data_layout.html

It is meant to stay there for reference if and when this PR is merged.


dpole commented Aug 10, 2020

The main (only?) missing piece in this PR is how quantities different from the TOD are loaded into the observations. I probably need help from @ziotom78 or @paganol.
I've prototyped something like

for field in fields:
    field_arr = np.array([some_method(field, d) for d in detectors])
    setattr(self, field, field_arr)

in the Observation constructor; I'm not sure how compatible this pattern is with the rest of the framework (and some_method is of course still to be written).

@ziotom78
Member

The main (only?) missing piece in this PR is how quantities different from the TOD are loaded into the observations. I probably need help from @ziotom78 or @paganol.
I've prototyped something like

for field in fields:
    field_arr = np.array([some_method(field, d) for d in detectors])
    setattr(self, field, field_arr)

in the Observation constructor, I'm not sure about how compatible this pattern is with the rest of the framework (and some_method is of course to be written).

For pointings (PR #48) I had to create this kind of «field»: because the PR introduces pointing information in the form of quaternions, I just initialize self.bore2ecliptic_quats to None in the constructor Observation.__init__ and then set it once Observation.generate_pointing_information is called.

The biggest advantage of this approach over setattr is clarity: the field names are defined as soon as the object is created, so that once you create an instance of Observation like

obs = Observation(…)

you can discover the name of the fields by writing obs. and hitting TAB in Jupyter, but you do not allocate memory for stuff you do not need. Moreover, you can search for these fields in the source code of the class, because they are always initialized explicitly (e.g., just search for self.bore2ecliptic_quats in the code to find where the quaternions are generated/used).
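A minimal sketch of this pattern (the constructor signature and the body of generate_pointing_information are illustrative; only the attribute name self.bore2ecliptic_quats is taken from the comment above):

```python
import numpy as np

class Observation:
    def __init__(self, n_detectors, n_samples):
        self.tod = np.zeros((n_detectors, n_samples), dtype=np.float32)
        # Declared up front but not allocated: the field shows up in
        # tab-completion and in the source, yet costs no memory until set
        self.bore2ecliptic_quats = None

    def generate_pointing_information(self, quats):
        # Filled in only when pointings are actually needed
        self.bore2ecliptic_quats = np.asarray(quats)

obs = Observation(4, 100)
assert obs.bore2ecliptic_quats is None   # visible but unallocated
obs.generate_pointing_information([[0.0, 0.0, 0.0, 1.0]])
assert obs.bore2ecliptic_quats.shape == (1, 4)
```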


dpole commented Aug 10, 2020

I see. What you are saying actually simplifies the role of this PR quite a bit: the data layout becomes a convention that users should follow, not something enforced by the Observation API.

I think that we should wrap this Observation pattern/API in a dedicated issue, as it is an important choice that applies to any information that we store in the Observation. I presume it is better if you do it, as you probably have a better overall view, but I'm happy to try to do it myself, if you prefer.

@ziotom78
Member

I see. What you are saying actually simplifies quite a bit the role of this PR: the data layout becomes a convention that users should follow and not something enforced by the Observation API.

Yes, that's right, I plan to use the same approach for pointings in PR #48 once this one is merged.

I think that we should wrap this Observation pattern/API in a dedicated issue, as it is an important choice that applies to any information that we store in the Observation. I presume it is better if you do it, as you probably have a better overall view, but I'm happy to try to do it myself, if you prefer.

Well, the most important piece of information in the Observation class is the TOD; any other timestream (quaternions, pixel indices, etc.) should follow the TOD. I believe it's ok if we continue the discussion in this PR: it has received so much advertisement in the last weeks that many people in the ST are probably used to checking this page.

The main purpose of Simulation.create_observations was to
scatter observations across processes.
Now this operation is done within the Observation initialization,
and create_observation would become just a wrapper around it.
@dpole dpole requested a review from ziotom78 September 8, 2020 15:36
@dpole dpole dismissed ziotom78’s stale review September 8, 2020 15:38

It would be great to have a new review on the updated code


dpole commented Sep 8, 2020

All the key elements of this PR are in place. We have to address the conflicts with master, but I think that the priority is to review the proposed API changes, in particular those to the Observation class.

https://litebird-sim.readthedocs.io/en/multi_det_obs/observations.html#litebird_sim.observations.Observation

Happy to add clarifications and/or examples either here or in the doc wherever needed.

test_mpi is now in the base test folder. The reason is that the
MPI tests should behave correctly also in serial runs. Therefore,
they are executed with all the other tests by pytest.
self, detector, start_time, sampling_rate_hz, nsamples, use_mjd=False
self, detectors, n_samples,
start_time, sampling_rate_hz, use_mjd=False, dtype_tod=np.float32,
n_blocks_det=1, n_blocks_time=1, comm=None, root=0

Add possibility to not allocate tod

assert len(info) == len(self.tod)
setattr(self, name, info)

def get_times(self):

Check that works also if tods are not allocated


dpole commented Sep 22, 2020

I think that the new layout can be merged and used. There is still some stuff to tidy up (e.g., naming conventions, extracting some methods from Observation, and maybe enhancing the doc), but we've decided to keep working on it later.
