Data layout proposal #42
Conversation
@ziotom78 I think I need feedback to make sure I'm not breaking your pieces of code, in addition to general feedback on the model.
ziotom78 left a comment
Hi @dpole, sorry for the delay in reviewing this. There are a few problems in the code; I have tried to be as clear as possible in detailing what's probably causing them.
From your presentation at the telecon I got the impression that you expect TOD to be processed in chunks of several hours or so. Maybe it's not really a concern, but this is somewhat at odds with the way that TOD are simulated from 4pi convolved data cubes. The 4pi convolver has the following aspects:
As you see, there is a fairly large startup overhead and memory cost associated with this kind of TOD generation. It will definitely not be economical to set up and destroy the interpolator object for a few hours' worth of TOD.
Sorry for not raising this point earlier ... I only realized the intended simulation strategy when I heard "we can fit several hours of full focal plane TOD and hundreds of maps onto a single compute node".
Hi Martin, no problem. The important thing is that we caught this point before a decision. I think that your application isn't necessarily in contrast with the current setup. Just to clarify one thing: nothing prevents a specific application from having observations with a single detector in the TOD. The constraint that this PR adds is:
Even if we decide not to go for this, we have to be aware that we are implicitly agreeing that any interaction between detectors has to pass through one (or many) MPI calls. Of course, if all of our applications involve communications across the full mission time (like your case), it is better to assume that a processor holds the full TOD of a mission. However, I think that interactions between detectors' TOD will be frequent, and therefore it makes sense to strive for the data model I propose. Accommodating cases that work over the full TOD requires a bit of a stretch. In my mind, TOD arrays are blocks of a block-cyclically distributed matrix that can be reshuffled at runtime (even though this should be avoided whenever possible). What do you think?
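To make the intended layout concrete, here is a minimal numpy-only sketch of how a full TOD matrix could map onto a grid of MPI ranks. The grid shape, the row-major block assignment, and the variable names are my own assumptions for illustration, not the PR's actual implementation.

```python
import numpy as np

# Illustrative sketch (not the PR's API): a full (n_det, n_samples) TOD matrix
# split into n_blocks_det x n_blocks_time rectangular blocks, each owned by one
# MPI rank, with blocks assigned to ranks in row-major order.
n_det, n_samples = 4, 12
n_blocks_det, n_blocks_time = 2, 3

full_tod = np.arange(n_det * n_samples, dtype=np.float32).reshape(n_det, n_samples)

det_chunks = np.array_split(np.arange(n_det), n_blocks_det)
time_chunks = np.array_split(np.arange(n_samples), n_blocks_time)

for rank in range(n_blocks_det * n_blocks_time):
    i, j = divmod(rank, n_blocks_time)   # which block this rank would own
    block = full_tod[np.ix_(det_chunks[i], time_chunks[j])]
    print(f"rank {rank}: detectors {list(det_chunks[i])}, "
          f"samples {time_chunks[j][0]}..{time_chunks[j][-1]}, shape {block.shape}")
```

Reshuffling the chunks along either axis then amounts to redistributing these blocks with MPI communication.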
It's hard for me to judge how simulations will be carried out for LiteBIRD (e.g. how many detectors will be processed simultaneously, how long the chunks of simulated TOD are, etc.).
Ok, 1 GB/detector is the requirement to keep in mind. I think it is now clear why 4pi convolution prefers to have the entire TOD of a detector in the node memory. I think (and please, anybody, feel free to object) that this is not incompatible with this PR. The proposal is not to force chunking TODs in time such that you have all the detectors in a single processor. The proposal is more "if you do have several detectors in the local memory, their TOD should be aligned in memory". As I said before, if all the applications are like the 4pi convolution, it isn't worth bothering with this requirement. However, my impression is that this is not the case: only a few applications require a full-mission TOD in memory and the distribution of detectors across nodes. My understanding is that only beam studies require 4pi convolution; most of the other studies can assume symmetric beams. So glitches, calibration drifts, crosstalk, etc. don't require extremely long TOD chunks to be in the local memory. I believe that the best way to go is to implement this data model and provide utilities to resize the chunks along the time and detector axes. This adds some complication, but I still think it is less complicated than the alternative (i.e. requiring the time axis to be a very long, potentially single, chunk and each line of the detector axis to be distributed), which implies that any user attempting operations that involve multiple detectors has to use an MPI interface.
To be more specific, the rough formula is … Depending on the required resolution, this can vary quite a bit, and if one doesn't need high resolution, fairly many convolution data cubes will fit onto a node. I agree that this scenario is not incompatible with the PR.
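For a rough sense of scale, here is a back-of-envelope estimate of the interpolator data-cube size. The grid densities (and hence the printed numbers) are my own assumptions, not the formula elided from this comment.

```python
# Back-of-envelope estimate under assumed grid densities (not the formula above):
# a 4pi-convolution interpolator stores a 3D cube sampled in (theta, phi, psi),
# with the angular grids scaling with lmax and the psi grid with the beam mmax.
def cube_size_gb(lmax, mmax_beam, bytes_per_sample=8):
    n_theta = 2 * lmax
    n_phi = 2 * lmax
    n_psi = 2 * mmax_beam + 1
    return n_theta * n_phi * n_psi * bytes_per_sample / 1e9

print(cube_size_gb(lmax=1000, mmax_beam=4))   # ~0.29 GB per detector
print(cube_size_gb(lmax=2000, mmax_beam=8))   # ~2.2 GB per detector
```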
What if we want to simulate all the systematic effects in one single simulation, i.e. combine beam asymmetries, HWP imperfections, bandpass mismatches, cosmic rays, etc.? My understanding is that this is something really needed for LiteBIRD... I would opt for a data layout which could account for this option too.
True, we have the task of providing the code to run end-to-end simulations, and this issue is going to be relevant in that context. However, there are other points to consider:
I've created a page about this data model on the new readthedocs website: https://litebird-sim.readthedocs.io/en/multi_det_obs/data_layout.html. It is meant to stay there for reference if and when this PR is merged.
The main (only?) missing piece in this PR is how quantities different from the TOD are loaded into the observations. I probably need help from @ziotom78 or @paganol. The idea is to do something like

```python
for field in fields:
    # some_method(field, d) stands for whatever extracts the value of `field`
    # for detector `d`
    field_arr = np.array([some_method(field, d) for d in detectors])
    setattr(self, field, field_arr)
```

in the …
For pointings (PR #48) I had to create this kind of «field», because the PR introduces pointing information in the form of quaternions: just initialize … The biggest advantage of this approach over `obs = Observation(…)` is that you can discover the names of the fields by writing …
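The exact call mentioned above was lost from this thread; as a generic Python illustration (with a stand-in class and made-up field names), attributes set via `setattr` can be listed from the instance dictionary:

```python
class Observation:          # stand-in class, only for this illustration
    pass

obs = Observation()
setattr(obs, "fknee_mhz", [20.0, 25.0])    # made-up per-detector fields
setattr(obs, "net_ukrts", [50.0, 55.0])

print(list(vars(obs)))      # ['fknee_mhz', 'net_ukrts']
```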
I see. What you are saying actually simplifies the role of this PR quite a bit: the data layout becomes a convention that users should follow, not something enforced by the … I think that we should wrap this …
Yes, that's right, I plan to use the same approach for pointings in PR #48 once this one is merged.
Well, the most important piece of information in the …
The main purpose of `Simulation.create_observations` was to scatter observations across processes. Now this operation is done within the `Observation` initialization, and `create_observations` would become just a thin wrapper around it.
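A hypothetical sketch of what that wrapper could look like; the `Simulation` attributes used here (`start_time`, `sampling_rate_hz`, `mpi_comm`, `observations`) are assumptions for illustration, not the actual litebird_sim code, while the constructor keywords follow the signature shown further down.

```python
# Hypothetical sketch, not the actual litebird_sim code: Simulation keeps only a
# thin wrapper, and Observation itself distributes the data over the MPI grid.
def create_observations(self, detectors, n_samples,
                        n_blocks_det=1, n_blocks_time=1):
    obs = Observation(
        detectors=detectors,
        n_samples=n_samples,
        start_time=self.start_time,          # assumed Simulation attributes
        sampling_rate_hz=self.sampling_rate_hz,
        n_blocks_det=n_blocks_det,
        n_blocks_time=n_blocks_time,
        comm=self.mpi_comm,                  # the Observation scatters itself
    )
    self.observations.append(obs)
    return obs
```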
It would be great to have a new review of the updated code.
All the key elements of this PR are in place. We have to address the conflicts with master, but I think that the priority is to review the proposed API changes of the … Happy to add clarifications and/or examples either here or in the docs wherever needed.
`test_mpi` is now in the base test folder. The reason is that the MPI tests should behave correctly also in serial runs; therefore, they are executed with all the other tests by pytest.
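As a sketch of that pattern (not the actual test file), an MPI-aware test can fall back to a serial layout when mpi4py is missing, so plain `pytest` still runs it:

```python
import numpy as np

try:
    from mpi4py import MPI
    comm, rank, size = MPI.COMM_WORLD, MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size
except ImportError:
    comm, rank, size = None, 0, 1          # serial fallback

def test_tod_block_shape():
    n_det, n_samples = 2, 100
    # Each rank only allocates its own share of the detectors
    local_dets = np.array_split(np.arange(n_det), size)[rank]
    local_tod = np.zeros((len(local_dets), n_samples), dtype=np.float32)
    assert local_tod.shape == (len(local_dets), n_samples)
```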
```diff
- self, detector, start_time, sampling_rate_hz, nsamples, use_mjd=False
+ self, detectors, n_samples,
+ start_time, sampling_rate_hz, use_mjd=False, dtype_tod=np.float32,
+ n_blocks_det=1, n_blocks_time=1, comm=None, root=0
```
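For reference, a usage example against the new signature above; the detector identifiers, duration, and sampling rate are illustrative values, not taken from the PR.

```python
import numpy as np

obs = Observation(
    detectors=["det_A", "det_B"],     # hypothetical detector identifiers
    n_samples=19 * 3600,              # e.g. one hour at 19 Hz, as an example
    start_time=0.0,
    sampling_rate_hz=19.0,
    dtype_tod=np.float32,
    n_blocks_det=1,
    n_blocks_time=1,
    comm=None,                        # serial run; pass an MPI communicator otherwise
)
print(obs.tod.shape)                  # expected: (2, 19 * 3600)
```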
Add possibility to not allocate tod
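A hypothetical sketch of this suggestion (the flag and helper names are made up, not part of the PR): skip the TOD buffer when only metadata or pointings are needed.

```python
import numpy as np

def allocate_tod(n_det, n_samples, dtype_tod=np.float32, allocate=True):
    # Return None instead of a (n_det, n_samples) buffer when allocation is skipped
    return np.zeros((n_det, n_samples), dtype=dtype_tod) if allocate else None
```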
litebird_sim/observations.py
```python
assert len(info) == len(self.tod)
setattr(self, name, info)
```
```python
def get_times(self):
```
Check that this works also if the TOD is not allocated.
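A hedged sketch of a TOD-independent implementation, assuming the constructor arguments are stored as attributes with the same names and that `start_time` is expressed in days when `use_mjd=True`; this is not the actual code in the PR.

```python
import numpy as np

def get_times(self):
    # Derive timestamps from metadata only, so this works even if self.tod
    # was never allocated.
    delta_s = np.arange(self.n_samples) / self.sampling_rate_hz
    if self.use_mjd:
        return self.start_time + delta_s / 86400.0   # seconds to days (MJD)
    return self.start_time + delta_s
```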
I think that the new layout can be merged and used. There is still some stuff to tidy up (e.g. naming conventions, extracting some methods from …).
This PR proposes an implementation of the data layout proposed in #35.
Summary
- `Observation` contains multiple detectors
- quantities shared by the detectors are stored in the `Observation`
- `obs.tod` is a 2D array with shape `(n_det, n_sample)`
- `Detector` is not needed anymore

Motivation
Stacking data that is likely to interact into regular data structures typically allows easier APIs (and sometimes faster computation). For example (a short sketch follows the list below):
- `obs.tod.mean(0)` computes the common mode;
- `C.dot(obs.tod)` applies the cross-talk matrix `C` using threaded and highly optimized math libraries like MKL.
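A minimal, self-contained numpy example of these two operations, assuming a two-detector observation; the TOD values and the cross-talk matrix are made up.

```python
import numpy as np

n_det, n_sample = 2, 1000
tod = np.random.standard_normal((n_det, n_sample)).astype(np.float32)  # stand-in for obs.tod

common_mode = tod.mean(0)                 # average over the detector axis
C = np.array([[1.00, 0.01],
              [0.01, 1.00]], dtype=np.float32)   # example cross-talk matrix
tod_with_crosstalk = C.dot(tod)           # one BLAS call over all detectors at once

print(common_mode.shape, tod_with_crosstalk.shape)   # (1000,) (2, 1000)
```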
Perspective
Prefer shared-memory parallelism over MPI parallelism: the former typically has no or minimal API overhead compared to serial code (and sometimes better performance than in-node MPI parallelization). In this perspective, it is better to have as much data as possible packed in a compact data structure.
Next steps (maps data)
In this perspective, operations on an `Observation` should avoid MPI parallelization, and therefore TOD-to-map accumulation should be a serial operation (maps are not distributed; every MPI rank holds an entire map). This should be possible as LiteBIRD maps should take only a limited fraction of any foreseen node size.
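As an illustration of what such a serial TOD-to-map accumulation could look like (simple binning with numpy; the pixel indices, map size, and names are illustrative assumptions, not the planned litebird_sim API):

```python
import numpy as np

n_det, n_sample = 2, 1000
n_pix = 12 * 16**2                        # e.g. a HEALPix map with nside=16
tod = np.random.standard_normal((n_det, n_sample))
pix_idx = np.random.randint(0, n_pix, size=(n_det, n_sample))  # assumed precomputed pointing

signal_map = np.zeros(n_pix)
hit_map = np.zeros(n_pix)
np.add.at(signal_map, pix_idx.ravel(), tod.ravel())   # accumulate every sample in its pixel
np.add.at(hit_map, pix_idx.ravel(), 1.0)

binned_map = np.divide(signal_map, hit_map,
                       out=np.zeros(n_pix), where=hit_map > 0)
print(binned_map.shape)                   # (3072,): one full map per MPI rank
```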