-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single_crystal with order=1 produce different data when split used #1725
Comments
@mads-bertelsen thanks for noticing and reporting! For the sake of reproducibility, could you include your instrument files and settings used to generate the outputs? ( Also, once you get to dig deeper in the issue, perhaps think of ways to enrich e.g. Unittest_SPLIT_sample? - That instrument currently seems to behave OK in a similar setting, but the explanation of that is probably hidden in the fine print... ) |
The difference in total scattered intensity is only 0.4%, so the unittest doesn't see it as it's a PSD_monitor_4PI. Can add an extra monitor or two that look at some subset of the data.
|
The optimization assumes that the data written to the tau_list structure is untouched between the first ray calculates possible reflections and the next ray needs those same results. This is however not true as the first ray needs to calculate cross sections as it leaves the crystal, and that is the data that's left in the tau_list structure, which is then reused by the next ray (and this data doesn't apply as its for a different wave vector). One possible fix is to allocate the structure that saves this data twice, one reserved for the first scattering, then a different for subsequent scattering. Then the pointer is switched depending on the event_counter. In that way the optimization actually extends to any order and even when disabling the order parameter (though of course the amount of time saved diminishes with higher orders.) I have tested this solution and it fixes the issue. Of course it doubles memory used for the buffer. This means that the desired speed up from this optimization was never actually achieved, as each ray from the split still must calculate its cross section when leaving the crystal, so this optimization is at best a factor of 2 in performance, and not a factor of SPLIT count. We could make a mode that disregards secondary extinction to avoid that each ray calls hkl_search twice. |
@mads-bertelsen good point, this makes sense. I think the better solution would in fact be that the initial tau_list structure became a component-induced This would of course result in even more massive memory usage in GPU settings, but potential allow the optimisation to function also there. On CPU only a single thread (pr. mpi-process) is running at a time - and keeping it to a |
Indeed in a CPU-only setting / memory-wise there is no difference between a |
I am getting an error when making this a uservar, maybe because a define is used to set the size of the array. Starting a branch with the code. |
Not a fully thought-through solution, but I guess something like this could potentially work - also on GPU - with relevant initialisation of numrefs and a USERVARS %{
int numrefs;
struct tau_data *MyTau_list;
#pragma acc shape(MyTau_List[0:numrefs]) init_needed(numrefs)
%} (One added complexity is of course that the component number is added to the |
Made a pull request where we can collaborate on the fix. The Unittest is now sensitive through such issues, mainly through the transmission channel. #1726 |
The Single_crystal component has a computationally heavy step when calculating the cross section for scattering and finding the possible reflections for a particular velocity vector. If the next ray has the same velocity vector, these results can be reused. By forcing the ray to only scatter once and using split, this optimization can be triggered reliably. Though I just noticed that using split in this situation actually change the output significantly. The results below are 5E8 without split and 5E6 with split 100 on our templateLaue example.
Used latest conda release and Single_crystal.comp from main branch.
Peak intensities are affected.
Artifacts appear around top and bottom of 4 PI detector.
Will work on finding the cause of the problem and hopefully get the optimization functional again.
The text was updated successfully, but these errors were encountered: