Skip to content

Restructure workflow for streaming #176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 10, 2025

Conversation

SimonHeybrock
Copy link
Member

@SimonHeybrock SimonHeybrock commented Jun 2, 2025

This change is necessary to allow for accumulation of detector and monitor counts in stream processing, while avoiding excessive or unbounded use of memory.

The main change is that we normalize to monitor after reducing the "pixel" dimension. Since this normalization may require wavelength, we do so in a 2-D $(2\theta,d)$ grid, with a unique wavelength for each grid cell.

Detailed changes:

Domain Type Changes

Renamed Types:

  • DataWithScatteringCoordinatesCountsWavelength
  • DspacingDataCountsDspacing
  • NormalizedRunDataScaledCountsDspacing

New Types:

  • ReducedCountsDspacing: Added to represent reduced counts in d-spacing after partial reduction over pixel dimension
  • KeepEvents: New dataclass to control whether events are kept during data focusing or immediately histogrammed

Function Changes

Modified Functions:

  • normalize_by_monitor_histogram: Now takes ReducedCountsDspacing instead of DataWithScatteringCoordinates and returns ScaledCountsDspacing
  • normalize_by_monitor_integrated: Similar type signature changes as above
  • normalize_by_proton_charge: Similar type signature changes as above
  • apply_masks: Now takes CountsWavelength instead of NormalizedRunData

New or Significantly Modified Functions:

  • focus_data_dspacing_and_two_theta: Expanded to handle event preservation based on KeepEvents flag
  • integrate_two_theta: New function to integrate over the two-theta dimension
  • group_two_theta: New function for grouping data by two-theta bins

Workflow Changes

Event Preservation Control: Added the ability to control whether events are kept during data focusing through the KeepEvents parameter. Vanadium events are not kept by default, the others are.

@SimonHeybrock SimonHeybrock marked this pull request as ready for review June 2, 2025 06:56
@SimonHeybrock SimonHeybrock marked this pull request as draft June 2, 2025 07:03
@SimonHeybrock SimonHeybrock marked this pull request as ready for review June 2, 2025 08:15
@SimonHeybrock SimonHeybrock requested a review from jl-wynen June 2, 2025 08:15
RunType,
SampleRun,
ScaledCountsDspacing,
UncertaintyBroadcastMode,
VanadiumRun,
WavelengthMonitor,
)


def normalize_by_monitor_histogram(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this now be called scale_by_monitor_histogram? And instead of RunNormalization, should it be Scaling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. But maybe we should wait until we have (separately from this) had a wider look at naming? I tried to not go beyond the scope of this restructure, where sensible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any actual changes in this file or did you only reformat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The domain types have changed in some places.

det_coord = (
detector.coords[dim] if dim in detector.coords else detector.bins.coords[dim]
)
if dim not in detector.coords:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this to always use the event coord even if there is a bin coord for binned data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

def _reconstruct_wavelength(
dspacing_bins: DspacingBins, two_theta_bins: TwoThetaBins
) -> sc.Variable:
dspacing = sc.midpoints(dspacing_bins)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try keeping this as bin-edges.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +233 to +241
@dataclass(frozen=True)
class KeepEvents(Generic[RunType]):
"""
Flag indicating whether the workflow should keep all events when focussing data.

If False, data will be histogrammed when focussing.
"""

value: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of using a dataclass here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool cannot be subclassed, so I got complaints when trying to use sciline.Scope.

SimonHeybrock and others added 3 commits June 4, 2025 12:23
Co-authored-by: Jan-Lukas Wynen <jan-lukas.wynen@ess.eu>
Co-authored-by: Jan-Lukas Wynen <jan-lukas.wynen@ess.eu>
Co-authored-by: Jan-Lukas Wynen <jan-lukas.wynen@ess.eu>
@SimonHeybrock SimonHeybrock requested a review from jl-wynen June 6, 2025 09:34
edge_dims = [
dim
for dim, size in counts.sizes.items()
if size + 1 == det_coord.sizes[dim]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that det_coord has the same dims as counts. Is this guaranteed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counts was made from detector so det_coord should have either the same length or be one longer, which is what we check here, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the number of dimensions. This code iterates over the sizes of counts and assumes that each dim exists in det_coord.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice yes, since each bin has a different wavelength, i.e., this is not a 1-D coord.

@SimonHeybrock SimonHeybrock merged commit d18fae1 into main Jun 10, 2025
4 checks passed
@SimonHeybrock SimonHeybrock deleted the restructure-workflow-for-streaming branch June 10, 2025 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants