Commit ef57285

Merge pull request #52 from ISISComputingGroup/3_uncertainties_in_plots_and_fits
Use uncertainties in plots and fits
2 parents 8734c86 + ed4a9e0 commit ef57285

11 files changed: +337 -45 lines changed

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Variance addition to counts data

## Status

Current

## Context

For counts data, the uncertainty on counts is typically defined by Poisson counting statistics, i.e. the standard deviation on `N` counts is `sqrt(N)`.

This can be problematic in cases where zero counts have been collected, as the standard deviation will then be zero, which subsequently leads to "infinite" point weightings in downstream fitting routines, for example.

A number of possible approaches were considered:

| Option | Description |
| --- | --- |
| A | Reject data with zero counts, i.e. explicitly throw an exception if any data with zero counts is seen as part of a scan. |
| B | Use a standard deviation of `NaN` for points with zero counts. |
| C | Define the standard deviation of `N` counts as `1` if counts are zero, otherwise `sqrt(N)`. This is one of the approaches available in Mantid, for example. |
| D | Define the standard deviation of `N` counts as `sqrt(N+0.5)` unconditionally, on the basis that "half a count" is smaller than the smallest possible actual measurement which can be taken. |
| E | No special handling; calculate the standard deviation as `sqrt(N)`. |

For clarity, the following table shows the standard deviation that each option assigns to a given number of counts:

| Counts | Std. Dev. (A) | Std. Dev. (B) | Std. Dev. (C) | Std. Dev. (D) | Std. Dev. (E) |
| --- | --- | --- | --- | --- | --- |
| 0 | raise exception | NaN | 1 | 0.707107 | 0 |
| 1 | 1 | 1 | 1 | 1.224745 | 1 |
| 2 | 1.414214 | 1.414214 | 1.414214 | 1.581139 | 1.414214 |
| 3 | 1.732051 | 1.732051 | 1.732051 | 1.870829 | 1.732051 |
| 4 | 2 | 2 | 2 | 2.12132 | 2 |
| 5 | 2.236068 | 2.236068 | 2.236068 | 2.345208 | 2.236068 |
| 10 | 3.162278 | 3.162278 | 3.162278 | 3.24037 | 3.162278 |
| 50 | 7.071068 | 7.071068 | 7.071068 | 7.106335 | 7.071068 |
| 100 | 10 | 10 | 10 | 10.02497 | 10 |
| 500 | 22.36068 | 22.36068 | 22.36068 | 22.37186 | 22.36068 |
| 1000 | 31.62278 | 31.62278 | 31.62278 | 31.63068 | 31.62278 |
| 5000 | 70.71068 | 70.71068 | 70.71068 | 70.71421 | 70.71068 |
| 10000 | 100 | 100 | 100 | 100.0025 | 100 |

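As a rough illustration of how options C, D and E differ, the formulas above can be written as small Python functions. This is only a sketch: the function names are hypothetical and are not part of `ibex_bluesky_core`.

```python
import math


def stddev_option_c(counts: int) -> float:
    """Option C: 1 for zero counts, otherwise sqrt(N)."""
    return 1.0 if counts == 0 else math.sqrt(counts)


def stddev_option_d(counts: int) -> float:
    """Option D: sqrt(N + 0.5), applied unconditionally."""
    return math.sqrt(counts + 0.5)


def stddev_option_e(counts: int) -> float:
    """Option E: plain Poisson sqrt(N), which is zero for zero counts."""
    return math.sqrt(counts)


# Print a few rows in the same order as the table columns C, D, E.
for n in (0, 1, 100, 10000):
    print(n, stddev_option_c(n), stddev_option_d(n), stddev_option_e(n))
```

The difference between options D and E shrinks quickly as counts grow, which matches the last rows of the table.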
## Present

These approaches were discussed in a regular project update meeting including:

- TW & FA (Experiment controls)
- CK (Reflectometry)
- JL (Muons)
- RD (SANS)

## Decision

The consensus was to go with Option D.

## Justification

- Option A will cause real-life scans to crash in low-count regions.
- Option B involves `NaN`s, which have many surprising floating-point characteristics and are highly likely to be a source of future bugs.
- Option D was preferred to Option C by the scientists present.
- Option E causes surprising results and/or crashes downstream; for example, fitting may consider points with zero uncertainty to have "infinite" weight, effectively disregarding all other data.
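
The "infinite weight" problem with Option E can be seen in a short sketch. It assumes a point's weight is taken as `1/stddev` (the convention described in the fitting documentation changed in this commit); the `weight` helper is hypothetical and exists only for illustration.

```python
import math


def weight(stddev: float) -> float:
    """Weight of a point taken as 1/stddev; infinite when stddev is zero."""
    return math.inf if stddev == 0 else 1.0 / stddev


counts = [0, 4, 9]

# Option E: sqrt(N), so a zero-count point gets an infinite weight.
weights_e = [weight(math.sqrt(n)) for n in counts]

# Option D: sqrt(N + 0.5), so every weight stays finite.
weights_d = [weight(math.sqrt(n + 0.5)) for n in counts]

print("Option E weights:", weights_e)
print("Option D weights:", weights_d)
```

Under Option D the zero-count point still carries the largest weight of the three, but it no longer overwhelms the others.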

doc/callbacks/plotting.md

Lines changed: 4 additions & 1 deletion
@@ -38,9 +38,12 @@ ax = plt.gca()
3838
# Set the y-scale to logarithmic
3939
ax.set_yscale("log")
4040
# Use the above axes in a LivePlot callback
41-
plot_callback = LivePlot(y="y_variable", x="x_variable", ax=ax)
41+
plot_callback = LivePlot(y="y_variable", x="x_variable", ax=ax, yerr="yerr_variable")
42+
# yerr names the signal holding the uncertainty of each y value; it is used to draw error bars
4243
```
4344

45+
Passing a signal name to the `yerr` argument supplies uncertainties to `LivePlot`; if `yerr` is omitted, no error bars are drawn. Each error bar shows the standard deviation of its point. Uncertainty data is read from Bluesky event documents, and the error bars are updated after every new point.
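
To make the data flow concrete, here is a minimal sketch of how per-point values might be gathered from event documents before plotting. It is not the `LivePlot` implementation: the `collect_points` helper and the sample documents are hypothetical, and only the `doc["data"][name]` lookup reflects how event data is keyed.

```python
# Hypothetical stand-ins for Bluesky event documents; real documents carry
# more fields, but the readings live under the "data" key.
events = [
    {"data": {"x_variable": 0.0, "y_variable": 4.0, "yerr_variable": 2.1}},
    {"data": {"x_variable": 1.0, "y_variable": 9.0, "yerr_variable": 3.1}},
]


def collect_points(docs, x, y, yerr=None):
    """Accumulate x, y and (optionally) uncertainty values, one per event."""
    xs, ys, errs = [], [], []
    for doc in docs:
        data = doc["data"]
        xs.append(data[x])
        ys.append(data[y])
        if yerr is not None:
            errs.append(data[yerr])
    # With no yerr name there are no uncertainties, so no error bars.
    return xs, ys, (errs if yerr is not None else None)


xs, ys, errs = collect_points(events, "x_variable", "y_variable", "yerr_variable")
print(xs, ys, errs)
```

Lists like these could then be handed to a plotting call such as matplotlib's `Axes.errorbar(xs, ys, yerr=errs)`.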
46+
4447
The `plot_callback` object can then be subscribed to the run engine, using either:
4548
- An explicit callback when calling the run engine: `RE(some_plan(), plot_callback)`
4649
- Be subscribed in a plan using `@subs_decorator` from bluesky **(recommended)**

doc/fitting/fitting.md

Lines changed: 10 additions & 7 deletions
@@ -24,14 +24,17 @@ plt.figure()
2424
ax = plt.gca()
2525
# ax is shared by fit_callback and plot_callback
2626

27-
plot_callback = LivePlot(y="y_variable", x="x_variable", ax=ax)
28-
fit_callback = LiveFit(Gaussian.fit(), y="y_variable", x="x_variable", update_every=0.5)
27+
plot_callback = LivePlot(y="y_signal", x="x_signal", ax=ax, yerr="yerr_signal")
28+
fit_callback = LiveFit(Gaussian.fit(), y="y_signal", x="x_signal", yerr="yerr_signal", update_every=0.5)
29+
# yerr names the signal holding the y uncertainties: LivePlot draws error bars from it, LiveFit uses it to weight the fit.
2930
# update_every = in seconds, how often to recompute the fit. If `None`, do not compute until the end. Default is 1.
3031
fit_plot_callback = LiveFitPlot(fit_callback, ax=ax, color="r")
3132
```
3233

3334
**Note:** the `LiveFit` callback doesn't do the plotting itself; it returns the parameters of the model it is trying to fit. A `LiveFit` object must be passed to `LiveFitPlot`, which can then be subscribed to the `RunEngine`. See the [Bluesky Documentation](https://blueskyproject.io/bluesky/main/callbacks.html#livefitplot) for information on the various arguments that can be passed to the `LiveFitPlot` class.
3435

36+
Passing a signal name to the `yerr` argument supplies uncertainties to `LiveFit`, so that each point is weighted in the fit. If no signal name is given, the fit is unweighted. Each weight is computed as `1/(standard deviation at point)` and determines how much that point affects the overall fit. As with the rest of `LiveFit`, the fit is updated after every new point collected, now taking the weight of each point into account. Uncertainty data is read from Bluesky event documents after each new point.
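
The effect of `1/stddev` weights can be sketched with a weighted straight-line fit in plain Python. This is an illustration only, not what `LiveFit` runs internally: it assumes the weights multiply the residuals before squaring (the convention lmfit documents for its `weights` argument), and `weighted_linear_fit` is a hypothetical helper.

```python
def weighted_linear_fit(xs, ys, stddevs=None):
    """Fit y = m*x + c by minimising sum((w * (y - m*x - c))**2), w = 1/stddev."""
    if stddevs is None:
        weights = [1.0] * len(xs)  # unweighted fit
    else:
        weights = [1.0 / s for s in stddevs]
    sq = [w * w for w in weights]  # squared weights enter the normal equations
    s = sum(sq)
    sx = sum(w * x for w, x in zip(sq, xs))
    sy = sum(w * y for w, y in zip(sq, ys))
    sxx = sum(w * x * x for w, x in zip(sq, xs))
    sxy = sum(w * x * y for w, x, y in zip(sq, xs, ys))
    m = (s * sxy - sx * sy) / (s * sxx - sx * sx)
    c = (sy - m * sx) / s
    return m, c


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 10.0]      # the last point is an outlier
stddevs = [1.0, 1.0, 1.0, 100.0]  # and it has a large uncertainty

print("unweighted:", weighted_linear_fit(xs, ys))
print("weighted:  ", weighted_linear_fit(xs, ys, stddevs))
```

Without weights the outlier drags the slope well above 1; with weights its large standard deviation means it barely influences the fit.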
37+
3538
The `plot_callback` and `fit_plot_callback` objects can then be subscribed to the `RunEngine`, using the same methods as described in [`LivePlot`](../callbacks/plotting.md). See the following example using `@subs_decorator`:
3639

3740
```py
@@ -79,7 +82,7 @@ from bluesky.callbacks import LiveFitPlot
7982
from ibex_bluesky_core.callbacks.fitting.fitting_utils import [FIT]
8083

8184
# Pass [FIT].fit() to the first parameter of LiveFit
82-
lf = LiveFit([FIT].fit(), y="y_variable", x="x_variable", update_every=0.5)
85+
lf = LiveFit([FIT].fit(), y="y_signal", x="x_signal", update_every=0.5)
8386

8487
# Then subscribe to LiveFitPlot(lf, ...)
8588
```
@@ -89,7 +92,7 @@ The `[FIT].fit()` function will pass the `FitMethod` object straight to the `Liv
8992
**Note:** for the fits in the above table that require parameters, you will need to pass value(s) to their `.fit` method. For example, for polynomial fitting:
9093

9194
```py
92-
lf = LiveFit(Polynomial.fit(3), y="y_variable", x="x_variable", update_every=0.5)
95+
lf = LiveFit(Polynomial.fit(3), y="y_signal", x="x_signal", update_every=0.5)
9396
# For a polynomial of degree 3
9497
```
9598

@@ -138,7 +141,7 @@ def guess(x: npt.NDArray[np.float64], y: npt.NDArray[np.float64]) -> dict[str, l
138141
fit_method = FitMethod(model, guess)
139142
# Pass the model and guess function to FitMethod
140143

141-
lf = LiveFit(fit_method, y="y_variable", x="x_variable", update_every=0.5)
144+
lf = LiveFit(fit_method, y="y_signal", x="x_signal", update_every=0.5)
142145

143146
# Then subscribe to LiveFitPlot(lf, ...)
144147
```
@@ -163,7 +166,7 @@ def different_model(x: float, c1: float, c0: float) -> float:
163166
fit_method = FitMethod(different_model, Linear.guess())
164167
# Uses the user-defined model and the standard guessing function for linear models
165168

166-
lf = LiveFit(fit_method, y="y_variable", x="x_variable", update_every=0.5)
169+
lf = LiveFit(fit_method, y="y_signal", x="x_signal", update_every=0.5)
167170

168171
# Then subscribe to LiveFitPlot(lf, ...)
169172
```
@@ -188,7 +191,7 @@ def different_guess(x: float, c1: float, c0: float) -> float:
188191
fit_method = FitMethod(Linear.model(), different_guess)
189192
# Uses the standard linear model and the user-defined guessing function
190193

191-
lf = LiveFit(fit_method, y="y_variable", x="x_variable", update_every=0.5)
194+
lf = LiveFit(fit_method, y="y_signal", x="x_signal", update_every=0.5)
192195

193196
# Then subscribe to LiveFitPlot(lf, ...)
194197
```

manual_system_tests/dae_scan.py

Lines changed: 22 additions & 5 deletions
@@ -8,12 +8,14 @@
88
import bluesky.plans as bp
99
import matplotlib
1010
import matplotlib.pyplot as plt
11-
from bluesky.callbacks import LiveTable
11+
from bluesky.callbacks import LiveFitPlot, LiveTable
1212
from bluesky.preprocessors import subs_decorator
1313
from bluesky.utils import Msg
1414
from ophyd_async.plan_stubs import ensure_connected
1515

1616
from ibex_bluesky_core.callbacks.file_logger import HumanReadableFileCallback
17+
from ibex_bluesky_core.callbacks.fitting import LiveFit
18+
from ibex_bluesky_core.callbacks.fitting.fitting_utils import Linear
1719
from ibex_bluesky_core.callbacks.plotting import LivePlot
1820
from ibex_bluesky_core.devices import get_pv_prefix
1921
from ibex_bluesky_core.devices.block import block_rw_rbv
@@ -27,6 +29,8 @@
2729
from ibex_bluesky_core.devices.simpledae.waiters import GoodFramesWaiter
2830
from ibex_bluesky_core.run_engine import get_run_engine
2931

32+
NUM_POINTS: int = 3
33+
3034

3135
def dae_scan_plan() -> Generator[Msg, None, None]:
3236
"""Manual system test which moves a block and reads the DAE.
@@ -67,6 +71,11 @@ def dae_scan_plan() -> Generator[Msg, None, None]:
6771
controller.run_number.set_name("run number")
6872
reducer.intensity.set_name("normalized counts")
6973

74+
_, ax = plt.subplots()
75+
lf = LiveFit(
76+
Linear.fit(), y=reducer.intensity.name, x=block.name, yerr=reducer.intensity_stddev.name
77+
)
78+
7079
yield from ensure_connected(block, dae, force_reconnect=True)
7180

7281
@subs_decorator(
@@ -81,7 +90,15 @@ def dae_scan_plan() -> Generator[Msg, None, None]:
8190
dae.good_frames.name,
8291
],
8392
),
84-
LivePlot(y=reducer.intensity.name, x=block.name, marker="x", linestyle="none"),
93+
LiveFitPlot(livefit=lf, ax=ax),
94+
LivePlot(
95+
y=reducer.intensity.name,
96+
x=block.name,
97+
marker="x",
98+
linestyle="none",
99+
ax=ax,
100+
yerr=reducer.intensity_stddev.name,
101+
),
85102
LiveTable(
86103
[
87104
block.name,
@@ -96,9 +113,9 @@ def dae_scan_plan() -> Generator[Msg, None, None]:
96113
]
97114
)
98115
def _inner() -> Generator[Msg, None, None]:
99-
num_points = 3
100-
yield from bps.mv(dae.number_of_periods, num_points)
101-
yield from bp.scan([dae], block, 0, 10, num=num_points)
116+
yield from bps.mv(dae.number_of_periods, NUM_POINTS) # type: ignore
117+
# Pyright does not understand as bluesky isn't typed yet
118+
yield from bp.scan([dae], block, 0, 10, num=NUM_POINTS)
102119

103120
yield from _inner()
104121

src/ibex_bluesky_core/callbacks/fitting/__init__.py

Lines changed: 54 additions & 20 deletions
@@ -1,13 +1,15 @@
11
"""For IBEX Bluesky scan fitting."""
22

33
import logging
4+
import warnings
45
from typing import Callable
56

67
import lmfit
78
import numpy as np
89
import numpy.typing as npt
910
from bluesky.callbacks import LiveFit as _DefaultLiveFit
1011
from bluesky.callbacks.core import make_class_safe
12+
from event_model.documents.event import Event
1113

1214
logger = logging.getLogger(__name__)
1315

@@ -49,31 +51,49 @@ class LiveFit(_DefaultLiveFit):
4951
"""Live fit, customized for IBEX."""
5052

5153
def __init__(
52-
self,
53-
method: FitMethod,
54-
y: str,
55-
x: str,
56-
*,
57-
update_every: int = 1,
54+
self, method: FitMethod, y: str, x: str, *, update_every: int = 1, yerr: str | None = None
5855
) -> None:
5956
"""Call Bluesky LiveFit with assumption that there is only one independent variable.
6057
6158
Args:
6259
method (FitMethod): The FitMethod (Model & Guess) to use when fitting.
6360
y (str): The name of the dependent variable.
6461
x (str): The name of the independent variable.
65-
update_every (int): How often to update the fit. (seconds)
62+
update_every (int, optional): How often to update the fit. (seconds)
63+
yerr (str or None, optional): Name of field in the Event document
64+
that provides standard deviation for each Y value. None meaning
65+
do not use uncertainties in fit.
6666
6767
"""
6868
self.method = method
69+
self.yerr = yerr
70+
self.weight_data = []
6971

7072
super().__init__(
71-
model=method.model,
72-
y=y,
73-
independent_vars={"x": x},
74-
update_every=update_every,
73+
model=method.model, y=y, independent_vars={"x": x}, update_every=update_every
7574
)
7675

76+
def event(self, doc: Event) -> None:
77+
"""When an event is received, update caches."""
78+
weight = None
79+
if self.yerr is not None:
80+
try:
81+
weight = 1 / doc["data"][self.yerr]
82+
except ZeroDivisionError:
83+
warnings.warn(
84+
"standard deviation for y is 0, therefore applying weight of 0 on fit",
85+
stacklevel=1,
86+
)
87+
weight = 0.0
88+
89+
self.update_weight(weight)
90+
super().event(doc)
91+
92+
def update_weight(self, weight: float | None = 0.0) -> None:
93+
"""Update uncertainties cache."""
94+
if self.yerr is not None:
95+
self.weight_data.append(weight)
96+
7797
def update_fit(self) -> None:
7898
"""Use the provided guess function with the most recent x and y values after every update.
7999
@@ -84,12 +104,26 @@ def update_fit(self) -> None:
84104
None
85105
86106
"""
87-
logger.debug("updating guess for %s ", self.method)
88-
self.init_guess = self.method.guess(
89-
np.array(next(iter(self.independent_vars_data.values()))),
90-
np.array(self.ydata),
91-
# Calls the guess function on the set of data already collected in the run
92-
)
93-
logger.info("new guess for %s: %s", self.method, self.init_guess)
94-
95-
super().update_fit()
107+
n = len(self.model.param_names)
108+
if len(self.ydata) < n:
109+
warnings.warn(
110+
f"LiveFitPlot cannot update fit until there are at least {n} data points",
111+
stacklevel=1,
112+
)
113+
else:
114+
logger.debug("updating guess for %s ", self.method)
115+
self.init_guess = self.method.guess(
116+
np.array(next(iter(self.independent_vars_data.values()))),
117+
np.array(self.ydata),
118+
# Calls the guess function on the set of data already collected in the run
119+
)
120+
121+
logger.info("new guess for %s: %s", self.method, self.init_guess)
122+
123+
kwargs = {}
124+
kwargs.update(self.independent_vars_data)
125+
kwargs.update(self.init_guess)
126+
self.result = self.model.fit(
127+
self.ydata, weights=None if self.yerr is None else self.weight_data, **kwargs
128+
)
129+
self.__stale = False

src/ibex_bluesky_core/callbacks/fitting/fitting_utils.py

Lines changed: 0 additions & 1 deletion
@@ -190,7 +190,6 @@ def guess(
190190
) -> Callable[[npt.NDArray[np.float64], npt.NDArray[np.float64]], dict[str, lmfit.Parameter]]:
191191
"""Linear Guessing."""
192192
return Polynomial.guess(1)
193-
# Uses polynomial guessing with a degree of 1
194193

195194

196195
class Polynomial(Fit):
